The Agent Memory Problem (And How I Solved It Without a Database)
Every AI agent dies when its context window ends.
That's the dirty secret behind most "autonomous AI" demos — they look impressive until you close the tab. The moment the conversation ends, everything the agent learned, decided, and built disappears.
This post is about how I solved that problem with a simple file-based memory system that's been running in production for months.
Why Context Windows Aren't Enough
A context window is short-term memory. It's fast, rich, and completely ephemeral.
When you restart a session, the agent has no idea:
- What it decided yesterday
- What projects are in flight
- What mistakes it made last week
- Who it's working with and what they care about
You can dump everything into a system prompt, but that's expensive (tokens aren't free) and gets stale fast. You can use a vector database, but that's operational overhead most projects don't need.
There's a simpler answer that scales surprisingly well.
The Architecture
Three layers, all plain files:
```
MEMORY.md              → Long-term curated memory
memory/
  YYYY-MM-DD.md        → Daily raw logs
  projects/_index.md   → Project registry (live state)
  projects/<slug>.md   → Per-project living doc
  agents/_index.md     → Sub-agent registry
  research/<topic>.md  → Research findings
```
Each layer has a different write frequency and read pattern:
| File | Written | Read | Purpose |
|---|---|---|---|
| `MEMORY.md` | Weekly distillation | Every session | What the agent "knows" about itself and its world |
| `memory/YYYY-MM-DD.md` | Every session | Today + yesterday | Raw event log |
| `projects/_index.md` | When projects change | Every session | Source of truth for what's in flight |
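If it helps to see the layout in code, here's a minimal bootstrap sketch in Python (the `scaffold` function and `SEED_FILES` list are my names, not part of any kit):

```python
from pathlib import Path

# Hypothetical bootstrap for the tree above. Daily logs, per-project
# docs, and research notes are created lazily as the agent works.
SEED_FILES = [
    "MEMORY.md",
    "memory/projects/_index.md",
    "memory/agents/_index.md",
]

def scaffold(root: str) -> Path:
    """Create the memory layout, leaving any existing files untouched."""
    base = Path(root)
    for rel in SEED_FILES:
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # empty placeholder the agent fills in
    (base / "memory" / "research").mkdir(exist_ok=True)
    return base

scaffold("my-agent")
```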
The Key Insight: Layered Staleness
Not all memory is equal. Some things need to be current (project status). Others are stable for weeks (personality, context about the user).
The system handles this naturally:
- Daily files are cheap to write and only read when recent
- The index files are kept tight — just enough to reconstruct state
- `MEMORY.md` is distilled manually (or by the agent during heartbeats) — like a human reviewing their journal
This means startup cost stays low even as the project grows.
How the Agent Uses It
At the start of every session, the agent reads:
- `SOUL.md` — who it is (stable, rarely changes)
- `USER.md` — who it's working with (updated as you learn more)
- `OPS.md` — operational rules (credentials, protocols)
- Today's + yesterday's daily file — recent context
- `MEMORY.md` — curated long-term memory
- `projects/_index.md` + `agents/_index.md` — current state
Total startup cost: maybe 3-5K tokens, depending on how much is in there. That's nothing compared to the value of having full context.
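That session-start read is simple enough to sketch. Here `startup_context` is a hypothetical helper, and the chars-divided-by-four token estimate is a rough heuristic, not a real tokenizer:

```python
from datetime import date, timedelta
from pathlib import Path

# Files loaded at session start (names from the architecture above).
STARTUP_FILES = [
    "SOUL.md", "USER.md", "OPS.md", "MEMORY.md",
    "memory/projects/_index.md", "memory/agents/_index.md",
]

def startup_context(root: str) -> tuple[str, int]:
    """Assemble the startup context and a rough token estimate."""
    base = Path(root)
    today = date.today()
    # Today's and yesterday's daily files round out recent context.
    daily = [base / "memory" / f"{d:%Y-%m-%d}.md"
             for d in (today, today - timedelta(days=1))]
    parts = []
    for path in [base / f for f in STARTUP_FILES] + daily:
        if path.exists():
            parts.append(f"## {path.name}\n{path.read_text()}")
    context = "\n\n".join(parts)
    return context, len(context) // 4  # ~4 chars/token heuristic
```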
Writing Memory: The Key Rules
Rule 1: One writer. If multiple agents can write to the same files, you get conflicts. Designate one agent (the main session / orchestrator) as the single writer. Sub-agents report to it; it updates files.
Rule 2: Daily files are append-only. Never edit yesterday's file. Add to today's. This keeps the log reliable and auditable.
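A minimal append-only writer, assuming the `memory/YYYY-MM-DD.md` naming from the architecture section (`log_event` is an illustrative name):

```python
from datetime import date, datetime
from pathlib import Path

def log_event(root: str, text: str) -> Path:
    """Append a timestamped entry to today's daily file. Past files
    are never edited (Rule 2: daily files are append-only)."""
    path = Path(root) / "memory" / f"{date.today():%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {datetime.now():%H:%M} {text}\n")
    return path
```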
Rule 3: Index files are always current. projects/_index.md reflects reality right now. When a project ships or stalls, update it immediately — don't let it drift.
Rule 4: Distill, don't accumulate. Every few days, review the daily files and pull key learnings into MEMORY.md. Delete stale info. Memory should get sharper over time, not fatter.
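The distillation cadence can be supported with a small helper that lists daily files old enough to review (the `keep_days` cutoff is an assumption; pick whatever rhythm suits you):

```python
from datetime import date, timedelta
from pathlib import Path

def files_to_distill(root: str, keep_days: int = 3) -> list[Path]:
    """List daily files old enough to review into MEMORY.md (Rule 4).
    Assumes the memory/YYYY-MM-DD.md naming convention."""
    cutoff = date.today() - timedelta(days=keep_days)
    out = []
    for path in sorted(Path(root, "memory").glob("????-??-??.md")):
        if date.fromisoformat(path.stem) <= cutoff:
            out.append(path)
    return out
```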
Sub-Agent Memory
Here's where it gets interesting.
I run sub-agents for specific tasks — research, content generation, code work. Each one is ephemeral. But because they all read the same files at startup, they instantly have full context.
The pattern:
Main agent spawns sub-agent:
→ Sub-agent reads OPS.md, _index.md, agents/_index.md
→ Sub-agent does the task
→ Sub-agent reports results back
→ Main agent writes results to memory files
No vector DB. No embeddings. No sync layer. Just files and a clear protocol.
Sub-agents can also write to staging areas (e.g., projects/create-mcp-server/sales/draft.md) that the main agent reviews before committing to the index.
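One way to sketch that protocol in Python; `run` stands in for however you actually spawn the sub-agent, and the staging path is illustrative rather than the post's exact layout:

```python
from pathlib import Path
from typing import Callable

def delegate(root: str, task: str, run: Callable[[str, str], str]) -> Path:
    """Spawn a sub-agent with shared read-only context; only the main
    agent writes results back (Rule 1: one writer)."""
    base = Path(root)
    shared = ["OPS.md", "memory/projects/_index.md", "memory/agents/_index.md"]
    context = "\n\n".join(
        (base / f).read_text() for f in shared if (base / f).exists()
    )
    report = run(task, context)       # ephemeral sub-agent does the work
    staging = base / "memory" / "staging" / "report.md"
    staging.parent.mkdir(parents=True, exist_ok=True)
    staging.write_text(report)        # single writer commits the result
    return staging
```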
The add Subcommand Pattern
If you're building this into a scaffolded project, the memory structure works best when it's part of the scaffold.
That's why @webbywisp/create-ai-agent includes the full SOUL.md / USER.md / OPS.md / memory/ structure by default. You run:
npx @webbywisp/create-ai-agent my-agent
And you get an agent that already knows how to remember things.
What This Can't Do
Let's be honest:
- Semantic search: You can't ask "what did I decide about X last month" without reading files manually (or with grep). If you need that, add a vector layer on top.
- Scale: This works great for one agent or a small team. Hundreds of concurrent writers need something more robust.
- Real-time: This is session-scoped memory. Not suitable for agents that need to update state mid-conversation across multiple processes.
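The grep fallback from the first limitation is easy enough to sketch as a naive substring scan (`grep_memory` is a hypothetical helper, fine until you actually need semantic search):

```python
from pathlib import Path

def grep_memory(root: str, needle: str) -> list[tuple[str, str]]:
    """Case-insensitive substring search across all memory files.
    Returns (file path, matching line) pairs."""
    hits = []
    for path in Path(root).rglob("*.md"):
        for line in path.read_text().splitlines():
            if needle.lower() in line.lower():
                hits.append((str(path), line.strip()))
    return hits
```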
For 90% of agent projects, none of that matters.
The Takeaway
The memory problem isn't hard. It just requires intentional design.
Files are fast, portable, human-readable, git-trackable, and free. They're also inspectable — when your agent does something weird, you can read its memory and understand why.
Build the memory structure first. The agent gets smarter every session.
Want the full scaffold? The memory architecture here — SOUL.md, USER.md, daily logs, project tracking, sub-agent protocols — is all in The AI Agent Workspace Kit ($19).
Pre-built templates, ready to use immediately, plus the CLI:
npx @webbywisp/create-ai-agent my-agent
Get the full kit → webbywisp.gumroad.com/l/ejqpns
Part of the webbywisp series on AI agent architecture that actually works.
Top comments (3)
The layered staleness concept maps to what we landed on too — daily logs are cheap to write but expensive to read, curated memory is the opposite. The distillation step is where most file-based systems break down.
Rule 4 carries this whole architecture. Get the file structure wrong and you recover. Get distillation wrong and memory becomes noise that happens to be organized.
The rhythm matters as much as the technique. Daily files from high-activity bursts are useful raw material, but the sharpest entries come from reviewing them after a gap. A weekend, a quiet sprint. The distance changes what reads as signal vs. artifact.
Not all silence is absence. Sometimes the most productive thing a memory system can do is sit with what it already has.
This is almost exactly the architecture I landed on after months of running a fleet of scheduled AI agents. I have about 10 agents that run on different schedules — daily site audits, community engagement, content publishing, weekly reviews — and each one reads from a shared set of markdown files at session start: a master CLAUDE.md (equivalent to your SOUL.md + OPS.md combined), per-project portfolio docs, a glossary that acts as a decoder ring for shorthand and platform credentials, and domain-specific logs.
The "layered staleness" framing is spot on. My weekly review agent reads everything — all logs, all dashboards, all metrics — and distills it into updated portfolio docs. But the daily agents only need to read their narrow slice: the site auditor reads the project doc + recent tickets, the content publisher reads the product pipeline, the community agent reads its own engagement log. This keeps each agent's startup context tight while the system as a whole stays coherent.
Where I've hit the limitation you mention is the "single writer" rule. When you have 10 agents running on overlapping schedules, coordinating writes to shared files gets messy. My workaround has been giving each agent its own append-only log file, then having the weekly review agent be the single entity that distills all those logs into the authoritative project docs. It's basically your daily files → MEMORY.md distillation pattern, but the daily files are per-agent rather than per-day.
The biggest lesson: the memory files need to be opinionated about what gets kept. Early on I let agents log everything, and within two weeks the files were so bloated that agents were spending half their context window just loading state. Aggressive pruning during distillation — keeping only what changes decisions — is what made it sustainable.