By default, an AI agent forgets everything the moment a step ends. The model carries nothing between calls. Each run starts from a blank slate, no matter what happened a second ago.
AI agent memory is the architecture that fixes that. In Part 2 we treated the context window as RAM — small, fast, working memory. Memory is the storage tier behind it: what the agent writes down, recalls later, and forgets on purpose.
This is Part 3 of the Designing AI-Native Applications series. Part 2 was about the window the model reads. This post is about everything that isn’t in the window yet — and how it gets there at the right moment.
- Agents are stateless by default. Memory is the layer you build so they remember across steps and sessions.
- Three long-term types: episodic (what happened), semantic (facts and preferences), procedural (how-to). Working memory is just the window.
- Memory is a lifecycle, not a database — write, recall, consolidate, forget. The hard parts are recall and forgetting, not storage.
Why Agents Forget
The model itself has no memory. Give it the same question twice in two separate calls and it answers fresh both times, with no idea the first ever happened. Anything it “remembers” is something you put back in front of it.
That’s fine for a one-shot task. It breaks the moment an agent runs over many steps, uses tools whose results matter later, or talks to the same user tomorrow. Without a memory layer, it re-asks what it already knows and repeats mistakes it already made.
So memory sits outside the model, as storage. The window is RAM — small and wiped each step. Memory is the disk behind it. The architecture isn’t the database; it’s the paging — deciding what to write down, what to pull back into the window, and what to let go.
The Three Kinds of AI Agent Memory
“Memory” isn’t one thing. It splits by what it holds and how long it stays true. Working memory is the window from Part 2. The three long-term kinds are where the architecture lives:
| Type | What it holds | Example | Where it usually lives |
|---|---|---|---|
| Working | The current step | This conversation turn | Context window (RAM) |
| Episodic | What happened before | “Last run, the API timed out” | Event log / vector store |
| Semantic | Facts & preferences that stay true | “This user bills in EUR” | Vector DB / key-value store |
| Procedural | How-to and tool patterns | “Refunds need a manager’s OK” | Rules / prompt / skills file |
The split matters because each kind has a different shelf life and a different way to recall it. Episodic memory is searched by similarity (“have I seen something like this before?”). Semantic facts are looked up directly. Procedural rules often just live in the system prompt. Lumping them into one bucket is how you end up with a vector store full of stale chat logs that drowns out the one fact you needed.
In practice the three map to different backends: a vector database for episodic and semantic recall, a key-value or document store for stable preferences, and plain config or a skills file for procedural rules. They also need a scope — per user, per conversation, or shared — so one person’s memory never bleeds into another’s.
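Here is a minimal sketch of what that type-and-scope split could look like in code. The `Kind`, `MemoryRecord`, and `ScopedMemory` names, the scope strings, and the in-memory dict are all illustrative assumptions, not any framework's API; a real system would put a database behind the same shape.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
import time


class Kind(Enum):
    EPISODIC = "episodic"      # what happened before
    SEMANTIC = "semantic"      # facts and preferences that stay true
    PROCEDURAL = "procedural"  # how-to and tool patterns


@dataclass
class MemoryRecord:
    kind: Kind
    text: str
    created_at: float = field(default_factory=time.time)


class ScopedMemory:
    """Toy in-memory store keyed by (scope, kind). Hypothetical, for illustration."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, Kind], list[MemoryRecord]] = defaultdict(list)

    def write(self, scope: str, record: MemoryRecord) -> None:
        self._data[(scope, record.kind)].append(record)

    def read(self, scope: str, kind: Kind) -> list[MemoryRecord]:
        # Only records written under this exact scope are visible.
        return list(self._data.get((scope, kind), []))


memory = ScopedMemory()
memory.write("user:alice", MemoryRecord(Kind.SEMANTIC, "Bills in EUR"))
memory.write("user:bob", MemoryRecord(Kind.SEMANTIC, "Bills in USD"))
memory.write("shared", MemoryRecord(Kind.PROCEDURAL, "Refunds need a manager's OK"))
```

Because the scope is part of the key, reading Alice's semantic memory can never return Bob's currency. Isolation comes from the lookup itself rather than from a filter you have to remember to apply.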
Memory Is Write, Recall, and Forget
Because memory is storage, you don’t just “have” it — you operate it. Every step runs a small loop:
```python
# Each step: recall relevant memory, act, then write back.
mem = recall(query, stores=["episodic", "semantic"], k=5)  # read up into the window
ctx = build_context(system_prompt, mem, user_input)         # Part 2's pipeline
result = agent(ctx)
write(result.facts, store="semantic")                       # save only what's worth keeping
consolidate()                                               # summarize, merge duplicates, drop the stale
```
Read it as four jobs. Write: decide what’s worth saving — not every message, just facts and outcomes you’ll want later. Recall: pull the few relevant pieces into the window, not the whole store. Consolidate: summarize and merge so memory stays compact. Forget: drop what’s stale or wrong.
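To make the loop concrete, here is a runnable toy version. The function names follow the loop above, but every body is an assumption made for illustration: `recall` uses plain word overlap as a stand-in for vector search, `agent` is a stub instead of a model call, and `consolidate` only removes exact duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    answer: str
    facts: list[str] = field(default_factory=list)


# Hypothetical in-memory stores; a real system would use a database.
STORES: dict[str, list[str]] = {"episodic": [], "semantic": []}


def recall(query: str, stores: list[str], k: int = 5) -> list[str]:
    """Rank stored items by word overlap with the query (stand-in for vector search)."""
    words = set(query.lower().split())
    candidates = [item for name in stores for item in STORES[name]]
    scored = [(len(words & set(item.lower().split())), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for score, item in scored if score > 0][:k]


def build_context(system_prompt: str, mem: list[str], user_input: str) -> str:
    notes = "\n".join(f"- {m}" for m in mem)
    return f"{system_prompt}\n\nKnown:\n{notes}\n\nUser: {user_input}"


def agent(ctx: str) -> Result:
    """Stub model call: a real agent would send ctx to an LLM."""
    return Result(answer="ok", facts=["user prefers invoices in EUR"])


def write(facts: list[str], store: str) -> None:
    for fact in facts:
        if fact not in STORES[store]:  # skip exact duplicates at write time
            STORES[store].append(fact)


def consolidate() -> None:
    """Placeholder: drop exact duplicates while keeping insertion order."""
    for name, items in STORES.items():
        STORES[name] = list(dict.fromkeys(items))


def step(user_input: str) -> Result:
    mem = recall(user_input, stores=["episodic", "semantic"], k=5)
    ctx = build_context("You are a billing assistant.", mem, user_input)
    result = agent(ctx)
    write(result.facts, store="semantic")
    consolidate()
    return result


step("Which currency for invoices?")
recalled_later = recall("invoices currency", stores=["semantic"])
```

The first step starts with empty stores and writes one fact. The later `recall` finds that fact, which is the whole point: the model call itself stayed stateless, and the memory lives in what you wrote down around it.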
Frameworks differ mostly in how they automate these four. Mem0 leans on vector recall; Zep’s Graphiti uses a temporal knowledge graph that tracks how facts change over time; Letta (formerly MemGPT) gives the agent memory blocks that it edits itself. Pick by your dominant need: fast semantic recall, relationships and time, or agent-curated state.
Recall, by the way, is where memory meets the last post: what you recall has to fit the window’s budget. Semantic search over your memory is the same retrieval you saw in What Is RAG and the embeddings primer — pointed at the agent’s own past instead of a document set.
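One way to respect that budget is to trim the ranked recall results before they reach the window. This is only a sketch: `estimate_tokens` assumes roughly four characters per token, which is a crude placeholder for a real tokenizer, and `fit_to_budget` is a hypothetical helper name.

```python
def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly 4 characters per token. Swap in a real tokenizer."""
    return max(1, len(text) // 4)


def fit_to_budget(ranked: list[str], budget_tokens: int) -> list[str]:
    """Keep the highest-ranked memories that fit; skip any that would overflow."""
    chosen: list[str] = []
    used = 0
    for text in ranked:  # ranked best-first
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too big for what's left; a smaller one may still fit
        chosen.append(text)
        used += cost
    return chosen


ranked = [
    "User bills in EUR",
    "Last run, the API timed out while exporting the quarterly invoices",
    "Refunds need a manager's OK",
]
picked = fit_to_budget(ranked, budget_tokens=10)
```

With a budget of 10 estimated tokens, the long episodic entry is skipped and the two short items are kept. Whether to skip and continue, or stop at the first overflow, is a design choice; skipping packs the window tighter but can promote lower-ranked items.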
Making Recall and Forgetting Work
Storing memory is easy. The three operations that decide whether it helps are recall, forgetting, and conflict handling — so engineer them on purpose, not by default.
- Recall: don’t just take the top matches by similarity. Rerank the candidates, weight recent memories higher, and run hybrid search (keywords and vectors) so an exact name or ID never gets missed. Choose how many to pull by the window’s budget, not by habit.
- Forgetting: give memories a time-to-live or a decay score, and evict by age and by how often they’re actually recalled. A memory nothing ever reads is pure cost and pure noise.
- Conflicts: when a new fact contradicts a stored one, don’t keep both and hope. Store facts with a timestamp and let the newest supersede the old, so a changed preference can’t resurface later.
Get these three right and most “bad memory” symptoms disappear — because they were never storage problems, they were recall and forgetting problems.
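As a rough illustration of the recall advice, the sketch below blends vector similarity, exact keyword hits, and recency into one score. The weights, the 30-day half-life, the two-dimensional embeddings, and the `Memory` and `rank` names are all assumptions picked for the example, not tuned or recommended values. It is a simple weighted fusion, not a learned reranker.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    created_at: float  # Unix seconds


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the memory text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def recency(created_at: float, now: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    age_days = max(0.0, now - created_at) / 86400
    return 0.5 ** (age_days / half_life_days)


def rank(query: str, query_vec: list[float], memories: list[Memory],
         now: float, weights=(0.5, 0.3, 0.2)) -> list[Memory]:
    w_vec, w_kw, w_new = weights  # illustrative weights, not tuned values

    def score(m: Memory) -> float:
        return (w_vec * cosine(query_vec, m.embedding)
                + w_kw * keyword_score(query, m.text)
                + w_new * recency(m.created_at, now))

    return sorted(memories, key=score, reverse=True)


now = time.time()
DAY = 86400
memories = [
    Memory("order ORD-1042 shipped late", [0.9, 0.1], now - 200 * DAY),
    Memory("customer asked about shipping times", [0.9, 0.1], now - 1 * DAY),
    Memory("order ORD-7 was refunded", [0.1, 0.9], now - 1 * DAY),
]
top_by_id = rank("ORD-1042", [0.9, 0.1], memories, now)[0]
top_by_recency = rank("status", [0.9, 0.1], memories, now)[0]
```

The first two memories are identical to the vector search. The exact ID match lifts the 200-day-old order to the top for the `ORD-1042` query, and when no keyword matches, recency breaks the tie in favor of yesterday's memory.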
Where Memory Breaks
Like context, memory fails in ways that look like “the model is dumb” but aren’t:
- Stale or contradictory memory. A preference changed, but the old one is still stored, so the agent confidently uses the wrong fact.
- Memory bloat. You save everything, the store fills with near-duplicates, and recall can no longer surface the one useful item through the noise.
- Wrong recall. Similarity search returns something related but not relevant, and the agent acts on it.
- Privacy leaks. Personal data written to long-term memory can resurface in a later session — or for a different user — if memory isn’t scoped per user.
A support agent I tested once “remembered” a customer’s old shipping address from a months-old chat and used it for a new order. The fact was real — it just wasn’t true anymore, and nothing had told the memory layer to expire it. Stored once, trusted forever.
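A small sketch of how that failure could be prevented: store each fact with the time it was observed, let the newest write win, and refuse to trust a fact past a maximum age. The `FactStore` class, its method names, and the example values are hypothetical; timestamps are plain numbers to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    value: str
    updated_at: float  # Unix seconds when the fact was observed


class FactStore:
    """Hypothetical per-user fact store: one current value per key, newest wins."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], Fact] = {}

    def upsert(self, user_id: str, key: str, value: str, observed_at: float) -> bool:
        """Store the fact unless a newer one is already on record. True if stored."""
        current = self._facts.get((user_id, key))
        if current is not None and current.updated_at >= observed_at:
            return False  # stale write, e.g. replayed from an old chat
        self._facts[(user_id, key)] = Fact(value, observed_at)
        return True

    def get(self, user_id: str, key: str, now: float,
            max_age: Optional[float] = None) -> Optional[str]:
        """Return the current value, or None if missing or older than max_age seconds."""
        fact = self._facts.get((user_id, key))
        if fact is None:
            return None
        if max_age is not None and now - fact.updated_at > max_age:
            return None  # too old to trust; the caller should re-confirm
        return fact.value


store = FactStore()
store.upsert("cust-1", "shipping_address", "12 Old Road", observed_at=1_000.0)
store.upsert("cust-1", "shipping_address", "7 New Street", observed_at=2_000.0)
late_write = store.upsert("cust-1", "shipping_address", "12 Old Road", observed_at=1_500.0)
```

The old address can't come back through a late write, and a `max_age` on read turns "stored once, trusted forever" into "trusted until it needs re-confirming".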
Every one of these is a write, recall, or forget problem, not a storage problem. Saving more is easy; saving the right things and pruning the rest is the actual engineering.
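Pruning can be sketched as a periodic pass that combines a hard time-to-live with a decay score based on how recently and how often a memory was recalled. The 180-day TTL, the 30-day half-life, the 0.1 threshold, and the `Entry`, `keep_score`, and `prune` names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    created_at: float        # Unix seconds
    last_recalled_at: float  # Unix seconds; equals created_at if never recalled
    recall_count: int = 0


def keep_score(e: Entry, now: float, half_life_days: float = 30.0) -> float:
    """Higher means more worth keeping: decays with idle time, grows with use."""
    idle_days = max(0.0, now - e.last_recalled_at) / 86400
    decay = 0.5 ** (idle_days / half_life_days)
    return decay * (1 + e.recall_count)


def prune(entries: list[Entry], now: float, ttl_days: float = 180.0,
          min_score: float = 0.1) -> list[Entry]:
    """Drop entries past their TTL or whose keep score fell below the threshold."""
    kept = []
    for e in entries:
        age_days = (now - e.created_at) / 86400
        if age_days > ttl_days:
            continue  # hard expiry by age
        if keep_score(e, now) < min_score:
            continue  # soft expiry: nothing has read it recently
        kept.append(e)
    return kept


DAY = 86400
now = 1_000 * DAY
entries = [
    Entry("bills in EUR", now - 150 * DAY, now - 2 * DAY, recall_count=12),
    Entry("asked about the weather once", now - 150 * DAY, now - 150 * DAY),
    Entry("old address", now - 400 * DAY, now - 5 * DAY, recall_count=3),
]
survivors = [e.text for e in prune(entries, now)]
```

Only the frequently recalled, recently used fact survives. The never-read entry decays away, and the 400-day-old address hits the hard TTL even though it was still being recalled, which forces it to be confirmed and rewritten rather than silently trusted.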
When You Don’t Need Memory
Memory is a cost, not a default. If a task is self-contained, give the agent no memory at all — it’s faster, cheaper, and can’t leak or contradict itself.
Add memory only when:
- The agent runs over many turns and must remember what already happened.
- It serves the same user again and should recall their preferences.
- It should improve with use — learning which tools and steps work.
If none of those hold, skip it. The same “simplest thing that works” rule from Part 1 applies: don’t store what you’ll never read.
Quick Recap
- Agents are stateless; memory is the storage tier you build behind the context window.
- Three long-term types: episodic (what happened), semantic (facts & preferences), procedural (how-to).
- The lifecycle: write, recall, consolidate, forget.
- It breaks on stale facts, bloat, wrong recall, and privacy leaks: all write, recall, or forget problems, not storage problems.
- Skip memory for self-contained tasks; add it only when something must persist.
Conclusion
AI agent memory is less about where you store data and more about what you choose to write, recall, and forget. Treat the window as RAM and memory as the storage behind it, split your long-term memory into episodic, semantic, and procedural, and put your real effort into recall and pruning. Do that and your agent stops repeating itself — without drowning in its own history.
What would you have your agent remember first — past runs, user preferences, or how a task is done? Tell me in the comments.
Read next: Part 4 of Designing AI-Native Applications — Agent Orchestration Patterns, on coordinating multiple agents without the whole thing turning into chaos (linked here once it’s published).
