Series: Agentic AI in Python: Zero to Production
This is Part 4: AI agent memory. Quick recap of what we covered so far:
- Part 1: Built a local LangGraph research agent with tools and a SQLite checkpointer → /build-agentic-ai-app-python-part-1/
- Part 2: Wrapped it in FastAPI, Dockerised it, and deployed it → /build-agentic-ai-app-python-part-2/
- Part 3: Scaled to a supervisor + workers multi-agent team → /build-agentic-ai-app-python-part-3/
If you’re starting here, you only need Part 1’s agent.py. Everything in this post is a small diff on top of it.
AI agent memory is where the agent you built in Part 1 quietly fails. Try this test: open a chat and tell it “I’m Sukhveer, I deploy on Fly.io, and I prefer answers with code.” Have a great conversation. Now start a new thread and ask “where do I deploy?” — total blank. The checkpointer didn’t break; it was never designed for this.
What’s missing is long-term memory — the kind that follows a user across conversations instead of dying with the thread. It’s also the single most requested topic after Part 3, which is why this part exists.
By the end of this post your agent will remember facts about users across threads, retrieve them by meaning rather than exact keywords, and keep all of it in a Redis store that survives restarts and redeploys. Three steps, each a few lines. First, let’s be precise about why the agent forgets.
Why Your Agent Still Forgets
The confusion starts because “memory” means two different things, and most tutorials only give you one. A checkpointer (the SQLite saver from Part 1) records the message history of a single thread — kill the thread, and everything it learned about the user is unreachable. Checkpointers give your threads memory. They don’t give your agent memory.
The diagram shows the two lanes side by side. The top lane is what you already have: short-term, thread-scoped state. The bottom lane is what we’re adding — a LangGraph Store, a key-value memory with optional vector search that every thread can read and write. LangGraph treats these as separate components on purpose: you compile your graph with both a checkpointer= and a store=, and they never touch each other’s data.
When I finally understood this split, my reaction was mild annoyance — I’d spent a weekend trying to make checkpointers do cross-thread recall by reusing thread IDs. Don’t do that. It bloats one thread’s history until every call drags the entire past through the model, and my token bill noticed before I did.
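Here is a rough sketch of why that bill climbs so fast. If every call resends the whole thread, total input tokens grow roughly quadratically with the number of turns. The fixed tokens_per_turn value is a simplifying assumption, not a measurement. Real turns vary in length, and this ignores system prompts and tool output.

```python
def tokens_resent(turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Return (tokens in the last call, tokens across all calls).

    Assumes each turn adds a fixed number of tokens and every call
    resends the entire history so far (a simplification).
    """
    last_call = turns * tokens_per_turn
    # Call k resends k turns, so the total is tokens_per_turn * (1 + 2 + ... + turns).
    total = tokens_per_turn * turns * (turns + 1) // 2
    return last_call, total


for turns in (10, 100, 1000):
    last, total = tokens_resent(turns, tokens_per_turn=200)
    print(f"{turns:>5} turns: last call {last:,} tokens, {total:,} in total")
```

Under those assumptions, at 200 tokens per turn, 100 turns already add up to 1,010,000 input tokens in total, even though the last call alone is only 20,000.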
Which AI Agent Memory Type Do You Need?
Before writing code, decide what the agent actually needs to remember — because each type has a different cheapest-correct implementation. Run through this checklist:
- “Remember this conversation” → thread memory. You already have it: the Part 1 checkpointer. Stop here if that’s all you need.
- “Remember facts about me across chats” (name, stack, preferences) → cross-thread key-value memory. Step 1 below.
- “Recall relevant past context, even when worded differently” → semantic memory (vector search over stored facts). Step 2.
- “Remember after a restart, in production” → a durable store backend. Step 3.
The flowchart is the whole build plan. Prerequisites are light: the Part 1 project, Python 3.11+, and pip install -U langgraph langgraph-checkpoint-redis (LangGraph 1.2.4 and langgraph-checkpoint-redis 0.4.1 were current when I wrote this in June 2026). Step 2 also needs an embedding model, plus pip install -U langchain langchain-openai for the init_embeddings helper and its OpenAI provider; more on that there. If you don’t have the Part 1 code, build it first; it’s a 30-minute read.
Step 1 — Add a Cross-Thread Memory Store
A Store in LangGraph is deliberately boring: namespaces, keys, and JSON-ish values. The namespace is a tuple, and putting the user ID in the namespace is what makes memory multi-user safe — every user gets their own shelf.
```python
# memory_demo.py: the Store API in 10 lines
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
ns = ("memories", "user-sukhveer")  # one shelf per user

store.put(ns, "deploy-target", {"text": "Deploys agents on Fly.io"})
store.put(ns, "style", {"text": "Prefers answers with code"})

item = store.get(ns, "deploy-target")
print(item.value["text"])  # Deploys agents on Fly.io
```
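One detail matters for multi-user safety. LangGraph’s store.search() takes a namespace prefix, so a shorter tuple such as ("memories",) matches every user’s shelf, not just one. The dependency-free sketch below mimics that prefix matching over a plain dict. The search_prefix helper and the second user, user-alex, are hypothetical, and this is an illustration rather than the library’s implementation.

```python
# Hypothetical sketch: prefix-based namespace search over a plain dict.
Namespace = tuple[str, ...]

shelves: dict[tuple[Namespace, str], dict] = {
    (("memories", "user-sukhveer"), "deploy-target"): {"text": "Deploys agents on Fly.io"},
    (("memories", "user-sukhveer"), "style"): {"text": "Prefers answers with code"},
    (("memories", "user-alex"), "deploy-target"): {"text": "Deploys agents on Render"},
}


def search_prefix(prefix: Namespace) -> list[str]:
    """Return the text of every item whose namespace starts with `prefix`."""
    return [
        value["text"]
        for (namespace, _key), value in shelves.items()
        if namespace[: len(prefix)] == prefix
    ]


# Scoped to one user: only that user's facts come back.
print(search_prefix(("memories", "user-sukhveer")))
# Too short a prefix leaks every user's memories into the prompt.
print(search_prefix(("memories",)))
```

The practical rule is to build the namespace from the authenticated user ID and never search with a shorter prefix in a user-facing node.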
That’s the entire mental model. Now wire it into the agent. Compile the graph with the store, and any node can ask for it with an injected store argument:
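```python
# agent.py: additions to Part 1's graph
from langchain_core.messages import SystemMessage
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore


def recall(state, config: RunnableConfig, *, store: BaseStore):
    # LangGraph injects `store` (keyword-only) when the graph is compiled with one
    user_id = config["configurable"]["user_id"]
    items = store.search(("memories", user_id), limit=5)
    facts = "\n".join(i.value["text"] for i in items)
    system = f"Known facts about this user:\n{facts or 'None yet.'}"
    return {"messages": [SystemMessage(content=system)]}


store = InMemoryStore()
# `graph` and `memory` (the SQLite checkpointer) come from Part 1;
# wire recall in as the first node before compiling.
app = graph.compile(checkpointer=memory, store=store)  # both layers
```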
The recall node runs first and adds whatever the store knows to the conversation as a system message. Two threads with different thread_ids but the same user_id in their config now see the same facts. Your agent just stopped being a goldfish, at least until the process exits. InMemoryStore is a Python dict underneath; it exists so you can get the wiring right before paying for infrastructure.
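You can check the recall logic without a model or a graph. This sketch reuses the article’s prompt format, with a plain dict standing in for the store. That substitution is an assumption for testability; in the real node, store.search() does the lookup. It shows that only user_id decides what gets recalled, and thread_id plays no part. The helper name recall_text and the config values are hypothetical.

```python
# Sketch: the recall node's prompt-building logic, with a dict standing in for the store.
FakeStore = dict[tuple[str, ...], list[str]]


def recall_text(config: dict, store: FakeStore, limit: int = 5) -> str:
    """Build the 'known facts' system prompt for whoever the config identifies."""
    user_id = config["configurable"]["user_id"]  # thread_id is deliberately ignored
    facts = "\n".join(store.get(("memories", user_id), [])[:limit])
    return f"Known facts about this user:\n{facts or 'None yet.'}"


store: FakeStore = {
    ("memories", "user-sukhveer"): ["Deploys agents on Fly.io", "Prefers answers with code"],
}

thread_a = {"configurable": {"thread_id": "thread-a", "user_id": "user-sukhveer"}}
thread_b = {"configurable": {"thread_id": "thread-b", "user_id": "user-sukhveer"}}
stranger = {"configurable": {"thread_id": "thread-c", "user_id": "user-new"}}

print(recall_text(thread_a, store) == recall_text(thread_b, store))
print(recall_text(stranger, store))
```

In the real graph, both values travel in the config you pass to app.invoke(), for example {"configurable": {"thread_id": "thread-b", "user_id": "user-sukhveer"}}.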
Step 2 — Make Memory Searchable by Meaning
Key-value lookup breaks the moment memory grows past a handful of entries. Yesterday’s fact was stored as “deploys on Fly.io”; today’s question is “what cloud do I use?” — no keyword overlap, no match. Semantic search (comparing meaning via embeddings — numeric vectors that place similar sentences close together) fixes exactly this.
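Here is the mechanism in miniature. Real embedding models produce vectors with hundreds or thousands of dimensions. The three-dimensional vectors below are hand-written stand-ins along rough "hosting", "style", and "other" axes, so treat the numbers as illustrative rather than as output from any model. The stored fact and the question share no keywords, yet cosine similarity still ranks the right fact first.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def words(text: str) -> set[str]:
    """Lowercased word tokens, for a naive keyword-overlap comparison."""
    return set(re.findall(r"[a-z.]+", text.lower()))


# Hand-made toy vectors: [hosting, style, other]. Not real embeddings.
memories = {
    "deploys on Fly.io": [0.9, 0.1, 0.1],
    "prefers answers with code": [0.1, 0.9, 0.2],
}
question = "what cloud do I use?"
question_vec = [0.8, 0.1, 0.3]

for text, vec in memories.items():
    overlap = words(text) & words(question)
    print(f"{text!r}: keyword overlap={len(overlap)}, cosine={cosine(vec, question_vec):.2f}")

best = max(memories, key=lambda text: cosine(memories[text], question_vec))
print("best match:", best)
```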
LangGraph’s Store has it built in. Pass an index config when you create the store, and store.search() can rank results by semantic similarity to a query string:
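```python
# semantic memory: one config change
from langchain.embeddings import init_embeddings
from langgraph.store.memory import InMemoryStore

store = InMemoryStore(
    index={
        "embed": init_embeddings("openai:text-embedding-3-small"),
        "dims": 1536,  # must match the embedding model's output size
    }
)

# later, inside recall(): in practice, pass the latest user message as the query
items = store.search(
    ("memories", user_id),
    query="what cloud does this user deploy to?",
    limit=3,
)
```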
One honest wrinkle: this series runs on Claude, and Anthropic doesn’t ship an embeddings API — so the index needs a second provider. I use OpenAI’s text-embedding-3-small because it’s cheap (I’ve never crossed a dollar a month on a side project); a local HuggingFace model works if you’d rather not add another key. In my tests the semantic lookup adds roughly 100–150ms per recall — noticeable in logs, invisible in chat.
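Because this setup mixes providers, switching embedding models later is a real risk. Vectors from different models are not comparable, even when their dimensions happen to agree. The sketch below is a hypothetical guard, not a LangGraph feature. It tags each stored vector with the model that produced it and refuses to compare across models or sizes. The model name "local:some-384-dim-model" is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    """A vector plus the (hypothetical) metadata needed to compare it safely."""
    model: str
    values: tuple[float, ...]


def check_comparable(stored: StoredVector, query_model: str, query_dims: int) -> None:
    """Raise ValueError if a stored vector cannot be compared with a query vector."""
    if stored.model != query_model:
        raise ValueError(f"stored with {stored.model!r}, querying with {query_model!r}: re-embed")
    if len(stored.values) != query_dims:
        raise ValueError(f"dims mismatch: stored {len(stored.values)}, query {query_dims}")


old = StoredVector(model="openai:text-embedding-3-small", values=(0.0,) * 1536)

check_comparable(old, "openai:text-embedding-3-small", 1536)  # same model: no error
try:
    # Switching to a hypothetical local model with a different output size.
    check_comparable(old, "local:some-384-dim-model", 384)
except ValueError as err:
    print("refusing search:", err)
```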
Common mistake: dumping entire conversations into the store “to be safe.” Search quality collapses, because every query matches a wall of chat noise. Store distilled facts, one per entry — “prefers code answers”, not 40 raw messages. The store remembers; it shouldn’t transcribe.
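Here is a toy illustration of why transcripts hurt retrieval. It uses hand-made three-dimensional vectors as stand-ins for real embeddings, and it models embedding a long transcript as averaging its topic vectors. That averaging is a rough simplification. Even so, it shows the effect: the blended entry is moderately similar to every query and strongly similar to none, while one-fact entries stay sharp.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-made toy vectors over three topics: [hosting, style, billing].
distilled = {
    "deploys on Fly.io": [1.0, 0.0, 0.0],
    "prefers code answers": [0.0, 1.0, 0.0],
    "on the free plan": [0.0, 0.0, 1.0],
}
# Crude model of embedding a whole transcript: the average of its topics.
transcript = [sum(v[i] for v in distilled.values()) / 3 for i in range(3)]

for query_name, query in {"hosting?": [1.0, 0.0, 0.0], "style?": [0.0, 1.0, 0.0]}.items():
    best_fact = max(distilled, key=lambda t: cosine(distilled[t], query))
    print(
        f"{query_name:<9} best fact={best_fact!r} ({cosine(distilled[best_fact], query):.2f}), "
        f"transcript={cosine(transcript, query):.2f}"
    )
```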
Step 3 — Production Memory With Redis
InMemoryStore evaporates on every deploy — fine on a laptop, useless behind the FastAPI service from Part 2. Production AI agent memory has one requirement the demos skip: it must outlive the process. The drop-in fix is the Redis store, which implements the same BaseStore interface, so the recall node doesn’t change at all:
```python
# production: same API, durable backend
from langgraph.store.redis import RedisStore

with RedisStore.from_conn_string("redis://localhost:6379") as store:
    store.setup()  # creates indices; run once
    app = graph.compile(checkpointer=memory, store=store)
    # serve FastAPI inside this context
```
Two things the README undersells. First, RedisStore supports the same vector-index config as Step 2, so semantic search comes along for free — you don’t need a separate vector database for memory at this scale. Second, Redis TTLs (time-to-live — automatic key expiry) give you memory forgetting almost for free, and forgetting matters: user preferences from eight months ago are as likely to mislead the agent as help it. I set a 90-day refresh-on-read TTL and let stale facts quietly fall away.
A free 30MB Redis Cloud instance has been more than enough for my agents — distilled facts are tiny. If you self-host next to Part 2’s Docker setup, one redis:7-alpine service in your compose file does it.
Testing It + Common Errors
The test that matters is the one from the intro, now in three commands: state a fact in thread A, ask for it back in thread B, then restart the process and ask again in thread C (new thread_ids, same user_id). Pass all three and your AI agent memory is real; fail the third and your store isn’t durable.
Errors I actually hit while building this:
- TypeError: recall() missing 1 required keyword-only argument: 'store'. You compiled without store=; the injection only works when the graph knows about the store.
- Semantic queries return nothing after switching embedding models. The index dims no longer match the stored vectors. Wipe and re-embed; vectors from different models aren’t comparable.
- RedisStore works locally but comes up empty in production. Two services are pointing at different Redis URLs. Log the connection string hash at startup; it’s a 10-second check that saved me an evening.
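The connection-string check from the last item can be as small as this. Hashing the URL lets you compare two services’ logs without printing the raw URL or any password in it. The logger name, the REDIS_URL environment variable, and the 12-character truncation are my assumptions, not anything RedisStore requires.

```python
import hashlib
import logging
import os

logger = logging.getLogger("agent.memory")  # hypothetical logger name


def conn_fingerprint(url: str) -> str:
    """Short SHA-256 fingerprint of a connection string, safe to put in logs."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]


def log_store_target() -> str:
    # REDIS_URL is an assumed environment variable; adapt to your own config.
    url = os.environ.get("REDIS_URL", "redis://localhost:6379")
    fingerprint = conn_fingerprint(url)
    logger.info("memory store target fingerprint=%s", fingerprint)
    return fingerprint


logging.basicConfig(level=logging.INFO)
print(log_store_target())
```

If the fingerprint from your local run differs from the one your deployed service logs, the two are not talking to the same Redis.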
What to Build Next
The ceiling on this design is what gets stored — right now, you decide in code. The natural upgrade is letting the model itself extract facts worth keeping after each exchange: an LLM call that reads the turn and writes zero or more distilled memories. That’s the idea behind LangChain’s LangMem library, and after building the manual version you’ll recognise exactly what it automates — same Store underneath.
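Here is the shape of that extraction step, sketched without a model so the plumbing is visible. The extract_facts function is a hypothetical placeholder. Two regular expressions stand in for what would really be an LLM call returning distilled facts. The content-hash keys are also an assumption; they keep a repeated fact from piling up as a duplicate entry.

```python
import hashlib
import re

# Placeholder "extractor": in a real system this would be an LLM call.
PATTERNS = {
    r"\bI deploy (?:on|to) ([\w.]*\w)": "Deploys agents on {}",
    r"\bI prefer ([\w ]+?)(?:[.,!]|$)": "Prefers {}",
}


def extract_facts(turn: str) -> list[str]:
    """Return zero or more distilled facts found in one conversation turn."""
    facts = []
    for pattern, template in PATTERNS.items():
        for match in re.finditer(pattern, turn):
            facts.append(template.format(match.group(1).strip()))
    return facts


def remember(memory: dict, user_id: str, turn: str) -> int:
    """Write each extracted fact under the user's namespace; return how many."""
    facts = extract_facts(turn)
    for fact in facts:
        key = hashlib.sha256(fact.encode()).hexdigest()[:8]  # same fact, same key
        memory[(("memories", user_id), key)] = {"text": fact}
    return len(facts)


memory: dict = {}
remember(memory, "user-sukhveer", "I deploy on Fly.io, and I prefer answers with code.")
remember(memory, "user-sukhveer", "Thanks! Also, I deploy on Fly.io.")  # repeat: no duplicate
print(sorted(v["text"] for v in memory.values()))
```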
Then point this at Part 3: give the multi-agent team a shared namespace for verified findings, and a fact one worker checks today is free for every worker tomorrow. I’d build the memory-extraction step first, though — shared memory amplifies whatever you store, including junk.
Conclusion
Your agent now has both memory layers: a checkpointer for the conversation it’s having, and a searchable, durable store for everything it should carry between conversations. The rules that matter: store distilled facts rather than transcripts, namespace by user, search by meaning, and let old memories expire.
This also closes the loop the series opened — the What Are AI Agents? guide called memory one of the four pillars of an agent, and it was the last one we hadn’t built properly.
So, a specific question: what’s the first fact you’d want your agent to remember about you — and what’s one you’d explicitly want it to forget? The answers shape what Part 5 covers (evaluation and observability are the current front-runners).
Catch up on the series: Part 1 — Tools, StateGraph & Memory · Part 2 — FastAPI, Docker & Deploy · Part 3 — Multi-Agent Systems






