Build an Agentic AI App in Python: AI Agent Memory (Part 4)

Artificial Intelligence
June 12, 2026
6 min read
Intermediate
📚 Part of the series: Agentic AI in Python: Zero to Production
Robot with a glowing memory chip wired to a database — AI agent memory in Python, Part 4 of the agentic AI series
Table of Contents
01
Why Your Agent Still Forgets
02
Which AI Agent Memory Type Do You Need?
03
Step 1 — Add a Cross-Thread Memory Store
04
Step 2 — Make Memory Searchable by Meaning
05
Step 3 — Production Memory With Redis
06
Testing It + Common Errors
07
What to Build Next
08
Conclusion

Series: Agentic AI in Python: Zero to Production

This is Part 4: AI agent memory. Quick recap of what we covered so far:

  • Part 1: Built a local LangGraph research agent with tools and a SQLite checkpointer → /build-agentic-ai-app-python-part-1/
  • Part 2: Wrapped it in FastAPI, Dockerised it, and deployed it → /build-agentic-ai-app-python-part-2/
  • Part 3: Scaled to a supervisor + workers multi-agent team → /build-agentic-ai-app-python-part-3/

If you’re starting here, you only need Part 1’s agent.py — everything in this post is a small diff on top of it.

AI agent memory is where the agent you built in Part 1 quietly fails. Try this test: open a chat and tell it “I’m Sukhveer, I deploy on Fly.io, and I prefer answers with code.” Have a great conversation. Now start a new thread and ask “where do I deploy?” — total blank. The checkpointer didn’t break; it was never designed for this.

What’s missing is long-term memory — the kind that follows a user across conversations instead of dying with the thread. It’s also the single most requested topic after Part 3, which is why this part exists.

By the end of this post your agent will remember facts about users across threads, retrieve them by meaning rather than exact keywords, and keep all of it in a Redis store that survives restarts and redeploys. Three steps, each a few lines. First, let’s be precise about why the agent forgets.

Why Your Agent Still Forgets

The confusion starts because “memory” means two different things, and most tutorials only give you one. A checkpointer (the SQLite saver from Part 1) records the message history of a single thread, keyed by its thread_id. Start a new thread and everything the old one learned about the user is out of reach. Checkpointers give your threads memory. They don’t give your agent memory.

AI agent memory architecture: a thread-scoped checkpointer keeps conversation state while a cross-thread LangGraph Store with embeddings and Redis gives the agent long-term memory

The diagram shows the two lanes side by side. The top lane is what you already have: short-term, thread-scoped state. The bottom lane is what we’re adding — a LangGraph Store, a key-value memory with optional vector search that every thread can read and write. LangGraph treats these as separate components on purpose: you compile your graph with both a checkpointer= and a store=, and they never touch each other’s data.

When I finally understood this split, my reaction was mild annoyance — I’d spent a weekend trying to make checkpointers do cross-thread recall by reusing thread IDs. Don’t do that. It bloats one thread’s history until every call drags the entire past through the model, and my token bill noticed before I did.
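A rough back-of-the-envelope sketch shows why the reused thread gets expensive. Every number below is an assumption chosen for illustration (200 tokens per turn, 10 chats of 20 turns, 50 tokens of recalled facts), and it assumes the full history is resent on every call:

```python
# Illustrative arithmetic only: every number here is an assumption.
TOKENS_PER_TURN = 200   # average tokens one turn adds to the history
TURNS_PER_CHAT = 20
CHATS = 10
FACT_TOKENS = 50        # recalled facts injected into each call


def input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when call n resends all n turns so far."""
    return tokens_per_turn * turns * (turns + 1) // 2


# One reused thread: the history never resets across the 10 chats.
one_thread = input_tokens(TURNS_PER_CHAT * CHATS, TOKENS_PER_TURN)

# Fresh thread per chat, plus a small block of facts on every call.
per_chat = input_tokens(TURNS_PER_CHAT, TOKENS_PER_TURN)
fresh_threads = CHATS * (per_chat + FACT_TOKENS * TURNS_PER_CHAT)

print(one_thread, fresh_threads)
```

Under these assumptions the reused thread sends about nine times as many input tokens, because the cost of a single growing history is quadratic in the number of turns.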

Which AI Agent Memory Type Do You Need?

Before writing code, decide what the agent actually needs to remember — because each type has a different cheapest-correct implementation. Run through this checklist:

  • “Remember this conversation” → thread memory. You already have it: the Part 1 checkpointer. Stop here if that’s all you need.
  • “Remember facts about me across chats” (name, stack, preferences) → cross-thread key-value memory. Step 1 below.
  • “Recall relevant past context, even when worded differently” → semantic memory (vector search over stored facts). Step 2.
  • “Remember after a restart, in production” → a durable store backend. Step 3.

Four-step flowchart for adding long-term memory to a Python AI agent: pick the memory type, add a cross-thread store, enable semantic search, then swap in Redis for production

The flowchart is the whole build plan. Prerequisites are light: the Part 1 project, Python 3.11+, and pip install -U langgraph langgraph-checkpoint-redis (LangGraph 1.2.4 and langgraph-checkpoint-redis 0.4.1 were current when I wrote this in June 2026). For Step 2 you’ll also want an embedding model — more on that there. If you don’t have the Part 1 code, build it first; it’s a 30-minute read.

Step 1 — Add a Cross-Thread Memory Store

A Store in LangGraph is deliberately boring: namespaces, keys, and JSON-ish values. The namespace is a tuple, and putting the user ID in the namespace is what makes memory multi-user safe — every user gets their own shelf.

python
# memory_demo.py — the Store API in 10 lines
from langgraph.store.memory import InMemoryStore
store = InMemoryStore()
ns = ("memories", "user-sukhveer") # one shelf per user
store.put(ns, "deploy-target", {"text": "Deploys agents on Fly.io"})
store.put(ns, "style", {"text": "Prefers answers with code"})
item = store.get(ns, "deploy-target")
print(item.value["text"]) # Deploys agents on Fly.io

That’s the entire mental model. Now wire it into the agent. Compile the graph with the store, and any node can ask for it with an injected store argument:

python
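# agent.py: additions to Part 1's graph
from langchain_core.messages import SystemMessage
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore

def recall(state, config: RunnableConfig, *, store: BaseStore):
    # user_id arrives per request in config["configurable"]
    user_id = config["configurable"]["user_id"]
    # no query yet: list up to 5 stored facts for this user
    items = store.search(("memories", user_id), limit=5)
    facts = "\n".join(i.value["text"] for i in items)
    system = f"Known facts about this user:\n{facts or 'None yet.'}"
    return {"messages": [SystemMessage(content=system)]}

store = InMemoryStore()
# graph and memory (the checkpointer) come from Part 1
app = graph.compile(checkpointer=memory, store=store)  # both layers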

The recall node runs first and adds whatever the store knows to the conversation as a system message. Two threads with two different thread_ids see the same facts, as long as both requests carry the same user_id. Your agent just stopped being a goldfish, at least until the process exits. InMemoryStore is a Python dict underneath; it exists so you can get the wiring right before paying for infrastructure.

Step 2 — Make Memory Searchable by Meaning

Key-value lookup breaks the moment memory grows past a handful of entries. Yesterday’s fact was stored as “deploys on Fly.io”; today’s question is “what cloud do I use?” — no keyword overlap, no match. Semantic search (comparing meaning via embeddings — numeric vectors that place similar sentences close together) fixes exactly this.

LangGraph’s Store has it built in. Pass an index config and the query argument of store.search() ranks stored facts by similarity to a natural-language question:

python
# semantic memory: one config change
from langchain.embeddings import init_embeddings
from langgraph.store.memory import InMemoryStore

store = InMemoryStore(
    index={
        "embed": init_embeddings("openai:text-embedding-3-small"),
        "dims": 1536,  # must match the embedding model's output size
    }
)

# later, inside recall():
items = store.search(
    ("memories", user_id),
    query="what cloud does this user deploy to?",  # e.g. the user's latest message
    limit=3,
)

One honest wrinkle: this series runs on Claude, and Anthropic doesn’t ship an embeddings API — so the index needs a second provider. I use OpenAI’s text-embedding-3-small because it’s cheap (I’ve never crossed a dollar a month on a side project); a local HuggingFace model works if you’d rather not add another key. In my tests the semantic lookup adds roughly 100–150ms per recall — noticeable in logs, invisible in chat.

Common mistake: dumping entire conversations into the store “to be safe.” Search quality collapses, because every query matches a wall of chat noise. Store distilled facts, one per entry — “prefers code answers”, not 40 raw messages. The store remembers; it shouldn’t transcribe.

Step 3 — Production Memory With Redis

InMemoryStore evaporates on every deploy — fine on a laptop, useless behind the FastAPI service from Part 2. Production AI agent memory has one requirement the demos skip: it must outlive the process. The drop-in fix is the Redis store, which implements the same BaseStore interface, so the recall node doesn’t change at all:

python
# production: same API, durable backend
from langgraph.store.redis import RedisStore

with RedisStore.from_conn_string("redis://localhost:6379") as store:
    store.setup()  # creates indices; run once
    app = graph.compile(checkpointer=memory, store=store)
    # serve FastAPI inside this context

Two things the README undersells. First, RedisStore supports the same vector-index config as Step 2, so semantic search comes along for free — you don’t need a separate vector database for memory at this scale. Second, Redis TTLs (time-to-live — automatic key expiry) give you memory forgetting almost for free, and forgetting matters: user preferences from eight months ago are as likely to mislead the agent as help it. I set a 90-day refresh-on-read TTL and let stale facts quietly fall away.
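To make "refresh-on-read" concrete, here is a toy in-process model of the behaviour. It is not RedisStore's implementation or configuration (check the package docs for the TTL options in your installed version). The ExpiringFacts class and the fake clock are invented for illustration.

```python
import time
from typing import Callable, Optional

DAY = 24 * 60 * 60  # seconds


class ExpiringFacts:
    """Toy model of refresh-on-read expiry. Not RedisStore's code."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}  # key -> (fact, expires_at)

    def put(self, key: str, fact: str) -> None:
        self._data[key] = (fact, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        fact, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale: forget it
            return None
        # A read counts as "still relevant": push the deadline out again.
        self._data[key] = (fact, self._clock() + self._ttl)
        return fact


# Fake clock so the demo does not have to wait 90 days.
now = [0.0]
facts = ExpiringFacts(90 * DAY, clock=lambda: now[0])
facts.put("deploy-target", "Deploys agents on Fly.io")
facts.put("style", "Prefers answers with code")

now[0] = 60 * DAY                          # day 60: one fact gets read
read_at_day_60 = facts.get("deploy-target")

now[0] = 120 * DAY                         # day 120
still_there = facts.get("deploy-target")   # refreshed on day 60, so alive until day 150
forgotten = facts.get("style")             # never read, expired on day 90
```

The fact that keeps getting used survives and the untouched one disappears, which is the forgetting behaviour described above. With Redis the expiry itself happens on the server rather than in your Python process.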

A free 30MB Redis Cloud instance has been more than enough for my agents — distilled facts are tiny. If you self-host next to Part 2’s Docker setup, one redis:7-alpine service in your compose file does it.

Testing It + Common Errors

The test that matters is the one from the intro, now in three commands: state a fact in thread A, ask for it back in thread B, then restart the process and ask again in thread C (new thread_ids, same user_id). Pass all three and your AI agent memory is real; fail the third and your store isn’t durable.

Errors I actually hit while building this:

  • TypeError: recall() missing 1 required keyword-only argument: 'store' — you compiled without store=. The injection only works when the graph knows about the store.
  • Semantic queries return nothing after switching embedding models — the index dims no longer match the stored vectors. Wipe and re-embed; vectors from different models aren’t comparable.
  • RedisStore works locally, empty in production — two services pointing at different Redis URLs. Log the connection string hash at startup; it’s a 10-second check that saved me an evening.
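The last two errors are cheap to catch at startup. The sketch below uses only the standard library. The function names, URLs, and fake embedder are all hypothetical, so adapt them to your own service.

```python
import hashlib
from typing import Callable, Sequence


def url_fingerprint(conn_string: str) -> str:
    """Short hash of a connection string, so logs never show the password itself."""
    return hashlib.sha256(conn_string.encode("utf-8")).hexdigest()[:12]


def check_embedding_dims(embed: Callable[[Sequence[str]], list], dims: int) -> None:
    """Fail fast if the embedder does not return vectors of length `dims`."""
    actual = len(embed(["dimension probe"])[0])
    if actual != dims:
        raise ValueError(f"index dims={dims} but the embedder returns {actual}")


# Hypothetical URLs: the two services disagree on the database number.
api_url = "redis://:s3cret@redis.internal:6379/0"
worker_url = "redis://:s3cret@redis.internal:6379/1"
print("api    redis fingerprint:", url_fingerprint(api_url))
print("worker redis fingerprint:", url_fingerprint(worker_url))


def fake_embed(texts):
    """Stand-in embedder that returns 4-dimensional vectors."""
    return [[0.0, 0.0, 0.0, 0.0] for _ in texts]


check_embedding_dims(fake_embed, 4)         # matches: passes silently
try:
    check_embedding_dims(fake_embed, 1536)  # mismatched config
except ValueError as err:
    dims_error = str(err)
    print(dims_error)
```

If two services print different fingerprints, they are not talking to the same Redis. The dims check only compares the configured size with the current model's output. Vectors written by a previous model of the same size still need a wipe and re-embed.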

What to Build Next

The ceiling on this design is what gets stored — right now, you decide in code. The natural upgrade is letting the model itself extract facts worth keeping after each exchange: an LLM call that reads the turn and writes zero or more distilled memories. That’s the idea behind LangChain’s LangMem library, and after building the manual version you’ll recognise exactly what it automates — same Store underneath.

Then point this at Part 3: give the multi-agent team a shared namespace for verified findings, and a fact one worker checks today is free for every worker tomorrow. I’d build the memory-extraction step first, though — shared memory amplifies whatever you store, including junk.

Conclusion

Your agent now has both memory layers: a checkpointer for the conversation it’s having, and a searchable, durable store for everything it should carry between conversations. The rules that matter: store distilled facts rather than transcripts, namespace by user, search by meaning, and let old memories expire.

This also closes the loop the series opened — the What Are AI Agents? guide called memory one of the four pillars of an agent, and it was the last one we hadn’t built properly.

So, a specific question: what’s the first fact you’d want your agent to remember about you — and what’s one you’d explicitly want it to forget? The answers shape what Part 5 covers (evaluation and observability are the current front-runners).

Catch up on the series: Part 1 — Tools, StateGraph & Memory · Part 2 — FastAPI, Docker & Deploy · Part 3 — Multi-Agent Systems

Related: Deploy Your AI Agent to Cloud Run or Fly.io


Tags

#AIAgentMemory #AgenticAI #LangGraph #PythonTutorial #VectorSearch #AIForDevelopers


Sukhveer Kaur

Software Developer & AI Engineer
