
Build an Agentic AI App in Python: AI Agent Memory (Part 4)

Add real AI agent memory in Python — a LangGraph Store that recalls users across threads, semantic search with embeddings, and Redis for production.

Sukhveer Kaur

Published June 12, 2026 · Updated July 6, 2026

7 min read


🧰 New here? Set up your environment first · ~5 min

Install Python 3.11+ — confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
Install the packages this tutorial lists: pip install -U pip <packages>.
Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

🟡 Intermediate · ⏱️ 25 min · Stack: Python 3.11+, LangGraph, Redis, an embeddings model

Series: Agentic AI in Python: Zero to Production. This is Part 4, AI agent memory. Quick recap of what we covered so far:

Part 1: Built a local LangGraph research agent with tools and a SQLite checkpointer → /build-agentic-ai-app-python-part-1/
Part 2: Wrapped it in FastAPI, Dockerised it, and deployed it → /build-agentic-ai-app-python-part-2/
Part 3: Scaled to a supervisor + workers multi-agent team → /build-agentic-ai-app-python-part-3/

If you’re starting here, you only need Part 1’s agent.py — everything in this post is a small diff on top of it.

AI agent memory is where the agent you built in Part 1 quietly fails. Try this test: open a chat and tell it “I’m Sukhveer, I deploy on Fly.io, and I prefer answers with code.” Have a great conversation. Now start a new thread and ask “where do I deploy?” — total blank. The checkpointer didn’t break; it was never designed for this.

What’s missing is long-term memory — the kind that follows a user across conversations instead of dying with the thread. It’s also the single most requested topic after Part 3, which is why this part exists.

By the end of this post your agent will remember facts about users across threads, retrieve them by meaning rather than exact keywords, and keep all of it in a Redis store that survives restarts and redeploys. Three steps, each a few lines. First, let’s be precise about why the agent forgets.

✅ Before you start

A working agent.py from Part 1 — this post is a small diff on top of it
You understand short-term (thread) memory — the from-scratch memory post builds it by hand first
An LLM API key, plus an embeddings provider (e.g. OpenAI) since Claude has no embeddings API — new to embeddings? Read the embeddings and vector search primer

🎯 Key takeaways

Checkpointers give a thread memory; they don’t give the agent memory. Cross-conversation recall needs a separate Store.
Compile the graph with both a checkpointer= and a store=, and namespace the store by user ID so memory is multi-user safe.
Store distilled facts (one per entry), not transcripts, and search by meaning with embeddings — yesterday’s wording rarely matches today’s question.
For production, swap InMemoryStore for a Redis store (same BaseStore API) so memory outlives the process; TTLs give you forgetting for free.

Why Your Agent Still Forgets

The confusion starts because “memory” means two different things, and most tutorials only give you one. A checkpointer (the SQLite saver from Part 1) records the message history of a single thread. Start a new thread, and everything the old one learned about the user is out of reach. Checkpointers give your threads memory. They don’t give your agent memory.

The diagram shows the two lanes side by side. The top lane is what you already have: short-term, thread-scoped state. The bottom lane is what we’re adding — a LangGraph Store, a key-value memory with optional vector search that every thread can read and write. LangGraph treats these as separate components on purpose: you compile your graph with both a checkpointer= and a store=, and they never touch each other’s data.

When I finally understood this split, my reaction was mild annoyance — I’d spent a weekend trying to make checkpointers do cross-thread recall by reusing thread IDs. Don’t do that. It bloats one thread’s history until every call drags the entire past through the model, and my token bill noticed before I did.
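To make the split concrete, here is a toy model in plain Python. It is not LangGraph's implementation: `ThreadHistory` and `UserStore` are made-up stand-ins for a checkpointer and a Store, and the only point is which key each lane is indexed by.

```python
# Toy model of the two memory lanes. Not LangGraph code: ThreadHistory and
# UserStore are hypothetical stand-ins for a checkpointer and a Store.
from collections import defaultdict


class ThreadHistory:
    """Short-term lane: messages are keyed by thread_id only."""

    def __init__(self):
        self._messages = defaultdict(list)

    def append(self, thread_id, message):
        self._messages[thread_id].append(message)

    def history(self, thread_id):
        return list(self._messages[thread_id])


class UserStore:
    """Long-term lane: facts are keyed by (namespace, key), independent of any thread."""

    def __init__(self):
        self._items = {}

    def put(self, namespace, key, value):
        self._items[(namespace, key)] = value

    def search(self, namespace):
        return [v for (ns, _), v in self._items.items() if ns == namespace]


checkpointer = ThreadHistory()
store = UserStore()

# Thread A: the user introduces themselves.
checkpointer.append("thread-a", "I deploy on Fly.io")
store.put(("memories", "user-sukhveer"), "deploy-target", {"text": "Deploys agents on Fly.io"})

# Thread B: a brand-new conversation for the same user.
thread_b_history = checkpointer.history("thread-b")
thread_b_facts = store.search(("memories", "user-sukhveer"))

print(thread_b_history)  # empty: the thread lane knows nothing about thread B
print(thread_b_facts)    # the store lane still holds the fact
```

Reusing one thread ID to fake the second lane only makes the first list longer, which is the token-bill problem described above.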

Which AI Agent Memory Type Do You Need?

Before writing code, decide what the agent actually needs to remember — because each type has a different cheapest-correct implementation. Run through this checklist:

“Remember this conversation” → thread memory. You already have it: the Part 1 checkpointer. Stop here if that’s all you need.
“Remember facts about me across chats” (name, stack, preferences) → cross-thread key-value memory. Step 1 below.
“Recall relevant past context, even when worded differently” → semantic memory (vector search over stored facts). Step 2.
“Remember after a restart, in production” → a durable store backend. Step 3.

The flowchart is the whole build plan. Prerequisites are light: the Part 1 project, Python 3.11+, and pip install -U langgraph langgraph-checkpoint-redis (LangGraph 1.2.4 and langgraph-checkpoint-redis 0.4.1 were current when I wrote this in June 2026). For Step 2 you’ll also want an embedding model — more on that there. If you don’t have the Part 1 code, build it first; it’s a 30-minute read.

Step 1: Add a Cross-Thread Memory Store

A Store in LangGraph is deliberately boring: namespaces, keys, and JSON-ish values. The namespace is a tuple, and putting the user ID in the namespace is what makes memory multi-user safe — every user gets their own shelf.

python

# memory_demo.py — the Store API in 10 lines
from langgraph.store.memory import InMemoryStore
 
store = InMemoryStore()
ns = ("memories", "user-sukhveer")          # one shelf per user
 
store.put(ns, "deploy-target", {"text": "Deploys agents on Fly.io"})
store.put(ns, "style", {"text": "Prefers answers with code"})
 
item = store.get(ns, "deploy-target")
print(item.value["text"])                   # Deploys agents on Fly.io

That’s the entire mental model. Now wire it into the agent. Compile the graph with the store, and any node can ask for it with an injected store argument:

python

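# agent.py: additions to Part 1's graph
from langchain_core.messages import SystemMessage
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore

def recall(state, config: RunnableConfig, *, store: BaseStore):
    """Load this user's stored facts and hand them to the model as a system message."""
    user_id = config["configurable"]["user_id"]          # passed in at invoke time
    items = store.search(("memories", user_id), limit=5)
    facts = "\n".join(i.value["text"] for i in items)
    system = f"Known facts about this user:\n{facts or 'None yet.'}"
    return {"messages": [SystemMessage(content=system)]}

# `graph` and `memory` are the StateGraph builder and checkpointer from Part 1;
# register recall as the first node before compiling.
store = InMemoryStore()
app = graph.compile(checkpointer=memory, store=store)  # both layers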

The recall node runs first and injects whatever the store knows into the conversation as a system message. It reads user_id from config["configurable"], so pass a user_id next to the thread_id whenever you invoke the app. Two threads, two different thread_ids: same facts. Your agent just stopped being a goldfish, at least until the process exits. InMemoryStore is a Python dict underneath; it exists so you can get the wiring right before paying for infrastructure.
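The snippets above only read from the store. Here is a minimal sketch of the write side, assuming you want one distilled fact per entry and want a repeated fact to overwrite itself rather than pile up. `remember_fact` and `fact_key` are hypothetical helpers, and the only store method they rely on is the `put(namespace, key, value)` call from the demo. `_DictStore` is a throwaway stand-in so the example runs without LangGraph installed; inside a node you would pass the injected `store` instead.

```python
import hashlib


def fact_key(text: str) -> str:
    """Derive a stable key from the normalised fact text (my assumption, not a LangGraph rule)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def remember_fact(store, user_id: str, text: str) -> str:
    """Write one distilled fact into the user's namespace and return its key."""
    key = fact_key(text)
    store.put(("memories", user_id), key, {"text": text})
    return key


class _DictStore:
    """Stand-in exposing only put(); swap in the real store inside a node."""

    def __init__(self):
        self.items = {}

    def put(self, namespace, key, value):
        self.items[(namespace, key)] = value


demo_store = _DictStore()
k1 = remember_fact(demo_store, "user-sukhveer", "Deploys agents on Fly.io")
k2 = remember_fact(demo_store, "user-sukhveer", "deploys agents  on fly.io")  # same fact, different casing and spacing
remember_fact(demo_store, "user-other", "Deploys agents on Fly.io")

print(k1 == k2)               # the duplicate maps to the same key
print(len(demo_store.items))  # one entry per user
```

Hashing the normalised text is only one option. A random UUID per fact is simpler, but then deduplication has to happen somewhere else.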

Step 2: Make Memory Searchable by Meaning

Key-value lookup breaks the moment memory grows past a handful of entries. Yesterday’s fact was stored as “deploys on Fly.io”; today’s question is “what cloud do I use?” — no keyword overlap, no match. Semantic search (comparing meaning via embeddings — numeric vectors that place similar sentences close together) fixes exactly this.
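A toy illustration of why that works, using hand-written 3-dimensional vectors in place of a real embedding model. The numbers and the third fact are invented for the example (real embeddings come from a model and have hundreds or thousands of dimensions). Only the cosine-similarity ranking is the actual technique.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Invented vectors; the axes loosely mean (hosting, answer style, schedule).
facts = {
    "Deploys agents on Fly.io": [0.9, 0.1, 0.0],
    "Prefers answers with code": [0.1, 0.9, 0.0],
    "Works mostly on weekends": [0.0, 0.1, 0.9],
}
query_text = "what cloud do I use?"
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the query

# Keyword lookup: no stored fact shares a word with the query.
query_words = set(query_text.lower().replace("?", "").split())
keyword_hits = [f for f in facts if query_words & set(f.lower().split())]

# Semantic lookup: rank facts by cosine similarity to the query vector.
ranked = sorted(facts, key=lambda f: cosine(facts[f], query_vec), reverse=True)

print(keyword_hits)
print(ranked[0])
```

The keyword list comes back empty while the ranking puts the Fly.io fact first. A real embedding model supplies the vectors, but the ranking step is the same idea as this semantic search.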

LangGraph’s Store has it built in. Pass an index config and store.search() can rank results against a natural-language query:

python

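# semantic memory: one config change
# needs an embeddings integration, e.g. pip install -U langchain langchain-openai
from langchain.embeddings import init_embeddings
from langgraph.store.memory import InMemoryStore

store = InMemoryStore(
    index={
        "embed": init_embeddings("openai:text-embedding-3-small"),
        "dims": 1536,   # must match the embedding model's output size
    }
)

# later, inside recall(), where user_id is already defined
# (in the agent, pass the user's latest message as the query):
items = store.search(("memories", user_id),
                     query="what cloud does this user deploy to?",
                     limit=3)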

One honest wrinkle: this series runs on Claude, and Anthropic doesn’t ship an embeddings API — so the index needs a second provider. I use OpenAI’s text-embedding-3-small because it’s cheap (I’ve never crossed a dollar a month on a side project); a local HuggingFace model works if you’d rather not add another key. In my tests the semantic lookup adds roughly 100–150ms per recall — noticeable in logs, invisible in chat.

Common mistake: dumping entire conversations into the store “to be safe.” Search quality collapses, because every query matches a wall of chat noise. Store distilled facts, one per entry — “prefers code answers”, not 40 raw messages. The store remembers; it shouldn’t transcribe.

Step 3: Production Memory With Redis

InMemoryStore evaporates on every deploy — fine on a laptop, useless behind the FastAPI service from Part 2. Production AI agent memory has one requirement the demos skip: it must outlive the process. The drop-in fix is the Redis store, which implements the same BaseStore interface, so the recall node doesn’t change at all:

python

# production: same API, durable backend
# the Redis server needs the RediSearch and RedisJSON modules (Redis 8+ or Redis Stack)
from langgraph.store.redis import RedisStore

with RedisStore.from_conn_string("redis://localhost:6379") as store:
    store.setup()                       # creates the indices on first run
    app = graph.compile(checkpointer=memory, store=store)   # `graph`, `memory` from Part 1
    # serve FastAPI inside this context

Two things the README undersells. First, RedisStore supports the same vector-index config as Step 2, so semantic search comes along for free — you don’t need a separate vector database for memory at this scale. Second, Redis TTLs (time-to-live — automatic key expiry) give you memory forgetting almost for free, and forgetting matters: user preferences from eight months ago are as likely to mislead the agent as help it. I set a 90-day refresh-on-read TTL and let stale facts quietly fall away.
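What refresh-on-read means mechanically can be sketched in plain Python. This illustrates the policy, not how RedisStore implements it: `TTLMemory` is a hypothetical class, and the clock is injected so the timeline is easy to follow. For the real store, check the langgraph-checkpoint-redis documentation for its TTL options.

```python
class TTLMemory:
    """Toy key-value memory where each successful read pushes the expiry forward."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock          # callable returning the current time in seconds
        self._items = {}            # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]    # stale fact falls away
            return None
        self._items[key] = (value, self.clock() + self.ttl)  # refresh on read
        return value


DAY = 24 * 60 * 60
now = [0]                           # fake clock the example advances by hand
memory_box = TTLMemory(ttl_seconds=90 * DAY, clock=lambda: now[0])

memory_box.put("deploy-target", "Deploys agents on Fly.io")
memory_box.put("style", "Prefers answers with code")

now[0] = 60 * DAY
still_used = memory_box.get("deploy-target")   # read at day 60 restarts its 90-day window

now[0] = 120 * DAY
refreshed = memory_box.get("deploy-target")    # alive: last read was 60 days ago
expired = memory_box.get("style")              # gone: untouched for 120 days

print(refreshed, expired)
```

Facts the agent keeps using stay alive, and facts nobody asks about disappear on their own.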

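A free 30MB Redis Cloud instance has been more than enough for my agents, since distilled facts are tiny. If you self-host next to Part 2’s Docker setup, add one Redis service to your compose file, but pick an image that ships the RediSearch and RedisJSON modules the store relies on (Redis 8+ or Redis Stack). A plain redis:7-alpine image does not include them.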

🔑 Key point

Short-term memory (thread state) and long-term memory (a cross-thread store) solve different problems. Reaching for a vector DB when you only needed conversation state is a common over-build — match the memory type to the need.

Testing It + Common Errors

The test that matters is the one from the intro, now in three commands: state a fact in thread A, ask for it back in thread B, then restart the process and ask again in thread C (new thread_ids, same user_id). Pass all three and your AI agent memory is real; fail the third and your store isn’t durable.
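The shape of that test, sketched with stand-ins so it runs anywhere. `VolatileStore` mimics the lifetime of an in-process store, `FileStore` mimics a durable backend by writing JSON to disk, and a "restart" is modelled by constructing a fresh object. All of these names, and `run_recall_test`, are hypothetical. Against the real agent you would invoke the app with new thread_ids and the same user_id instead.

```python
import json
import os
import tempfile


class VolatileStore:
    """Loses everything when the object is recreated, like an in-process dict."""

    def __init__(self):
        self._facts = {}

    def put(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def facts(self, user_id):
        return list(self._facts.get(user_id, []))


class FileStore(VolatileStore):
    """Persists facts to a JSON file, standing in for a durable backend."""

    def __init__(self, path):
        super().__init__()
        self._path = path
        if os.path.exists(path):
            with open(path) as f:
                self._facts = json.load(f)

    def put(self, user_id, fact):
        super().put(user_id, fact)
        with open(self._path, "w") as f:
            json.dump(self._facts, f)


def run_recall_test(make_store):
    """Thread A writes, thread B reads, then a simulated restart and thread C reads."""
    user_id, fact = "user-sukhveer", "Deploys agents on Fly.io"
    store = make_store()
    store.put(user_id, fact)                      # thread A
    cross_thread = fact in store.facts(user_id)   # thread B, same process
    store = make_store()                          # "restart": fresh object
    after_restart = fact in store.facts(user_id)  # thread C
    return cross_thread, after_restart


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "memory.json")
    volatile_result = run_recall_test(VolatileStore)
    durable_result = run_recall_test(lambda: FileStore(path))

print(volatile_result)  # passes cross-thread, fails after the restart
print(durable_result)   # passes both checks
```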

Errors I actually hit while building this:

TypeError: recall() missing 1 required keyword-only argument: 'store' — you compiled without store=. The injection only works when the graph knows about the store.
Semantic queries return nothing after switching embedding models — the index dims no longer match the stored vectors. Wipe and re-embed; vectors from different models aren’t comparable.
RedisStore works locally, empty in production — two services pointing at different Redis URLs. Log the connection string hash at startup; it’s a 10-second check that saved me an evening.
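One way to do that startup check without printing credentials. `redis_url_fingerprint` is a hypothetical helper and `REDIS_URL` is an assumed environment variable name. The technique is just a truncated SHA-256 of the connection string, which is handy for comparing services but is not a security boundary.

```python
import hashlib
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("memory")


def redis_url_fingerprint(url: str) -> str:
    """Short fingerprint of the connection string, for comparing across services in logs."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]


# REDIS_URL is an assumed variable name; the default matches the local example.
redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379")
log.info("memory store redis fingerprint=%s", redis_url_fingerprint(redis_url))

# Two services point at the same Redis only if their fingerprints match.
same = redis_url_fingerprint("redis://localhost:6379") == redis_url_fingerprint("redis://localhost:6379")
different = redis_url_fingerprint("redis://localhost:6379") != redis_url_fingerprint("redis://cache.internal:6379")
```

If the API service and a worker log different fingerprints at boot, you have found the "empty in production" bug before a user does.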

What to Build Next

The ceiling on this design is what gets stored — right now, you decide in code. The natural upgrade is letting the model itself extract facts worth keeping after each exchange: an LLM call that reads the turn and writes zero or more distilled memories. That’s the idea behind LangChain’s LangMem library, and after building the manual version you’ll recognise exactly what it automates — same Store underneath.
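The shape of that extraction step, with the model call abstracted away. This is not LangMem's API: `extract_memories` is hypothetical, `extractor` stands in for an LLM call that returns a list of short fact strings, and `keyword_extractor` is a deliberately crude rule-based fake so the example runs offline.

```python
from typing import Callable, List


def extract_memories(turn: str, extractor: Callable[[str], List[str]]) -> List[str]:
    """Run the extractor on one turn and keep only clean, unique, non-empty facts."""
    seen, facts = set(), []
    for raw in extractor(turn):
        fact = raw.strip()
        if fact and fact.lower() not in seen:
            seen.add(fact.lower())
            facts.append(fact)
    return facts


def keyword_extractor(turn: str) -> List[str]:
    """Offline stand-in for an LLM: emits a fact only for sentences starting with 'I '."""
    sentences = [s.strip().rstrip(".") for s in turn.split(". ")]
    return ["User says: " + s for s in sentences if s.startswith("I ")]


turn_with_facts = "I deploy on Fly.io. Thanks for the help. I prefer answers with code."
turn_without = "Thanks, that worked."

found = extract_memories(turn_with_facts, keyword_extractor)
nothing = extract_memories(turn_without, keyword_extractor)

print(found)
print(nothing)
```

The important property is "zero or more": most turns should produce an empty list, and each returned string becomes one store entry.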

Then point this at Part 3: give the multi-agent team a shared namespace for verified findings, and a fact one worker checks today is free for every worker tomorrow. I’d build the memory-extraction step first, though — shared memory amplifies whatever you store, including junk.

Conclusion

Your agent now has both memory layers: a checkpointer for the conversation it’s having, and a searchable, durable store for everything it should carry between conversations. The rules that matter: store distilled facts rather than transcripts, namespace by user, search by meaning, and let old memories expire.

This also closes the loop the series opened — the What Are AI Agents? guide called memory one of the four pillars of an agent, and it was the last one we hadn’t built properly.

So, a specific question: what’s the first fact you’d want your agent to remember about you, and what’s one you’d explicitly want it to forget? The answers shape what the later parts cover (Part 5 is the MCP client; observability and evals follow in Part 6).

The full series — Agentic AI in Python: Zero to Production:

Part 1 — Tools, StateGraph & Memory
Part 2 — FastAPI, Docker & Deploy
Part 3 — Multi-Agent Systems
Part 4 — AI Agent Memory — you’re here
Part 5 — MCP Client & Real Tools
Part 6 — Observability & Evals

🧭 Where to go from here

Need the base agent? Part 1 builds the agent this memory plugs into.
Next in this series: Part 5 — MCP client and real tools.
Want the concept first? The from-scratch memory post builds short-term memory by hand.

Frequently asked questions

What's the difference between a checkpointer and a Store?

A checkpointer records one thread's message history (short-term). A Store is cross-thread key-value memory the whole agent can read and write (long-term). You use both, compiled into the same graph.

Why isn't my agent remembering facts across conversations?

You're relying on the checkpointer, which is thread-scoped. Add a LangGraph Store and namespace it by user_id so facts follow the user across threads.

Do I need a vector database for agent memory?

Not at this scale. The LangGraph Store has built-in semantic search, and the Redis store supports the same vector index, so memory and embeddings live in one place.

Claude has no embeddings API. What do I use?

Add a second provider for embeddings (e.g. OpenAI text-embedding-3-small) or a local model. Only the index needs it; the agent still runs on Claude.


Written by

Sukhveer Kaur, Software Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.
