Series: Agentic AI in Python — Zero to Production This is Part 2. Quick recap of what we covered so far: — Part 1: Built a local LangGraph research agent with tool use and SQLite memory → /build-agentic-ai-app-python-part-1/
If you’re starting here, you need a working agent.py and tools.py from Part 1 before continuing.
At the end of Part 1, the agent ran on your laptop. It answered questions, called tools, and remembered context across turns — but only you could use it. To make it useful, it needs an HTTP endpoint, a reproducible container, and a home on a server that doesn’t need your laptop to be open.
That’s exactly what Part 2 covers. By the end of this post you’ll have the same LangGraph agent running behind a FastAPI endpoint, packaged into a Docker image, and deployed to a cloud VPS. I’ll give you every command, every config file, and the three errors that cost me the most time the first time I did this.
What We’re Building
The local agent from Part 1 becomes a proper service: a POST endpoint at /chat that accepts a JSON body with a message and session_id, calls the LangGraph agent, and returns the reply as JSON. Docker packages the whole thing so it runs identically everywhere. A cloud VPS (I’m using a €4/month Hetzner CX11, but any Ubuntu server works) hosts it behind Nginx.
The diagram shows the full runtime. The key change from Part 1 is the Postgres checkpointer replacing SQLite — SQLite is fine on a single machine but doesn’t survive container restarts gracefully. Postgres is one env var away.
Prerequisites
You need the code from Part 1 plus two new things before writing a single line:
- Docker Desktop (or Docker Engine on Linux): docker --version should report version 24 or later. If it’s not installed, grab it from docs.docker.com/get-docker.
- A cloud VPS with Ubuntu 22.04: Hetzner, DigitalOcean, Linode, or any provider works. The €4/month CX11 (1 vCPU, 2 GB RAM) handles this agent comfortably. You’ll need SSH access and a public IP.
- A Postgres instance: you can run docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:15 locally for testing, or use a managed service (Supabase free tier works) for production.
The flowchart is the exact sequence for this post. Steps 3–5 each take under ten minutes once steps 1–2 are solid.
Step 1 — Wrap the Agent in FastAPI
FastAPI is a natural fit here because it handles async, auto-generates API docs at /docs, and adds almost no overhead. The key design decision is how to pass session_id — it should come from the client, not be hardcoded, so different users get separate conversation histories without any extra code on your side.
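To make the session_id idea concrete, here is a toy, in-memory stand-in for a checkpointer. It is not how LangGraph stores state; ToyCheckpointer and toy_chat are hypothetical names used only to show why keying history by thread_id keeps conversations apart.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Toy stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list:
        # Return a copy so callers can't mutate stored state
        return list(self._threads[thread_id])


def toy_chat(store: ToyCheckpointer, session_id: str, message: str) -> dict:
    """Mimic the shape of the /chat handler: session_id becomes thread_id."""
    config = {"configurable": {"thread_id": session_id}}
    thread_id = config["configurable"]["thread_id"]
    store.append(thread_id, message)
    seen = len(store.history(thread_id))
    return {"reply": f"message {seen} in this session", "session_id": session_id}


store = ToyCheckpointer()
toy_chat(store, "alice", "What is LangGraph?")
toy_chat(store, "alice", "And who maintains it?")
bob = toy_chat(store, "bob", "Hello")
print(bob["reply"])  # bob's count starts at 1, unaffected by alice
```

Because the client picks the session_id, two users only share history if they send the same value, so in a real deployment you would probably derive it from something the server controls, such as a login.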
Install the new dependencies:
    pip install fastapi "uvicorn[standard]" langgraph-checkpoint-postgres "psycopg[binary]"

Create api.py at your project root:

    # api.py
    from contextlib import asynccontextmanager

    from fastapi import FastAPI, Request
    from pydantic import BaseModel
    from langchain_core.messages import HumanMessage

    from agent import build_app  # we'll refactor agent.py below


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Build the graph once per worker process, not once per request
        app.state.agent = build_app()
        yield


    app = FastAPI(title="Agentic AI API", lifespan=lifespan)


    class ChatRequest(BaseModel):
        message: str
        session_id: str = "default"


    @app.post("/chat")
    def chat(req: ChatRequest, request: Request):
        # Plain `def`: FastAPI runs it in a threadpool, so the blocking
        # agent.invoke() call doesn't stall the event loop
        agent = request.app.state.agent
        config = {"configurable": {"thread_id": req.session_id}}
        result = agent.invoke(
            {"messages": [HumanMessage(content=req.message)]},
            config=config,
        )
        return {"reply": result["messages"][-1].content, "session_id": req.session_id}


    @app.get("/health")
    async def health():
        return {"status": "ok"}

Now refactor agent.py slightly: move the graph compilation into a build_app() function so the checkpointer is initialised on each worker’s startup rather than at module import time (which causes issues when uvicorn spins up multiple workers). Note that PostgresSaver takes a psycopg 3 connection, not a psycopg2 one, and it must be opened with autocommit=True and row_factory=dict_row:

    # agent.py: add this function, keep everything else the same
    import os

    import psycopg
    from psycopg.rows import dict_row
    from langgraph.checkpoint.postgres import PostgresSaver


    def build_app():
        # os.environ[...] fails loudly if DATABASE_URL is missing
        conn = psycopg.connect(
            os.environ["DATABASE_URL"], autocommit=True, row_factory=dict_row
        )
        memory = PostgresSaver(conn)
        memory.setup()  # creates the checkpoint tables if they don't exist
        return graph.compile(checkpointer=memory)
The memory.setup() call is important — it creates the LangGraph checkpoint tables on first run. It’s idempotent, so calling it on every startup is safe and saves you a separate migration step.
Test it locally before touching Docker:
    DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" \
      uvicorn api:app --reload
Then in another terminal:
    curl -X POST http://localhost:8000/chat \
      -H "Content-Type: application/json" \
      -d '{"message": "What is LangGraph?", "session_id": "test-123"}'
You should get a JSON response in 2–3 seconds with the agent’s reply. If you see {"reply": "..."}, step 1 is done.
Common mistake: Forgetting memory.setup() and getting an UndefinedTable error on the first request. The fix is one line: always call memory.setup() before graph.compile().
Step 2 — Write the Dockerfile
The goal is a lean image that starts fast. I use python:3.11-slim as the base — it’s ~120 MB versus ~900 MB for the full Python image — and install only what’s needed.
    # Dockerfile
    FROM python:3.11-slim

    WORKDIR /app

    # Install system deps for the Postgres driver
    RUN apt-get update && apt-get install -y --no-install-recommends \
        libpq-dev gcc \
        && rm -rf /var/lib/apt/lists/*

    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY . .

    EXPOSE 8000

    CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
Generate your requirements.txt from the current environment:
pip freeze > requirements.txt
Why --workers 2? A single uvicorn worker can handle this agent fine in dev, but two workers mean one can be processing a request while the other handles the next one coming in. For an LLM-backed endpoint with 2–3 second response times, that roughly doubles the number of requests the container can serve at once. The only real cost is the memory of a second Python process.
Build and test the image locally before pushing anything to a server:
    docker build -t ai-agent:latest .
    docker run -p 8000:8000 \
      --env-file .env \
      -e DATABASE_URL="postgresql://postgres:secret@host.docker.internal:5432/postgres" \
      ai-agent:latest
Note host.docker.internal — that’s the hostname Docker uses on Mac and Windows to reach your local machine from inside a container. On Linux, use 172.17.0.1 (the default Docker bridge gateway) instead.
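If you switch between a Mac and a Linux box, a small helper can do the host swap for you. The sketch below is only an illustration: docker_database_url is a hypothetical name, the host values are the two mentioned above, and the Linux one assumes the default Docker bridge network.

```python
import platform
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Addresses taken from the text above; the Linux value assumes the
# default Docker bridge network.
HOST_FROM_CONTAINER = {
    "Darwin": "host.docker.internal",
    "Windows": "host.docker.internal",
    "Linux": "172.17.0.1",
}


def docker_database_url(url: str, system: Optional[str] = None) -> str:
    """Swap a localhost host in `url` for one reachable from a container."""
    system = system or platform.system()
    parts = urlsplit(url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return url  # already points somewhere else; leave it alone
    new_host = HOST_FROM_CONTAINER.get(system, "host.docker.internal")
    # netloc looks like user:password@host:port; replace only the host part
    userinfo, _, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    netloc = f"{userinfo}@" if userinfo else ""
    netloc += f"{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


local_url = "postgresql://postgres:secret@localhost:5432/postgres"
print(docker_database_url(local_url, system="Linux"))
```

Passing system explicitly makes the function easy to check on any machine; leave it out and it falls back to platform.system().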
Hit the /health endpoint to confirm the container is up:
    curl http://localhost:8000/health
    # {"status":"ok"}
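The container can take a moment to start, so a scripted check usually needs to retry. Here is one way that could look using only the standard library; wait_for_health and fetch_json are hypothetical helpers, and the retry count and delay are arbitrary defaults rather than recommendations.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_health(
    base_url: str,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Poll {base_url}/health until it reports {"status": "ok"}."""
    for attempt in range(attempts):
        try:
            if fetch(f"{base_url}/health").get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors;
            # ValueError covers a body that isn't valid JSON yet
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_health("http://localhost:8000") right after docker run returns True once the endpoint answers, or False when the attempts run out. The fetch parameter exists so the retry logic can be exercised without a running server.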
Common mistake: Building the image and getting ModuleNotFoundError: No module named 'psycopg2' even though you expected it to be installed. psycopg2 and psycopg2-binary are separate packages that both provide the psycopg2 module: psycopg2-binary ships prebuilt wheels, while psycopg2 builds from source and needs libpq-dev and gcc (which the Dockerfile above installs). If any of your code still imports psycopg2, make sure requirements.txt actually lists one of them.
Step 3 — Deploy to a Cloud VPS
The fastest deployment path for a single container is to save the image to a tarball, copy it to the server, and run it there. No container registry needed.
On your local machine:
    # Save the image to a file
    docker save ai-agent:latest | gzip > ai-agent.tar.gz

    # Create the target directory on the server (replace YOUR_SERVER_IP)
    ssh root@YOUR_SERVER_IP "mkdir -p /opt/ai-agent"

    # Copy the image to your server
    scp ai-agent.tar.gz root@YOUR_SERVER_IP:/opt/ai-agent/

    # Copy your production env file
    scp .env.prod root@YOUR_SERVER_IP:/opt/ai-agent/.env
On the server (SSH in first: ssh root@YOUR_SERVER_IP):
    # Install Docker if it's not there yet
    curl -fsSL https://get.docker.com | sh

    # Load the image from the tarball
    cd /opt/ai-agent
    docker load < ai-agent.tar.gz

    # Run the container in the background
    docker run -d \
      --name ai-agent \
      --restart unless-stopped \
      -p 8000:8000 \
      --env-file .env \
      ai-agent:latest
--restart unless-stopped means the container comes back automatically if the server reboots — a small thing that saves you a panicked SSH session at 2 AM.
Now add Nginx as a reverse proxy so the agent is accessible on port 80 without needing :8000 in every URL:
    apt-get install -y nginx

    cat > /etc/nginx/sites-available/ai-agent << 'EOF'
    server {
        listen 80;
        server_name _;

        location / {
            proxy_pass http://localhost:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_read_timeout 60s;
        }
    }
    EOF

    # Disable the stock default site so requests to the bare IP reach this server block
    rm -f /etc/nginx/sites-enabled/default
    ln -sf /etc/nginx/sites-available/ai-agent /etc/nginx/sites-enabled/
    nginx -t && systemctl reload nginx
Setting proxy_read_timeout explicitly is worth doing. Nginx’s default is already 60 seconds, so this line doesn’t change behavior, but it puts the limit where you can see it and raise it. LangGraph agents occasionally take 8–12 seconds on multi-tool calls, which fits comfortably. If a request ever runs past the limit, Nginx closes the connection and the client gets a 504 while the agent is still happily processing in the background.
Test from your local machine:
curl -X POST http://YOUR_SERVER_IP/chat \-H "Content-Type: application/json" \-d '{"message": "What is the Model Context Protocol?", "session_id": "prod-test-1"}'
If you get a reply, you’re live.
Testing It and Common Errors
Three things to verify before calling it done:
- Memory persists across container restarts: stop and restart the container (docker restart ai-agent), then send a follow-up question that references a previous answer. The agent should remember it, because state is in Postgres rather than in-process.
- Parallel sessions work: send two requests simultaneously with different session_id values. Both should get independent replies without bleeding context between them.
- The /docs page is accessible: visit http://YOUR_SERVER_IP/docs in a browser. FastAPI auto-generates this. It’s a quick sanity check that the app is up and the schema is correct.
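The parallel-sessions check is easy to script. The sketch below assumes the /chat contract from Step 1 (a JSON body with message and session_id, a JSON reply with reply and session_id); post_chat and check_parallel_sessions are hypothetical helpers, and the name-matching test is a rough heuristic because LLM replies vary.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def post_chat(base_url: str, message: str, session_id: str, timeout: float = 60.0) -> dict:
    """POST one message to /chat and return the decoded JSON reply."""
    body = json.dumps({"message": message, "session_id": session_id}).encode()
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def check_parallel_sessions(base_url: str, send: Callable[..., dict] = post_chat) -> bool:
    """Tell two sessions different names at once, then check neither leaks."""
    names = {"check-a": "Ada", "check-b": "Grace"}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # Round 1: introduce a different name in each session, concurrently
        list(pool.map(lambda sid: send(base_url, f"My name is {names[sid]}.", sid), names))
        # Round 2: ask each session to recall it, again concurrently
        replies = list(pool.map(lambda sid: send(base_url, "What is my name?", sid), names))
    for sid, reply in zip(names, replies):
        others = [name for other_sid, name in names.items() if other_sid != sid]
        text = reply["reply"]
        if names[sid] not in text or any(name in text for name in others):
            return False
    return True
```

Running check_parallel_sessions("http://YOUR_SERVER_IP") against the deployed service should return True if each session only remembers its own name. The send parameter lets you swap in a stub when you want to exercise the logic offline.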
Errors I actually hit during this deployment:
- connection refused on the Postgres URL inside Docker. I forgot that localhost inside a container refers to the container itself, not the host. Fix: use host.docker.internal (Mac/Windows) or the host’s bridge IP on Linux.
- uvicorn.error: [Errno 13] Permission denied on port 80. FastAPI running as a non-root user can’t bind to ports under 1024. Fix: run FastAPI on port 8000 and let Nginx handle 80. (The Dockerfile above already does this.)
- SQLite import error after switching to Postgres. I had a stale from langgraph.checkpoint.sqlite import SqliteSaver import in agent.py. Remove all SQLite imports once you’ve migrated to Postgres or the module will fail to load.
What to Build Next
The deployment you have now handles multiple users and survives restarts. Here’s where I’d invest next, in order of impact:
Add a streaming endpoint — right now the client waits 2–4 seconds for the full reply. FastAPI supports Server-Sent Events and LangGraph supports astream_events(). Wiring these together cuts perceived latency to near-zero by streaming tokens as they arrive — this is the single biggest UX improvement you can make with one afternoon of work.
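As a starting point for the streaming idea, here is a minimal sketch of the Server-Sent Events framing on its own. It uses a fake token source instead of the agent; fake_token_stream, sse_frames, and the {"token": ...} payload shape are all assumptions made for illustration, not part of LangGraph or FastAPI.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_token_stream() -> AsyncIterator[str]:
    """Stand-in for tokens pulled out of the agent's event stream."""
    for token in ["Lang", "Graph ", "is ", "a ", "library."]:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield token


async def sse_frames(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in a Server-Sent Events frame."""
    async for token in tokens:
        # JSON-encode so newlines inside a token can't break the framing
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Final frame so the client knows the reply is complete
    yield "event: done\ndata: {}\n\n"


async def collect(frames: AsyncIterator[str]) -> List[str]:
    return [frame async for frame in frames]


frames = asyncio.run(collect(sse_frames(fake_token_stream())))
print(frames[0], end="")
```

In a FastAPI handler, a generator like sse_frames would typically be returned inside a StreamingResponse with media_type="text/event-stream". Extracting tokens from astream_events() depends on the event shapes your LangGraph version emits, so that part is deliberately left out here.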
Add human_in_the_loop approval — LangGraph’s interrupt_before parameter lets you pause the graph before a tool runs and wait for a human to approve. Expose this as a /approve endpoint. Now you have a supervised agent that can handle sensitive operations (sending emails, modifying databases) with a human in the loop.
Swap the VPS for a managed container service — the docker run approach works but requires you to SSH in to redeploy. Cloud Run (GCP), App Runner (AWS), or Fly.io let you push an image and get a URL back. I’ll cover this in Part 3.
Conclusion
The agent from Part 1 is now a proper service — running in a container, persistent across restarts, and accessible from anywhere. The FastAPI + LangGraph + Postgres stack is what I run in production for the systems I’ve actually shipped, and the only significant difference between what you’ve just built and those systems is observability (logging, tracing) and a smoother CI/CD pipeline.
Building the HTTP layer and the Dockerfile took me about 2 hours the first time. With the patterns above, it should take you under one.
What are you going to use this agent for? If you’re building something where human approval before tool execution matters — customer-facing agents, anything that writes to a database — tell me in the comments and I’ll make that the focus of Part 3.