Build an Agentic AI App in Python: FastAPI, Docker & Deploy to Production (Part 2)

June 09, 2026
5 min read
Intermediate
📚 Part of the series: Agentic AI in Python: Zero to Production
Table of Contents

  1. What We're Building
  2. Prerequisites
  3. Step 1 — Wrap the Agent in FastAPI
  4. Step 2 — Write the Dockerfile
  5. Step 3 — Deploy to a Cloud VPS
  6. Testing It and Common Errors
  7. What to Build Next
  8. Conclusion

Series: Agentic AI in Python: Zero to Production. This is Part 2. Quick recap of what we covered so far:

  • Part 1: Built a local LangGraph research agent with tool use and SQLite memory → /build-agentic-ai-app-python-part-1/

If you’re starting here, you need a working agent.py and tools.py from Part 1 before continuing.

At the end of Part 1, the agent ran on your laptop. It answered questions, called tools, and remembered context across turns — but only you could use it. To make it useful, it needs an HTTP endpoint, a reproducible container, and a home on a server that doesn’t need your laptop to be open.

That’s exactly what Part 2 covers. By the end of this post you’ll have the same LangGraph agent running behind a FastAPI endpoint, packaged into a Docker image, and deployed to a cloud VPS. I’ll give you every command, every config file, and the three errors that cost me the most time the first time I did this.

What We’re Building

The local agent from Part 1 becomes a proper service: a POST endpoint at /chat that accepts a JSON body with a message and session_id, calls the LangGraph agent, and returns the reply as JSON. Docker packages the whole thing so it runs identically everywhere. A cloud VPS (I’m using a €4/month Hetzner CX11, but any Ubuntu server works) hosts it behind Nginx.
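To make the contract concrete, here is a minimal client-side sketch using only the Python standard library. The helper name build_chat_request and the base URL are illustrative assumptions; the JSON fields are the message and session_id described above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str, session_id: str = "default") -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with a JSON body."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (needs the server from Step 1 to be running):
# req = build_chat_request("http://localhost:8000", "What is LangGraph?", "test-123")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["reply"])
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the wire.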

Figure: Production stack diagram. Client HTTP requests flow through FastAPI into the LangGraph agent, which calls the Claude LLM and Tavily tools, with Postgres handling memory, all running inside a Docker container on a cloud VPS with Nginx in front.

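The diagram shows the full runtime. The key change from Part 1 is the Postgres checkpointer replacing SQLite. SQLite is fine on a single machine, but its database file lives inside the container's filesystem, so the conversation history is lost whenever the container is rebuilt or replaced unless you mount a volume. Postgres is one env var away.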
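Since the whole switch hangs on one environment variable, it can help to fail fast when that variable is missing. A small sketch, assuming the variable is called DATABASE_URL as in the rest of this post; the helper name get_database_url is hypothetical:

```python
import os


def get_database_url(env=None) -> str:
    """Return DATABASE_URL, or raise a clear error if it is unset or empty."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL", "").strip()
    if not url:
        raise RuntimeError(
            "DATABASE_URL is not set; expected something like "
            "postgresql://user:password@host:5432/dbname"
        )
    return url
```

A readable startup error is usually easier to debug than a driver exception raised on the first request.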

Prerequisites

You need the code from Part 1 plus three new things before writing a single line:

  • Docker Desktop (or Docker Engine on Linux) — docker --version should return 24+. If it’s not installed, grab it from docs.docker.com/get-docker.
  • A cloud VPS with Ubuntu 22.04 — Hetzner, DigitalOcean, Linode, or any provider works. The €4/month CX11 (1 vCPU, 2 GB RAM) handles this agent comfortably. You’ll need SSH access and a public IP.
  • A Postgres instance — you can run docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:15 locally for testing, or use a managed service (Supabase free tier works) for production.

Figure: Five-step deployment pipeline. Write the FastAPI wrapper, add the Dockerfile, build and test locally with Docker, deploy to a cloud VPS, add an Nginx reverse proxy to serve on port 80.

The flowchart is the exact sequence for this post. Steps 3–5 each take under ten minutes once steps 1–2 are solid.

Step 1 — Wrap the Agent in FastAPI

FastAPI is a natural fit here because it handles async, auto-generates API docs at /docs, and adds almost no overhead. The key design decision is how to pass session_id — it should come from the client, not be hardcoded, so different users get separate conversation histories without any extra code on your side.
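The isolation idea can be sketched without any framework. In the snippet below, thread_config builds the same config shape the endpoint passes to the agent, while ToyHistoryStore is a hypothetical in-memory stand-in for a checkpointer, included only to show that histories are keyed by thread_id:

```python
from collections import defaultdict


def thread_config(session_id: str) -> dict:
    """Map a client-supplied session_id to a per-thread config dict."""
    return {"configurable": {"thread_id": session_id}}


class ToyHistoryStore:
    """In-memory stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self):
        self._threads = defaultdict(list)

    def append(self, config: dict, message: str) -> list:
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id].append(message)
        return list(self._threads[thread_id])


store = ToyHistoryStore()
store.append(thread_config("alice"), "hi")
store.append(thread_config("bob"), "hello")
alice_history = store.append(thread_config("alice"), "again")  # bob's message is not in here
```

Because the key comes from the request, two clients with different session_id values never see each other's messages.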

Install the new dependencies. The Postgres checkpointer is built on psycopg 3 (the psycopg package), not psycopg2, so that is the driver to install:

bash
pip install fastapi "uvicorn[standard]" langgraph-checkpoint-postgres "psycopg[binary]"

Create api.py at your project root:

python
# api.py
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langchain_core.messages import HumanMessage
from pydantic import BaseModel

from agent import build_app  # we'll refactor agent.py below


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build the agent once per worker process, at startup
    app.state.agent = build_app()
    yield


app = FastAPI(title="Agentic AI API", lifespan=lifespan)


class ChatRequest(BaseModel):
    message: str
    session_id: str = "default"


@app.post("/chat")
def chat(req: ChatRequest):
    # Plain def: FastAPI runs it in a thread pool, so the blocking
    # invoke() call doesn't stall the event loop
    config = {"configurable": {"thread_id": req.session_id}}
    result = app.state.agent.invoke(
        {"messages": [HumanMessage(content=req.message)]},
        config=config,
    )
    return {"reply": result["messages"][-1].content, "session_id": req.session_id}


@app.get("/health")
async def health():
    return {"status": "ok"}

Now refactor agent.py slightly: move the graph compilation into a build_app() function so the checkpointer and its database connection are created when each worker starts (through the lifespan hook in api.py) rather than at module import time, which causes issues when uvicorn spins up multiple workers:

python
# agent.py: add this function, keep everything else the same
import os

import psycopg
from psycopg.rows import dict_row
from langgraph.checkpoint.postgres import PostgresSaver


def build_app():
    # PostgresSaver expects a psycopg 3 connection with autocommit and dict rows
    conn = psycopg.connect(
        os.environ["DATABASE_URL"],
        autocommit=True,
        prepare_threshold=0,
        row_factory=dict_row,
    )
    memory = PostgresSaver(conn)
    memory.setup()  # creates the checkpoint tables if they don't exist
    return graph.compile(checkpointer=memory)

The memory.setup() call is important: it creates the LangGraph checkpoint tables on first run. It's idempotent, so calling it on every startup is safe and saves you a separate migration step.

Test it locally before touching Docker:

bash
DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" \
  uvicorn api:app --reload

Then in another terminal:

bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is LangGraph?", "session_id": "test-123"}'

You should get a JSON response in 2–3 seconds with the agent’s reply. If you see {"reply": "..."}, step 1 is done.

Common mistake: Forgetting memory.setup() and getting an UndefinedTable error from the Postgres driver on the first request. The fix is one line: always call memory.setup() before graph.compile().

Step 2 — Write the Dockerfile

The goal is a lean image that starts fast. I use python:3.11-slim as the base — it’s ~120 MB versus ~900 MB for the full Python image — and install only what’s needed.

dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Build dependencies for the Postgres driver (libpq headers and a C compiler)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libpq-dev gcc \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Generate your requirements.txt from the current environment:

bash
pip freeze > requirements.txt

Why --workers 2? A single uvicorn worker can handle this agent fine in dev, but two workers mean one can be processing a request while the other handles the next one coming in. For an LLM-backed endpoint with 2–3 second response times, that roughly doubles the number of requests the service can have in flight. The cost is a second Python process, so expect about twice the memory.
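The reasoning above can be turned into a back-of-the-envelope calculation. This sketch assumes the simplest possible model, where each worker serves exactly one request at a time and requests take a fixed average time; it ignores queueing, LLM rate limits, and any concurrency inside a worker. The function name is illustrative.

```python
def max_requests_per_minute(workers: int, avg_latency_s: float) -> float:
    """Upper bound on throughput if each worker handles one request at a time."""
    if workers < 1 or avg_latency_s <= 0:
        raise ValueError("workers must be >= 1 and avg_latency_s must be > 0")
    return workers * 60.0 / avg_latency_s


one_worker = max_requests_per_minute(1, 2.5)   # 24.0
two_workers = max_requests_per_minute(2, 2.5)  # 48.0
```

At a 2.5 second average, one worker tops out at 24 requests per minute under this model and two workers at 48.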

Build and test the image locally before pushing anything to a server:

bash
docker build -t ai-agent:latest .
docker run -p 8000:8000 \
  --env-file .env \
  -e DATABASE_URL="postgresql://postgres:secret@host.docker.internal:5432/postgres" \
  ai-agent:latest

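Note host.docker.internal: that's the hostname Docker Desktop provides on Mac and Windows to reach your local machine from inside a container. On Linux, use 172.17.0.1 (the default Docker bridge gateway) instead, or add --add-host=host.docker.internal:host-gateway to the docker run command so the same hostname resolves there too.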
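If you switch between machines, a tiny helper can pick the right host for the local test database. This is a sketch under the assumptions stated above (Docker Desktop on Mac/Windows, the default bridge network on Linux); the function names and the default password are placeholders taken from the test setup in this post.

```python
import platform


def host_alias_for(system: str) -> str:
    """Hostname a container can use to reach services on the Docker host."""
    if system in ("Darwin", "Windows"):
        return "host.docker.internal"  # provided by Docker Desktop
    return "172.17.0.1"  # default bridge gateway on Linux


def local_database_url(system: str, password: str = "secret") -> str:
    """Build the DATABASE_URL for the local test Postgres container."""
    return f"postgresql://postgres:{password}@{host_alias_for(system)}:5432/postgres"


# Usage: pass the value returned by platform.system() on the machine running Docker
current_url = local_database_url(platform.system())
```

If your Docker network uses a non-default bridge, the Linux address will differ, so treat 172.17.0.1 as a starting point rather than a guarantee.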

Hit the /health endpoint to confirm the container is up:

bash
curl http://localhost:8000/health
# {"status":"ok"}
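The container can take a moment to start, so a scripted check usually needs to retry. Below is a minimal polling sketch using only the standard library; the function name, the retry counts, and the injectable opener parameter (there to make the function testable without a server) are my own choices, not part of FastAPI or Docker.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_health(url, attempts=10, delay=1.0, opener=urllib.request.urlopen):
    """Poll a health URL until it returns {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=2) as resp:
                data = json.load(resp)
                if isinstance(data, dict) and data.get("status") == "ok":
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not up yet, or the response was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage: wait_for_health("http://localhost:8000/health")
```

Returning a boolean instead of raising makes it easy to use as a gate in a deploy script.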

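Common mistake: The container starts and immediately fails with a ModuleNotFoundError for the Postgres driver, even though the import works on your laptop. The usual cause is a requirements.txt frozen from a different virtual environment than the one you tested in, so the driver never made it into the file. Re-run pip freeze inside the project's environment and check that the driver is listed. The binary distributions (psycopg2-binary, or the [binary] extra of psycopg 3) install without system libraries; the source packages need libpq-dev and gcc at build time, which the Dockerfile above installs.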

Step 3 — Deploy to a Cloud VPS

The fastest deployment path for a single container is to save the image to a tarball, copy it to the server, and run it there. No container registry needed.

On your local machine:

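bash
# If you built on an Apple Silicon Mac, rebuild with --platform linux/amd64 first
# so the image matches an x86 server
# Create the target directory on the server (scp won't create it for you)
ssh root@YOUR_SERVER_IP "mkdir -p /opt/ai-agent"
# Save the image to a file
docker save ai-agent:latest | gzip > ai-agent.tar.gz
# Copy to your server (replace YOUR_SERVER_IP)
scp ai-agent.tar.gz root@YOUR_SERVER_IP:/opt/ai-agent/
# Copy your production env file
scp .env.prod root@YOUR_SERVER_IP:/opt/ai-agent/.env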

On the server (SSH in first: ssh root@YOUR_SERVER_IP):

bash
# Install Docker if it's not there yet
curl -fsSL https://get.docker.com | sh
# Load the image from the tarball
cd /opt/ai-agent
docker load < ai-agent.tar.gz
# Run the container in the background
docker run -d \
  --name ai-agent \
  --restart unless-stopped \
  -p 8000:8000 \
  --env-file .env \
  ai-agent:latest

--restart unless-stopped means the container comes back automatically if the server reboots — a small thing that saves you a panicked SSH session at 2 AM.

Now add Nginx as a reverse proxy so the agent is accessible on port 80 without needing :8000 in every URL:

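bash
apt-get update && apt-get install -y nginx
cat > /etc/nginx/sites-available/ai-agent << 'EOF'
server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
    }
}
EOF
ln -sf /etc/nginx/sites-available/ai-agent /etc/nginx/sites-enabled/ai-agent
# Remove Ubuntu's default site: it is marked default_server and would
# otherwise answer requests made to the bare IP
rm -f /etc/nginx/sites-enabled/default
nginx -t && systemctl reload nginx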

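proxy_read_timeout 60s deserves a note. 60 seconds is already Nginx's default for this directive, so the line doesn't change behaviour; I set it explicitly so the limit is visible in the config and easy to raise. LangGraph agents occasionally take 8–12 seconds on multi-tool calls, which fits comfortably. If your agent ever runs longer than the timeout, though, Nginx closes the connection and the client gets a 504 while the agent is still happily processing in the background.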

Test from your local machine:

bash
curl -X POST http://YOUR_SERVER_IP/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the Model Context Protocol?", "session_id": "prod-test-1"}'

If you get a reply, you’re live.

Testing It and Common Errors

Three things to verify before calling it done:

  1. Memory persists across container restarts — stop and restart the container (docker restart ai-agent), then send a follow-up question that references a previous answer. The agent should remember it, because state is in Postgres rather than in-process.
  2. Parallel sessions work — send two requests simultaneously with different session_id values. Both should get independent replies without bleeding context between them.
  3. The /docs page is accessible — visit http://YOUR_SERVER_IP/docs in a browser. FastAPI auto-generates this. It’s a quick sanity check that the app is up and the schema is correct.
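The parallel-sessions check (item 2) is easy to script. The sketch below fires requests concurrently with a thread pool and collects replies per session. The send callable is injected so the example runs without a server; fake_send is a stand-in that simply echoes, and you would replace it with a real HTTP call to /chat to test a live deployment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(send, requests):
    """Call send(message, session_id) concurrently; return replies keyed by session_id."""
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        futures = {sid: pool.submit(send, msg, sid) for msg, sid in requests}
        return {sid: fut.result() for sid, fut in futures.items()}


def fake_send(message, session_id):
    # Stand-in for an HTTP POST to /chat; swap in a real client for a live test
    return f"[{session_id}] echo: {message}"


replies = run_parallel(
    fake_send,
    [("My name is Ada", "user-a"), ("My name is Bo", "user-b")],
)
```

Against a real server, a follow-up such as "What is my name?" on each session should come back with the matching name and never the other one.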

Errors I actually hit during this deployment:

  • connection refused on the Postgres URL inside Docker — I forgot that localhost inside a container refers to the container itself, not the host. Fix: use host.docker.internal (Mac/Windows) or the host’s bridge IP on Linux.
  • uvicorn.error: [Errno 13] Permission denied on port 80 — FastAPI running as a non-root user can’t bind to ports under 1024. Fix: run FastAPI on port 8000 and let Nginx handle 80. (The Dockerfile above already does this.)
  • SQLite import error after switching to Postgres: I had a stale from langgraph.checkpoint.sqlite import SqliteSaver import in agent.py. If the SQLite checkpointer package isn't installed in the image, that line raises ModuleNotFoundError and the whole module fails to load. Remove all SQLite imports once you've migrated to Postgres.
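The last error in the list is easy to catch before building the image. Here is a small sketch that scans Python source for imports mentioning sqlite using the standard ast module; the function name is hypothetical, and it will also flag the standard library's sqlite3, which may or may not be what you want.

```python
import ast


def find_sqlite_imports(source: str) -> list:
    """Return imported module names that mention 'sqlite' in the given source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [alias.name for alias in node.names if "sqlite" in alias.name]
        elif isinstance(node, ast.ImportFrom) and node.module and "sqlite" in node.module:
            hits.append(node.module)
    return hits


# Usage: find_sqlite_imports(open("agent.py").read())
```

Parsing with ast rather than grepping avoids false positives from comments and strings.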

What to Build Next

The deployment you have now handles multiple users and survives restarts. Here’s where I’d invest next, in order of impact:

Add a streaming endpoint: right now the client waits 2–4 seconds for the full reply. FastAPI supports Server-Sent Events and LangGraph supports astream_events(). Wiring these together streams tokens as they arrive, so the user sees the first words almost immediately instead of waiting for the whole answer. This is the single biggest UX improvement you can make with one afternoon of work.
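The wire format is the simple part of streaming, and it can be sketched in isolation. The generator below turns an iterable of tokens into Server-Sent Events frames. The final [DONE] marker is a common convention I'm assuming here, not part of the SSE format itself, and the wiring into a FastAPI streaming response and LangGraph's event stream is deliberately left out.

```python
def sse_frames(tokens):
    """Yield each token as a Server-Sent Events frame, then a final [DONE] frame."""
    for token in tokens:
        # A frame is one or more 'data:' lines followed by a blank line;
        # multi-line tokens become several data lines in the same frame
        payload = "".join(f"data: {line}\n" for line in token.split("\n"))
        yield payload + "\n"
    yield "data: [DONE]\n\n"


frames = list(sse_frames(["Hel", "lo"]))
```

A generator like this could be handed to a streaming response with the text/event-stream media type, with the tokens coming from the agent instead of a list.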

Add human-in-the-loop approval: LangGraph's interrupt_before parameter (passed to graph.compile()) lets you pause the graph before a tool runs and wait for a human to approve. Expose this as a /approve endpoint. Now you have a supervised agent that can handle sensitive operations (sending emails, modifying databases) with a human in the loop.
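The bookkeeping behind an approval endpoint can be sketched independently of LangGraph. ApprovalQueue below is a hypothetical, in-memory toy: it records tool calls waiting for a decision and hands them back once decided. It does not pause or resume a graph, and because it lives in process memory it would not be shared between two uvicorn workers, so a real version would keep this state in Postgres.

```python
import uuid


class ApprovalQueue:
    """Toy in-memory store of tool calls waiting for a human decision."""

    def __init__(self):
        self._pending = {}

    def request(self, tool_name: str, args: dict) -> str:
        """Register a pending tool call and return its approval id."""
        approval_id = uuid.uuid4().hex
        self._pending[approval_id] = {"tool": tool_name, "args": args}
        return approval_id

    def decide(self, approval_id: str, approved: bool) -> dict:
        """Record the decision; raises KeyError if unknown or already decided."""
        call = self._pending.pop(approval_id)
        return {**call, "approved": approved}


queue = ApprovalQueue()
approval_id = queue.request("send_email", {"to": "a@example.com"})
decision = queue.decide(approval_id, approved=True)
```

Popping the entry on decision means the same call can't be approved twice, which matters for operations like sending an email.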

Swap the VPS for a managed container service — the docker run approach works but requires you to SSH in to redeploy. Cloud Run (GCP), App Runner (AWS), or Fly.io let you push an image and get a URL back. I’ll cover this in Part 3.

Conclusion

The agent from Part 1 is now a proper service — running in a container, persistent across restarts, and accessible from anywhere. The FastAPI + LangGraph + Postgres stack is what I run in production for the systems I’ve actually shipped, and the only significant difference between what you’ve just built and those systems is observability (logging, tracing) and a smoother CI/CD pipeline.

Building the HTTP layer and the Dockerfile took me about 2 hours the first time. With the patterns above, it should take you under one.

What are you going to use this agent for? If you’re building something where human approval before tool execution matters — customer-facing agents, anything that writes to a database — tell me in the comments and I’ll make that the focus of Part 3.

Related: What Are AI Agents? Complete Guide for Developers (2026)


Sukhveer Kaur

Software Developer & AI Engineer
