Build an Agentic AI App in Python: FastAPI, Docker & Deploy to Production (Part 2)

Wrap your LangGraph agent in FastAPI, Dockerize it, and deploy to a cloud VPS. Part 2 of the Agentic AI Python series — zero to production.

Sukhveer Kaur

Published June 9, 2026 · Updated July 6, 2026

9 min read

🧰 New here? Set up your environment first · ~5 min

Install Python 3.11+ — confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
Install the packages this tutorial lists: pip install -U pip <packages>.
Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

🟡 Intermediate · ⏱️ 35 min · Stack: Python 3.11+, FastAPI, Docker, a cloud VPS

Series: Agentic AI in Python: Zero to Production. This is Part 2. Quick recap of what we covered so far:

Part 1: Built a local LangGraph research agent with tool use and SQLite memory → /build-agentic-ai-app-python-part-1/

If you’re starting here, you need a working agent.py and tools.py from Part 1 before continuing.

At the end of Part 1, the agent ran on your laptop. It answered questions, called tools, and remembered context across turns — but only you could use it. To make it useful, it needs an HTTP endpoint, a reproducible container, and a home on a server that doesn’t need your laptop to be open.

That’s exactly what Part 2 covers. By the end of this post you’ll have the same LangGraph agent running behind a FastAPI endpoint, packaged into a Docker image, and deployed to a cloud VPS. I’ll give you every command, every config file, and the three errors that cost me the most time the first time I did this.

✅ Before you start

A working agent.py and tools.py from Part 1 — this post deploys that exact agent
Comfortable with the terminal and SSH; Docker installed locally — new to containers? Read the Docker for Python primer first
A cloud VPS you can SSH into (any Ubuntu server works)

🎯 Key takeaways

Wrap the agent in a FastAPI POST /chat endpoint and take session_id from the client so each user gets a separate conversation history for free.
Swap SQLite for a Postgres checkpointer so memory survives container restarts, and call memory.setup() before graph.compile().
Containerise with python:3.11-slim and run uvicorn with --workers 2, then put Nginx in front so the agent serves on port 80.
Set proxy_read_timeout explicitly. Multi-tool calls can take 8–12s, which fits inside Nginx's 60-second default, but any run that exceeds the limit has its connection closed and returns a 504 while the agent is still working.

What We’re Building

The local agent from Part 1 becomes a proper service: a POST endpoint at /chat that accepts a JSON body with a message and session_id, calls the LangGraph agent, and returns the reply as JSON. Docker packages the whole thing so it runs identically everywhere. A cloud VPS (I’m using a €4/month Hetzner CX11, but any Ubuntu server works) hosts it behind Nginx.

The diagram shows the full runtime. The key change from Part 1 is the Postgres checkpointer replacing SQLite. SQLite is fine on a single machine, but its database file lives inside the container's filesystem, so it disappears whenever the container is replaced on a redeploy unless you mount a volume. Postgres is one env var away.

Prerequisites

You need the code from Part 1 plus three new things before writing a single line:

Docker Desktop (or Docker Engine on Linux) — docker --version should return 24+. If it’s not installed, grab it from docs.docker.com/get-docker.
A cloud VPS with Ubuntu 22.04 — Hetzner, DigitalOcean, Linode, or any provider works. The €4/month CX11 (1 vCPU, 2 GB RAM) handles this agent comfortably. You’ll need SSH access and a public IP.
A Postgres instance — you can run docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:15 locally for testing, or use a managed service (Supabase free tier works) for production.

The flowchart is the exact sequence for this post. Steps 3–5 each take under ten minutes once steps 1–2 are solid.

Step 1: Wrap the Agent in FastAPI

FastAPI is a natural fit here because it handles async, auto-generates API docs at /docs, and adds almost no overhead. The key design decision is how to pass session_id — it should come from the client, not be hardcoded, so different users get separate conversation histories without any extra code on your side.
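To make the session_id decision concrete, here is a minimal sketch of how a client-supplied ID could map to the per-conversation config that the endpoint below passes to the agent. The names thread_config, remember, and histories are hypothetical and exist only for this illustration; the in-memory dict merely stands in for the checkpointer, which in the real app is LangGraph's job.

```python
from collections import defaultdict


def thread_config(session_id: str) -> dict:
    """Build the per-conversation config dict keyed by the client's session_id."""
    if not session_id.strip():
        raise ValueError("session_id must not be empty")
    return {"configurable": {"thread_id": session_id}}


# Stand-in for the checkpointer: one message list per thread_id.
histories: dict = defaultdict(list)


def remember(session_id: str, message: str) -> list:
    """Append a message to the history that belongs to this session only."""
    thread_id = thread_config(session_id)["configurable"]["thread_id"]
    histories[thread_id].append(message)
    return histories[thread_id]


remember("alice", "What is LangGraph?")
remember("bob", "What is Docker?")
remember("alice", "How does it store state?")
print(len(histories["alice"]), len(histories["bob"]))  # 2 1
```

Because the key comes from the request, two users never share a history unless they send the same session_id, which is why the ID should be generated per user or per conversation on the client side.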

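Install the new dependencies. The Postgres checkpointer is built on psycopg 3 (imported as psycopg), not psycopg2, so that is the driver to install:

bash

pip install fastapi "uvicorn[standard]" langgraph-checkpoint-postgres "psycopg[binary]"

Create api.py at your project root:

python

# api.py
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from pydantic import BaseModel
from langchain_core.messages import HumanMessage
from agent import build_app   # we'll refactor agent.py below


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build the graph once per worker process, not once per request
    app.state.agent = build_app()
    yield


app = FastAPI(title="Agentic AI API", lifespan=lifespan)


class ChatRequest(BaseModel):
    message: str
    session_id: str = "default"


@app.post("/chat")
def chat(req: ChatRequest, request: Request):
    # Plain `def`: FastAPI runs it in a thread pool, so the blocking
    # agent.invoke() call doesn't stall the event loop
    config = {"configurable": {"thread_id": req.session_id}}
    result = request.app.state.agent.invoke(
        {"messages": [HumanMessage(content=req.message)]},
        config=config,
    )
    return {"reply": result["messages"][-1].content, "session_id": req.session_id}


@app.get("/health")
async def health():
    return {"status": "ok"}

Now refactor agent.py slightly: move the graph compilation into a build_app() function so the checkpointer is initialised once per worker process at startup, rather than at module import time (which causes issues when uvicorn spins up multiple workers). The lifespan hook in api.py calls it exactly once per worker:

python

# agent.py: add this function, keep everything else the same
import os
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row

def build_app():
    # PostgresSaver expects a psycopg 3 connection with autocommit on
    # and rows returned as dicts
    conn = Connection.connect(
        os.getenv("DATABASE_URL"),
        autocommit=True,
        prepare_threshold=0,
        row_factory=dict_row,
    )
    memory = PostgresSaver(conn)
    memory.setup()   # creates the checkpoint tables if they don't exist
    return graph.compile(checkpointer=memory)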

The memory.setup() call is important — it creates the LangGraph checkpoint tables on first run. It’s idempotent, so calling it on every startup is safe and saves you a separate migration step.
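build_app() depends on DATABASE_URL being present in the environment, and a missing value tends to surface as a confusing connection error. The sketch below shows one way to fail early with a readable message. require_env is a hypothetical helper, not part of LangGraph or FastAPI, and the URL is just the local test value used in this post.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value


# Demo only: provide a value if none is set, then read it back the way build_app() could.
os.environ.setdefault("DATABASE_URL", "postgresql://postgres:secret@localhost:5432/postgres")
database_url = require_env("DATABASE_URL")
```

Calling require_env("DATABASE_URL") at the top of build_app() would turn a silent None into an immediate startup failure, which is easier to spot in container logs.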

Test it locally before touching Docker:

bash

DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" \
  uvicorn api:app --reload

Then in another terminal:

bash

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is LangGraph?", "session_id": "test-123"}'

You should get a JSON response in 2–3 seconds with the agent’s reply. If you see {"reply": "..."}, step 1 is done.
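If you would rather test from Python than from curl, the sketch below builds the same POST /chat request with only the standard library. build_chat_request and chat are hypothetical helper names; the URL, message, and session_id simply mirror the curl command above, and chat() only works while the API is running.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str, session_id: str) -> urllib.request.Request:
    """Build the same POST /chat request the curl command sends."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, message: str, session_id: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs the API to be running)."""
    request = build_chat_request(base_url, message, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("http://localhost:8000", "What is LangGraph?", "test-123")
print(request.get_method(), request.full_url)  # POST http://localhost:8000/chat
```

Keeping request construction separate from sending makes it easy to inspect exactly what goes over the wire before the server is even up.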

Common mistake: Forgetting memory.setup() and getting an UndefinedTable error on the first request, because the checkpoint tables don't exist yet. The fix is one line: always call memory.setup() before graph.compile().

Step 2: Write the Dockerfile

The goal is a lean image that starts fast. I use python:3.11-slim as the base — it’s ~120 MB versus ~900 MB for the full Python image — and install only what’s needed.

dockerfile

# Dockerfile
FROM python:3.11-slim
 
WORKDIR /app
 
# Install system deps for psycopg2
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev gcc \
    && rm -rf /var/lib/apt/lists/*
 
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
 
COPY . .
 
EXPOSE 8000
 
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Generate your requirements.txt from the current environment:

bash

pip freeze > requirements.txt

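Why --workers 2? A single uvicorn worker can handle this agent fine in dev, but with two worker processes one can be working on a request while the other picks up the next one coming in. For an LLM-backed endpoint with 2–3 second response times, that roughly doubles how many requests complete per second whenever calls would otherwise queue behind each other. The cost is a second Python process's worth of memory and a second Postgres connection, which a 2 GB server has room for.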
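The arithmetic behind the worker count can be sketched with a deliberately simplified model. It assumes every request takes the same time and each worker handles one request at a time, which is an assumption for illustration rather than a measurement of uvicorn; time_to_drain is a hypothetical name.

```python
import math


def time_to_drain(n_requests: int, workers: int, seconds_per_request: float) -> float:
    """Seconds until the last of n simultaneous requests finishes,
    assuming each worker handles one request at a time."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    rounds = math.ceil(n_requests / workers)
    return rounds * seconds_per_request


for workers in (1, 2):
    print(workers, time_to_drain(n_requests=4, workers=workers, seconds_per_request=2.5))
# 1 10.0
# 2 5.0
```

Under this model four simultaneous 2.5-second requests clear in 10 seconds with one worker and 5 seconds with two. Real numbers depend on how much of each request is spent waiting on the LLM, so treat this as a rough guide.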

Build and test the image locally before pushing anything to a server:

bash

docker build -t ai-agent:latest .
 
docker run -p 8000:8000 \
  --env-file .env \
  -e DATABASE_URL="postgresql://postgres:secret@host.docker.internal:5432/postgres" \
  ai-agent:latest

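Note host.docker.internal: that's the hostname Docker Desktop provides on Mac and Windows to reach your local machine from inside a container. On Linux, use 172.17.0.1 (the default Docker bridge gateway) instead, or add --add-host=host.docker.internal:host-gateway to the docker run command (Docker 20.10+) so the same hostname resolves there too.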
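If you switch between machines often, the platform rule above can be captured in a few lines. This is a sketch with hypothetical helper names (docker_host_address, local_database_url); the credentials are the throwaway local test values from earlier in this post, not something to use in production.

```python
import platform
from typing import Optional


def docker_host_address(system: Optional[str] = None) -> str:
    """Hostname or IP a container can use to reach services on the host machine."""
    system = system or platform.system()
    if system in ("Darwin", "Windows"):
        return "host.docker.internal"
    return "172.17.0.1"  # default Docker bridge gateway on Linux


def local_database_url(system: Optional[str] = None) -> str:
    """DATABASE_URL for the local test Postgres, as seen from inside a container."""
    return f"postgresql://postgres:secret@{docker_host_address(system)}:5432/postgres"


print(local_database_url("Darwin"))
print(local_database_url("Linux"))
```

The printed value is what you would pass to docker run with -e DATABASE_URL=... on each platform.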

Hit the /health endpoint to confirm the container is up:

bash

curl http://localhost:8000/health
# {"status":"ok"}
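A container can take a moment to start, so a single curl may fail even when nothing is wrong. The sketch below polls /health until it reports ok, using only the standard library. wait_for_health and fetch_status are hypothetical names, and the attempt count and delay are arbitrary defaults; the fetch function is injectable so the loop can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> str:
    """GET the health URL and return its "status" field."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8")).get("status", "")


def wait_for_health(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], str] = fetch_status,
) -> bool:
    """Poll until the service reports "ok" or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == "ok":
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError, ValueError):
            pass  # container still starting; try again
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Usage would look like wait_for_health("http://localhost:8000/health"), which returns True as soon as the endpoint answers and False if it never does.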

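Common mistake: Getting a ModuleNotFoundError for the Postgres driver when the container starts, even though everything works locally. The usual cause is a stale requirements.txt: re-run pip freeze > requirements.txt inside the activated virtual environment after installing the new packages, then rebuild the image. Also note that psycopg2-binary and psycopg2 are different packages. psycopg2-binary ships prebuilt wheels, while psycopg2 compiles from source and fails during the build unless libpq-dev and gcc are present (the Dockerfile above installs both).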

Step 3: Deploy to a Cloud VPS

Which provider? I’ve deployed this exact stack on three of them, so here’s the short version:

Hetzner is what I use. The €4/month CX11 (1 vCPU, 2 GB RAM) runs this agent with headroom to spare. Best price of the three in its EU data centres; US locations cost slightly more.
DigitalOcean — ~$6/month for the equivalent droplet. Worth the extra dollar if you’re newer to servers: the dashboard, docs, and one-click Docker image are the friendliest of the three.
Vultr — comparable pricing to DigitalOcean with more global regions. Pick this if your users are far from EU/US data centres.

Any Ubuntu 22.04 box with 2 GB RAM works — the commands below are identical on all three.

Disclosure: some links on this page may be referral links — they cost you nothing and support more tutorials like this one.

The fastest deployment path for a single container is to save the image to a tarball, copy it to the server, and run it there. No container registry needed.

On your local machine:

bash

# Save the image to a file
docker save ai-agent:latest | gzip > ai-agent.tar.gz

# Create the target directory on the server (replace YOUR_SERVER_IP)
ssh root@YOUR_SERVER_IP "mkdir -p /opt/ai-agent"

# Copy the image to your server
scp ai-agent.tar.gz root@YOUR_SERVER_IP:/opt/ai-agent/

# Copy your production env file
scp .env.prod root@YOUR_SERVER_IP:/opt/ai-agent/.env

On the server (SSH in first: ssh root@YOUR_SERVER_IP):

bash

# Install Docker if it's not there yet
curl -fsSL https://get.docker.com | sh
 
# Load the image from the tarball
cd /opt/ai-agent
docker load < ai-agent.tar.gz
 
# Run the container in the background
# (bound to 127.0.0.1 so only the Nginx proxy configured below can reach it)
docker run -d \
  --name ai-agent \
  --restart unless-stopped \
  -p 127.0.0.1:8000:8000 \
  --env-file .env \
  ai-agent:latest

--restart unless-stopped means the container comes back automatically if the server reboots — a small thing that saves you a panicked SSH session at 2 AM.

Now add Nginx as a reverse proxy so the agent is accessible on port 80 without needing :8000 in every URL:

bash

apt-get install -y nginx
 
cat > /etc/nginx/sites-available/ai-agent << 'EOF'
server {
    listen 80;
    server_name _;
 
    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
    }
}
EOF
 
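# Enable the site and remove Ubuntu's default one, which would otherwise
# answer requests sent to the bare IP
ln -sf /etc/nginx/sites-available/ai-agent /etc/nginx/sites-enabled/ai-agent
rm -f /etc/nginx/sites-enabled/default
nginx -t && systemctl reload nginx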

proxy_read_timeout 60s matches Nginx's default, so this line doesn't change behaviour on its own. I set it explicitly so the limit is visible in the config and easy to raise. LangGraph agents occasionally take 8–12 seconds on multi-tool calls, which fits comfortably, but if your agent can run longer than the timeout, raise this value. Otherwise Nginx closes the connection and the client gets a 504 while the agent is still happily processing in the background.

Test from your local machine:

bash

curl -X POST http://YOUR_SERVER_IP/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the Model Context Protocol?", "session_id": "prod-test-1"}'

If you get a reply, you’re live.

⚠️ Warning

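Don’t bake API keys into the Docker image. Pass them as runtime environment variables or secrets, because an image with embedded keys leaks them to anyone who can pull it. Note that COPY . . in the Dockerfile copies everything in the build context, including .env, so add .env and .env.prod to a .dockerignore file before you build.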

Testing It and Common Errors

Three things to verify before calling it done:

Memory persists across container restarts — stop and restart the container (docker restart ai-agent), then send a follow-up question that references a previous answer. The agent should remember it, because state is in Postgres rather than in-process.
Parallel sessions work — send two requests simultaneously with different session_id values. Both should get independent replies without bleeding context between them.
The /docs page is accessible — visit http://YOUR_SERVER_IP/docs in a browser. FastAPI auto-generates this. It’s a quick sanity check that the app is up and the schema is correct.
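The parallel-sessions check in the list above can be scripted. The sketch below fires two requests at once from a thread pool and confirms that each reply comes back tagged with its own session_id. check_parallel_sessions and fake_send are hypothetical names, and fake_send is a stand-in for a real HTTP call to POST /chat so the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def check_parallel_sessions(send: Callable[[str, str], dict]) -> bool:
    """Send two requests at once and confirm each reply carries its own session_id."""
    jobs = [
        ("Remember the word apple.", "session-a"),
        ("Remember the word banana.", "session-b"),
    ]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(send, message, session_id) for message, session_id in jobs]
        replies = [future.result() for future in futures]
    return all(
        reply.get("session_id") == session_id
        for reply, (_, session_id) in zip(replies, jobs)
    )


def fake_send(message: str, session_id: str) -> dict:
    # Stand-in for a real HTTP call to POST /chat.
    return {"reply": f"echo: {message}", "session_id": session_id}


print(check_parallel_sessions(fake_send))  # True
```

This only verifies that replies are routed to the right session. To check for context bleed, follow up by asking each session which word it was told to remember and compare the answers by eye.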

Errors I actually hit during this deployment:

connection refused on the Postgres URL inside Docker — I forgot that localhost inside a container refers to the container itself, not the host. Fix: use host.docker.internal (Mac/Windows) or the host’s bridge IP on Linux.
uvicorn.error: [Errno 13] Permission denied on port 80 — FastAPI running as a non-root user can’t bind to ports under 1024. Fix: run FastAPI on port 8000 and let Nginx handle 80. (The Dockerfile above already does this.)
SQLite import error after switching to Postgres — I had a stale from langgraph.checkpoint.sqlite import SqliteSaver import in agent.py. Remove all SQLite imports once you’ve migrated to Postgres or the module will fail to load.

What to Build Next

The deployment you have now handles multiple users and survives restarts. Here’s where I’d invest next, in order of impact:

Add a streaming endpoint. Right now the client waits 2–4 seconds for the full reply. FastAPI supports Server-Sent Events and LangGraph supports astream_events(). Wiring these together cuts perceived latency to roughly the time it takes the first token to arrive, because tokens stream out as they are generated. This is the single biggest UX improvement you can make with one afternoon of work.
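As a starting point for the streaming idea, here is a minimal sketch of the Server-Sent Events wire format itself, using only the standard library. sse_frames is a hypothetical helper, and the "token" field and "done" event name are illustrative choices rather than anything FastAPI or LangGraph prescribes. Hooking the generator up to a streaming response and to astream_events() is left out here.

```python
import json
from typing import Iterable, Iterator


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a Server-Sent Events frame, then signal completion."""
    for token in tokens:
        # JSON-encode so newlines inside a token can't break the frame.
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "event: done\ndata: {}\n\n"


frames = list(sse_frames(["Lang", "Graph", " is"]))
print(frames[0], end="")  # data: {"token": "Lang"}
```

Each frame ends with a blank line, which is how the browser's EventSource knows one event has finished and the next begins.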

Add human-in-the-loop approval. LangGraph’s interrupt_before parameter, passed to graph.compile(), lets you pause the graph before a chosen node (such as the tool node) runs and wait for a human to approve. Expose this as a /approve endpoint. Now you have a supervised agent that can handle sensitive operations (sending emails, modifying databases) with a human in the loop.

Swap the VPS for a managed container service. The docker run approach works but requires you to SSH in to redeploy. Cloud Run (GCP), App Runner (AWS), or Fly.io let you push an image and get a URL back. The Cloud Run and Fly.io guide linked at the end of this post covers that route.

Conclusion

The agent from Part 1 is now a proper service — running in a container, persistent across restarts, and accessible from anywhere. The FastAPI + LangGraph + Postgres stack is what I run in production for the systems I’ve actually shipped, and the only significant difference between what you’ve just built and those systems is observability (logging, tracing) and a smoother CI/CD pipeline.

Building the HTTP layer and the Dockerfile took me about 2 hours the first time. With the patterns above, it should take you under one.

What are you going to use this agent for? If you’re building something where human approval before tool execution matters (customer-facing agents, anything that writes to a database), tell me in the comments and I’ll make that the focus of a future part of the series.

The full series — Agentic AI in Python: Zero to Production:

Part 1 — Tools, StateGraph & Memory
Part 2 — FastAPI, Docker & Deploy — you’re here
Part 3 — Multi-Agent Systems
Part 4 — AI Agent Memory
Part 5 — MCP Client & Real Tools
Part 6 — Observability & Evals

🧭 Where to go from here

Start at the beginning: Part 1 builds the local agent this post deploys.
Next in this series: Part 3 — multi-agent systems.
Prefer managed hosting? Deploy an AI agent to Cloud Run or Fly.io skips the VPS.

Frequently asked questions

Why use Postgres instead of SQLite in production?

A SQLite file lives inside the container's filesystem, so it is lost whenever the container is replaced on a redeploy unless you mount a volume. A Postgres checkpointer is one env var away and keeps the agent's memory durable across redeploys.

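Why do I get a psycopg2 ModuleNotFoundError even though it's in requirements.txt?

psycopg2-binary and psycopg2 are different packages that both provide the psycopg2 module. psycopg2-binary ships prebuilt wheels; psycopg2 compiles from source and needs libpq-dev and gcc in the image. If the module is still missing at runtime, requirements.txt is usually stale, so regenerate it and rebuild.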

How do I reach host Postgres from inside Docker?

Use host.docker.internal on Mac/Windows, or the bridge IP 172.17.0.1 on Linux. Inside a container, localhost refers to the container itself.

Why is my client getting 504 errors?

Nginx is closing the connection before the agent finishes. Raise proxy_read_timeout above the 60-second default if your agent's runs can take longer than that.


Written by

Sukhveer KaurSoftware Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.
