Series: Agentic AI in Python: Zero to Production
This is the managed-deployment upgrade I promised in Part 2: how to deploy AI agent code to a host that runs itself.
- Part 2: Wrapped the agent in FastAPI, Dockerized it, and ran it on a VPS → /build-agentic-ai-app-python-part-2/
- Part 3: Turned one agent into a supervisor-and-workers team → /build-agentic-ai-app-python-part-3/
If you’re starting here, you need the Dockerized agent from Part 2: a working Dockerfile and api.py.
In Part 2 you had a working agent. But every redeploy meant SSHing into a VPS, copying a tarball, and running docker load by hand. I did that for three weeks before I got tired of being the deploy pipeline. The fix is to deploy AI agent code to a managed host instead — by the end of this post you’ll push one command and get a live HTTPS URL back. No server to babysit, no SSH, and your agent’s memory still intact across deploys.
I’ll show you both hosts I actually use for this: Google Cloud Run and Fly.io. They take different paths to the same place. You hand them a container, they hand you a URL. Pick one — the agent code does not change. The only real work is three small adjustments most tutorials skip, and they’re exactly where the first deploy fails.
What We’re Building
We’re taking the exact container from Part 2 and moving it onto a managed container platform: a host that runs your Docker image for you and gives you a public URL, instead of a Linux box you administer yourself. The agent keeps calling Claude and its tools. Its conversation memory keeps living in Postgres — just a managed Postgres now, not one you install by hand.
The diagram shows what changes and what doesn’t. The agent, FastAPI, and Postgres are identical to Part 2 — only the host underneath swaps from a self-managed VPS to a managed platform. You’re changing your deployment workflow, not rewriting your app.
Prerequisites and the One-Command Goal
Before anything else, make sure the Part 2 container builds cleanly and you have the right CLI installed. Here’s the checklist:
- The Part 2 project: a working Dockerfile, api.py, and requirements.txt. Confirm docker build -t ai-agent . succeeds locally first. If it won’t build on your laptop, it won’t build in the cloud.
- An API key for your model: ANTHROPIC_API_KEY, or your provider’s key. You’ll set this as a platform secret, never bake it into the image.
- One CLI, your choice: the gcloud CLI for Cloud Run (install guide) or flyctl for Fly.io (install guide).
That flowchart is the whole post. To show you where we’re headed, here is the single command that replaces the entire Part 2 deploy ritual — on Cloud Run:
gcloud run deploy ai-agent --source . --region asia-south1 --allow-unauthenticated
Or the Fly.io equivalent: run fly launch once to scaffold, then fly deploy to ship every change:
fly launch # generates fly.toml, once
fly deploy # ships every change after that
One line, and the platform builds, uploads, and serves your container. The catch is what you do before that line. That’s steps 1 to 3.
Step 1 — Make the Container Listen on $PORT
Here’s the change that breaks more first deploys than anything else. Managed platforms tell your app which port to use through a PORT environment variable, and your container must listen on it. Part 2 hardcoded port 8000. Cloud Run injects PORT=8080 by default. Hardcode the wrong one and the platform reports “the container failed to start and listen on the port”, one of the most commonly searched Cloud Run errors.
Change the last line of your Part 2 Dockerfile so uvicorn reads the variable:
# Dockerfile: replace the hardcoded CMD from Part 2 with this
CMD ["sh", "-c", "uvicorn api:app --host 0.0.0.0 --port ${PORT:-8080}"]
The ${PORT:-8080} syntax means “use PORT if the platform set it, otherwise default to 8080.” I wrap the command in sh -c because the JSON-array form of CMD won’t expand environment variables on its own. That detail cost me a confusing twenty minutes the first time. The container ran fine on my laptop and failed instantly in the cloud.
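If you would rather resolve the port in Python than in the shell, here is a minimal sketch of the same rule. The resolve_port helper and the main wrapper are my own illustrative names, not something from Part 2, and I am assuming uvicorn is installed from the Part 2 requirements.txt.

```python
import os


def resolve_port(env=None, default=8080):
    """Mirror ${PORT:-8080}: use PORT if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    raw = env.get("PORT") or str(default)  # an empty string falls back, like ":-" in sh
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def main():
    # Hypothetical entry point: assumes uvicorn is installed and api.py exposes `app`.
    import uvicorn

    uvicorn.run("api:app", host="0.0.0.0", port=resolve_port())
```

With this approach the Dockerfile command could call a small launcher script that invokes main(), and no shell expansion is needed. The sh -c form above is still the smaller change to the Part 2 image.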
Common mistake: Leaving --port 8000 in the Dockerfile and getting Default STARTUP TCP probe failed on Cloud Run. The fix is the ${PORT:-8080} line above. Always bind to the variable, never a hardcoded number.
Step 2 — Provision Managed Postgres and Secrets
Your agent’s memory cannot live inside the container. Managed platforms throw the container away and rebuild it on every deploy and every scale event. A local SQLite file or a container-local Postgres would vanish. You need a database that outlives the container, plus a safe place for your API keys.
On Cloud Run, the fastest managed Postgres options are Cloud SQL, or a serverless provider like Neon or Supabase whose free tier is plenty for an agent. Grab the connection string. Then pass your secrets in at deploy time instead of in the image:
gcloud run deploy ai-agent --source . --region asia-south1 \
  --allow-unauthenticated \
  --set-env-vars "DATABASE_URL=postgresql://USER:PASS@HOST:5432/db?sslmode=require" \
  --set-env-vars "ANTHROPIC_API_KEY=sk-ant-..."
On Fly.io, the database is one command and an attach. The attach writes the DATABASE_URL secret for you automatically:
fly postgres create --name ai-agent-db
fly postgres attach ai-agent-db # sets DATABASE_URL as a secret
fly secrets set ANTHROPIC_API_KEY=sk-ant-...
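Note the ?sslmode=require on the Cloud Run database URL. Most managed Postgres providers reject unencrypted connections, and psycopg2 inherits libpq’s default of sslmode=prefer, which tries SSL but does not insist on it. Setting require explicitly means the connection is always encrypted; without it, a fallback to plaintext is exactly what providers reject, and you may see connection refused or an SSL error on the first request. For real projects, graduate from --set-env-vars to Google Secret Manager (--set-secrets) so keys never appear in your deploy logs.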
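If the connection string comes from a provider dashboard, it is easy to forget the SSL parameter. Here is a small sketch, using only the standard library, that adds sslmode=require when it is missing and leaves an explicit choice alone. The with_sslmode name is my own, and where you call it in api.py is up to you.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sslmode(url, mode="require"):
    """Return the Postgres URL with sslmode set, keeping any value already present."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.setdefault("sslmode", mode)  # do not override an explicit sslmode
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A typical use would be something like with_sslmode(os.environ["DATABASE_URL"]) at startup, before the URL is handed to psycopg2.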
Step 3 — Deploy AI Agent Code and Tune Two Settings
With the port fixed and secrets ready, the deploy itself is the easy part. But two default settings will bite an LLM-backed agent specifically, so set them now. Whether you deploy AI agent code to Cloud Run or Fly.io, the same two ideas apply.
On Cloud Run, deploy from source and set the request timeout explicitly:
gcloud run deploy ai-agent --source . --region asia-south1 \
  --allow-unauthenticated \
  --timeout 300 \
  --min-instances 1 \
  --max-instances 5
--timeout 300 gives the agent five minutes per request. That matches Cloud Run’s default, so the flag mainly makes the budget explicit and easy to raise later (the ceiling is 60 minutes). A multi-tool agent run can take 15–20 seconds, and I’ve seen complex chains push past a minute. --min-instances 1 keeps one instance warm, so users don’t eat a cold start, the multi-second delay while a scaled-to-zero platform boots a fresh container. That warm instance costs a few dollars a month. For a portfolio demo I’d skip it; for anything user-facing I’d keep it.
--max-instances 5 is the one people forget. Cloud Run scales out under load, and every instance opens its own pool of Postgres connections. Scale to 50 instances against a free-tier database capped near 20 connections and the whole thing falls over. Cap max-instances, or put a connection pooler (PgBouncer, or Neon’s pooled URL) in front of the database, to stay under the limit.
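The arithmetic behind that cap is worth writing down. This is a back-of-the-envelope sketch, assuming each instance holds a pool of 3 connections (my assumption; check your own pool settings) and that Postgres keeps 3 slots back for superusers, which is its default.

```python
def max_safe_instances(db_connection_limit, pool_size_per_instance, reserved=3):
    """Largest instance count whose combined pools fit under the database limit."""
    if pool_size_per_instance <= 0:
        raise ValueError("pool_size_per_instance must be positive")
    usable = db_connection_limit - reserved  # slots the app can actually take
    return max(usable // pool_size_per_instance, 0)


def total_connections(instances, pool_size_per_instance):
    """Worst-case connections opened when every instance fills its pool."""
    return instances * pool_size_per_instance
```

With a limit of 20 and a pool of 3, max_safe_instances(20, 3) comes out at 5, while 50 instances would ask for 150 connections. If your pool is larger, the safe instance count drops accordingly.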
On Fly.io, the same two ideas live in fly.toml:
# fly.toml
[http_service]
internal_port = 8080
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1 # keep one warm, same idea as Cloud Run min-instances
Then run fly deploy. Fly runs each instance as a Firecracker microVM — a lightweight virtual machine that boots in well under a second. Its cold starts are gentler than Cloud Run’s, but min_machines_running = 1 still pays off for a chat agent.
Testing It and the Errors You’ll Actually Hit
Once the platform prints your URL, verify three things before you call it done:
- Health check: curl https://YOUR-URL/health should return {"status":"ok"}. If this fails, the container never started. Check the port binding from Step 1.
- A real chat turn: curl -X POST https://YOUR-URL/chat -H "Content-Type: application/json" -d '{"message":"What is Cloud Run?","session_id":"prod-1"}'. A JSON reply means the model key and the database both connected.
- Memory survives a redeploy: send a message, redeploy, then ask a follow-up that references the first. The agent should remember, because state lives in managed Postgres, not the container.
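If you want the first two checks as a repeatable script rather than hand-typed curl, here is a rough standard-library sketch. The function names are mine, the /health and /chat paths and payload shape are taken from the curl commands above, and the 120-second client timeout is an arbitrary choice so the client doesn’t give up before a long tool chain finishes.

```python
import json
import urllib.request


def build_chat_request(base_url, message, session_id):
    """Build the same POST /chat request as the curl example."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url, timeout=120):
    """Run the health check and one chat turn; raises if either one fails."""
    health_url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(health_url, timeout=timeout) as resp:
        health = json.load(resp)
    if health.get("status") != "ok":
        raise RuntimeError(f"unexpected health payload: {health!r}")

    request = build_chat_request(base_url, "What is Cloud Run?", "prod-1")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)  # parsed JSON reply from the agent
```

The third check still needs a human in the loop: call smoke_test, redeploy, then send a follow-up with the same session_id and see whether the agent remembers.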
These are the errors I hit moving this exact agent across both platforms:
- “Container failed to start and listen on the port”: the Step 1 mistake. The Dockerfile was still on a hardcoded port. Bind to ${PORT:-8080}.
- “psycopg2.OperationalError: connection refused”: a missing ?sslmode=require on the database URL, or the database’s IP allowlist didn’t include the platform. Add the SSL param; for Cloud SQL, use the Cloud SQL connector instead of a raw IP.
- 504 after about twenty seconds: the request timeout was too low for a long tool chain. Raise --timeout on Cloud Run, or check your client isn’t giving up early.
- “remaining connection slots are reserved”: too many instances against a small database. Cap --max-instances or add a pooler.
What to Build Next
The deploy you have now ships in one command and survives restarts. Here’s where I’d invest next, in order:
Add CI/CD. Right now you run the deploy from your laptop. Wire a GitHub Action that runs the same command on every push to main, and you get hands-off deploys when you merge.
Add a custom domain. Both platforms map a domain in a couple of commands and provision the TLS certificate for you. agent.yoursite.com reads better than a generated URL and costs nothing extra.
Lock down access. Drop --allow-unauthenticated and put the agent behind an API key or platform IAM, so only your front end can call it. An open agent endpoint is an open invoice against your model bill.
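For the API-key route, here is one possible sketch. Everything named here is hypothetical: the AGENT_API_KEY environment variable, the X-API-Key header, and the helper names are my own choices, and I am assuming the FastAPI app from Part 2. The comparison itself uses hmac.compare_digest so the check runs in constant time.

```python
import hmac
import os
from typing import Optional


def is_authorized(provided, expected):
    """Constant-time key comparison; rejects when either side is missing."""
    if not provided or not expected:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def make_api_key_dependency():
    """Build a FastAPI dependency. FastAPI is imported lazily, only when called."""
    from fastapi import Header, HTTPException

    def require_api_key(x_api_key: Optional[str] = Header(default=None)):
        # FastAPI maps the x_api_key parameter to the X-API-Key request header.
        if not is_authorized(x_api_key, os.environ.get("AGENT_API_KEY")):
            raise HTTPException(status_code=401, detail="invalid or missing API key")

    return require_api_key
```

You could then attach it to the chat route with dependencies=[Depends(make_api_key_dependency())] and set AGENT_API_KEY as a platform secret, the same way as the model key. Platform IAM is the stronger option if only your own services call the agent.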
Conclusion
You now have a clean way to deploy AI agent code to a managed host. One command ships it, the platform handles the container and the URL, and Postgres keeps its memory alive across every deploy. The three things that matter aren’t platform-specific: bind to $PORT, keep state in managed Postgres, and tune timeout and instance limits for an LLM workload. Get those right and Cloud Run or Fly.io is genuinely a one-line deploy.
A deployed agent is only as useful as what it can do. The natural next step is giving it real, governed tools to call — which is exactly what a production MCP server is for. Next, plug this agent into the authenticated MCP server we built here so it can act on external systems safely.
Which host did you pick — Cloud Run or Fly.io — and what was the first error that stopped your deploy? Tell me in the comments; the most common one becomes its own troubleshooting post.
Catch up on the series: Part 2 — FastAPI, Docker & Deploy · Part 3 — Multi-Agent Systems