Intermediate

Deploy AI Agent to Cloud Run or Fly.io (Python 2026)

Deploy AI agent code to Cloud Run or Fly.io in one command: push once, get a live HTTPS URL, and keep your Postgres-backed agent memory intact — no VPS.

Sukhveer Kaur

Published June 11, 2026 · Updated July 6, 2026

Series: Agentic AI in Python: Zero to Production

This is the managed-deployment upgrade I promised in Part 2: how to deploy AI agent code to a host that runs itself.

Part 2: Wrapped the agent in FastAPI, Dockerized it, and ran it on a VPS → /build-agentic-ai-app-python-part-2/
Part 3: Turned one agent into a supervisor-and-workers team → /build-agentic-ai-app-python-part-3/

If you’re starting here, you need the Dockerized agent from Part 2 — a working Dockerfile and api.py.

In Part 2 you had a working agent, but every redeploy meant SSHing into the VPS, copying a tarball over, and running docker load by hand. I kept that routine up for about three weeks. Then I got tired of being my own deploy pipeline. So this post is the fix: deploy AI agent code to a managed host, push one command, and get a live HTTPS URL back. No server to babysit. No SSH. And the agent’s memory survives every deploy.

I use two hosts for this, Google Cloud Run and Fly.io, and I’ll cover both. They reach the same place by different roads: you hand them a container, they hand you a URL. Pick whichever you like, because the agent code itself doesn’t change at all. What does need changing is three small things most tutorials skip, and those three things are where nearly every first deploy dies.

🎯 Key takeaways

Managed hosts (Cloud Run, Fly.io) take a container and hand back an HTTPS URL — no SSH, no tarballs, and the agent code itself doesn’t change.
Make the container listen on $PORT (the platform injects it). Hardcoding 8000 is the single most common first-deploy failure.
Use managed Postgres + secrets so the agent’s memory survives every deploy, and never bake API keys into the image.
The three things tutorials skip ($PORT, managed Postgres/secrets, and two platform settings: request timeout and instance limits) are where most first deploys die.

What We’re Building#

The plan is simple. Take the exact container from Part 2 and move it onto a managed container platform, meaning a host that runs your Docker image for you and gives you a public URL, with no Linux box left for you to administer. The agent still calls Claude and its tools the way it always did, and its conversation memory still lives in Postgres. The only difference is that the Postgres is managed now instead of something you installed by hand.

Look at what stays the same in that diagram. The agent, FastAPI, and Postgres are all identical to Part 2. The only piece that swaps out is the host underneath, from a VPS you manage to a platform that manages itself. You’re changing how you deploy, not what you deploy.

Prerequisites and the One-Command Goal#

Before touching the cloud, check three boring things: the Part 2 container builds cleanly, you have an API key for your model, and you have the right CLI installed.

The Part 2 project itself — a working Dockerfile, api.py, and requirements.txt. Run docker build -t ai-agent . locally and make sure it succeeds. If it won’t build on your laptop, it won’t build in the cloud either.
An API key for your model. For me that’s ANTHROPIC_API_KEY. This goes in later as a platform secret; it never gets baked into the image.
One CLI, your choice: gcloud for Cloud Run (install guide) or flyctl for Fly.io (install guide).

That flowchart is basically the whole post in one image. And so you can see where we’re headed, this single command is what replaces the entire Part 2 deploy ritual on Cloud Run:

bash

gcloud run deploy ai-agent --source . --region asia-south1 --allow-unauthenticated

Or the Fly.io equivalent, where you run fly launch once to scaffold and fly deploy to ship every change:

bash

fly launch    # generates fly.toml, once
fly deploy    # ships every change after that

One line. The platform builds the image, uploads it, and serves it. The catch is everything you have to get right before that line, which is what steps 1 to 3 are about.

Step 1 — Make the Container Listen on $PORT#

This one change breaks more first deploys than everything else combined. Managed platforms decide which port your app should use, and they tell you through a PORT environment variable; your container has to listen on whatever they picked. Part 2 hardcoded port 8000. Cloud Run injects PORT=8080. Get that wrong and you’re staring at “the container failed to start and listen on the port”, which might be the most-Googled Cloud Run error in existence.

Change the last line of your Part 2 Dockerfile so uvicorn reads the variable:

dockerfile

# Dockerfile: replace the hardcoded CMD from Part 2 with this.
# "exec" makes uvicorn replace the shell, so it receives the platform's SIGTERM directly.
CMD ["sh", "-c", "exec uvicorn api:app --host 0.0.0.0 --port ${PORT:-8080}"]

${PORT:-8080} reads as “use PORT if the platform set it, otherwise fall back to 8080.” The sh -c wrapper matters too. The JSON-array form of CMD won’t expand environment variables on its own, a detail that cost me a confusing twenty minutes once — the container ran fine on my laptop, died instantly in the cloud, and the logs didn’t say why.

Common mistake: leaving --port 8000 in the Dockerfile and then hitting Default STARTUP TCP probe failed on Cloud Run. The fix is that ${PORT:-8080} line above. Bind to the variable, not a number you chose.
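If you would rather resolve the port in Python than in the shell, the same fallback can be sketched as below. This is a minimal sketch, not the Part 2 code: resolve_port and main are hypothetical names, and it assumes the FastAPI app is importable as api:app, as in Part 2.

```python
import os


def resolve_port(env=None, default=8080):
    """Return the port the platform asked for, or a local fallback."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return default  # PORT unset or empty: probably a local run
    port = int(raw)  # raises ValueError on a non-numeric PORT
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def main():
    # Imported here so the helper above stays usable without uvicorn installed.
    import uvicorn

    # 0.0.0.0 makes the server reachable from outside the container.
    uvicorn.run("api:app", host="0.0.0.0", port=resolve_port())
```

Saved in a small launcher module that calls main() under an `if __name__ == "__main__":` guard, this would let the CMD invoke Python directly with no shell wrapper. Either approach works; the point is that the port comes from the environment.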

Step 2 — Provision Managed Postgres and Secrets#

Your agent’s memory can’t live inside the container, full stop. Managed platforms throw containers away constantly — every deploy, every scale event — and anything stored locally, whether a SQLite file or a Postgres running in the same container, vanishes with them. So you need two things before deploying: a database that outlives the container, and a safe place for API keys.

For Cloud Run, the quickest options are Cloud SQL or a serverless provider like Neon or Supabase, whose free tiers are plenty for an agent. Grab the connection string, then pass your secrets at deploy time rather than baking them into the image:

bash

gcloud run deploy ai-agent --source . --region asia-south1 \
  --allow-unauthenticated \
  --set-env-vars "DATABASE_URL=postgresql://USER:PASS@HOST:5432/db?sslmode=require" \
  --set-env-vars "ANTHROPIC_API_KEY=sk-ant-..."

Fly.io makes this part almost unfair. One command creates the database, and the attach writes DATABASE_URL into your secrets for you:

bash

fly postgres create --name ai-agent-db
fly postgres attach ai-agent-db        # sets DATABASE_URL as a secret
fly secrets set ANTHROPIC_API_KEY=sk-ant-...

Don’t skip the ?sslmode=require on that Cloud Run database URL. Most managed Postgres providers refuse unencrypted connections, and psycopg2 only treats SSL as optional by default (libpq’s sslmode=prefer), so state the requirement explicitly; leaving it off can surface as connection refused or an SSL error on the very first request. One more thing for real projects: at some point, graduate from --set-env-vars to Google Secret Manager (--set-secrets) so keys stop showing up in your deploy logs.
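On the application side, it helps to fail fast when a secret is missing instead of discovering it on the first request. The sketch below assumes the two variable names used in this post (DATABASE_URL and ANTHROPIC_API_KEY); load_settings and ensure_sslmode are hypothetical helpers, not part of the Part 2 code.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_sslmode(url, mode="require"):
    """Return url with sslmode set, leaving an existing sslmode untouched."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "sslmode" for key, _ in query):
        query.append(("sslmode", mode))
    return urlunsplit(parts._replace(query=urlencode(query)))


def load_settings(env=None):
    """Read required settings from the environment and fail fast if any is missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("DATABASE_URL", "ANTHROPIC_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "database_url": ensure_sslmode(env["DATABASE_URL"]),
        "anthropic_api_key": env["ANTHROPIC_API_KEY"],
    }
```

An sslmode that is already present is deliberately left alone, because a platform-generated connection string may set its own value for a private network.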

Step 3 — Deploy AI Agent Code and Tune Two Settings#

The deploy itself is now the easy part. What’s left is two default settings that bite LLM-backed agents specifically, and I’d rather you set them now than discover them in production like I did. The same two ideas apply whether you deploy AI agent code to Cloud Run or Fly.io.

On Cloud Run, deploy from source and set the request timeout and instance limits explicitly:

bash

gcloud run deploy ai-agent --source . --region asia-south1 \
  --allow-unauthenticated \
  --timeout 300 \
  --min-instances 1 \
  --max-instances 5

--timeout 300 gives each request five minutes. That matches Cloud Run’s default, but setting it explicitly keeps the limit visible in the command and easy to raise later (the maximum is 3600 seconds). Five minutes sounds generous until you watch a multi-tool agent run take 15 or 20 seconds on a normal day, and over a minute when the chain gets complicated. --min-instances 1 keeps one instance warm so users don’t eat a cold start, the multi-second pause while a scaled-to-zero platform boots a fresh container. The warm instance costs a few dollars a month. Skip it for a portfolio demo; keep it for anything real users touch.

--max-instances 5 is the one everybody forgets, including me. Cloud Run scales out under load, and every new instance opens its own pool of Postgres connections. Now picture 50 instances hammering a free-tier database that caps out around 20 connections. The whole thing falls over. Either cap max-instances or put a connection pooler in front of the database — PgBouncer works, and Neon ships a pooled URL that handles it for you.
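The arithmetic behind that cap is simple enough to write down. This is a rough sketch with illustrative numbers: the per-instance pool size of 3 is an assumption (check what your own pool is configured for), and reserved=3 mirrors Postgres's default superuser_reserved_connections. max_safe_instances is a hypothetical helper.

```python
def max_safe_instances(db_connection_limit, pool_size_per_instance, reserved=3):
    """Largest instance count whose pools fit within the database's connection limit."""
    if pool_size_per_instance < 1:
        raise ValueError("pool_size_per_instance must be at least 1")
    usable = db_connection_limit - reserved  # leave slots for admin and migration sessions
    return max(usable // pool_size_per_instance, 0)


# Illustrative numbers only: a 20-connection database, 3 connections per instance.
print(max_safe_instances(20, 3))  # (20 - 3) // 3 = 5
```

If the result comes out lower than the traffic you expect to serve, that is the signal to add a pooler instead of lowering the cap further.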

On Fly.io, the same two ideas live in fly.toml:

toml

# fly.toml
[http_service]
  internal_port = 8080
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1   # keep one warm, same idea as Cloud Run min-instances

Then fly deploy and you’re done. Fly runs each instance as a Firecracker microVM, a lightweight virtual machine that boots in well under a second, so its cold starts are gentler than Cloud Run’s. For a chat agent I still keep min_machines_running = 1 though. Even a one-second pause feels broken when someone is mid-conversation.

⚠️ The #1 deploy failure

If your container doesn’t listen on the platform’s $PORT, the deploy fails health checks with no obvious error. Bind to 0.0.0.0:$PORT, not a hardcoded port — this is the single most common Cloud Run / Fly.io mistake.

Testing It and the Errors You’ll Actually Hit#

Once the platform prints your URL, don’t call it done yet. Check three things:

Health check first: curl https://YOUR-URL/health should come back with {"status":"ok"}. If it doesn’t, the container never started — go back to the port binding in Step 1.
Then a real chat turn: curl -X POST https://YOUR-URL/chat -H "Content-Type: application/json" -d '{"message":"What is Cloud Run?","session_id":"prod-1"}'. A JSON reply means the model key and the database both connected.
Finally, prove memory survives a redeploy. Send a message, redeploy, then ask a follow-up that references the first one. The agent should remember, since state lives in managed Postgres rather than in the container.
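The first two checks can be scripted with nothing but the standard library. The sketch below assumes the /health and /chat routes and the payload shape from the curl examples above; the function names are hypothetical, and since the exact shape of the chat response isn't specified here, it returns the decoded JSON as-is.

```python
import json
import urllib.request


def build_chat_request(base_url, message, session_id):
    """Build the POST /chat request using the payload shape from the curl example."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request, timeout=300):
    """Send a request (or plain URL) and decode the JSON body; raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(base_url, session_id="smoke-1"):
    """Run the health check and one chat turn, returning the chat reply."""
    health = fetch_json(base_url.rstrip("/") + "/health", timeout=30)
    if health.get("status") != "ok":
        raise RuntimeError(f"unexpected health response: {health}")
    return fetch_json(build_chat_request(base_url, "What is Cloud Run?", session_id))
```

For the memory check, call smoke_test with one session_id, redeploy, then send a follow-up through build_chat_request with the same session_id and read the reply yourself. The client timeout of 300 seconds is chosen to match the server-side timeout, so the client doesn't give up before the platform does.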

And for the record, these are the errors I personally hit while moving this exact agent across both platforms:

Container failed to start and listen on the port — the Step 1 mistake. My Dockerfile was still on a hardcoded port. Bind to ${PORT:-8080}.
psycopg2.OperationalError: connection refused — a missing ?sslmode=require on the database URL, or the database’s IP allowlist didn’t include the platform. Add the SSL param; on Cloud SQL, use the Cloud SQL connector instead of a raw IP.
A 504 after about twenty seconds — the request timeout was too low for a long tool chain. Raise --timeout on Cloud Run, and check your client isn’t giving up early.
remaining connection slots are reserved — too many instances against a small database. Cap --max-instances or add a pooler.

What to Build Next#

What you have now ships in one command and survives restarts, which already puts it ahead of most side projects. Where I’d spend time next, roughly in order:

First, CI/CD. Right now the deploy runs from your laptop, which means deploys happen when you remember to run them. A small GitHub Action that fires the same command on every push to main gets you hands-off deploys on merge.

Second, a custom domain. Both platforms map one in a couple of commands and provision the TLS certificate for you. agent.yoursite.com reads a lot better than a generated URL, and it costs nothing extra.

Third — and don’t put this one off — lock down access. Drop --allow-unauthenticated and put the agent behind an API key or platform IAM, so only your front end can call it. An open agent endpoint is an open invoice against your model bill.
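For the API-key route, the core check is small. This is a minimal sketch under stated assumptions: AGENT_API_KEY is a hypothetical secret name you would set the same way as the other secrets, and is_authorized is a hypothetical helper rather than anything from Part 2.

```python
import hmac
import os


def is_authorized(presented_key, expected_key=None):
    """Return True only if the caller's key matches the configured one."""
    if expected_key is None:
        expected_key = os.environ.get("AGENT_API_KEY", "")
    if not expected_key or not presented_key:
        return False  # fail closed when the key is unset or absent
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```

In FastAPI, one way to wire this in is a dependency that reads a request header of your choosing (X-API-Key is a common convention, not a requirement) and raises HTTPException with status 401 when is_authorized returns False.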

Conclusion#

So that’s the whole move. You can now deploy AI agent code to a managed host with one command, and Postgres keeps the agent’s memory alive through every deploy. If you remember nothing else from this post, remember the three adjustments, because they aren’t platform-specific. Bind to $PORT. Keep state in a managed Postgres. Tune the timeout and instance limits for an LLM workload. Get those right and the one-line deploy really is one line, on either host.

A deployed agent is only as useful as what it can actually do, though. The next step is giving it real, governed tools to call, and that’s what a production MCP server is for. Plug this agent into the authenticated MCP server we built here and it can act on external systems safely.

Which host did you pick, and what was the first error that stopped your deploy? Tell me in the comments. The most common one becomes its own troubleshooting post.

Catch up on the series: Part 2 — FastAPI, Docker & Deploy · Part 3 — Multi-Agent Systems

Frequently asked questions

Cloud Run or Fly.io: which should I pick?

Either; they reach the same place by different roads. The agent container is identical, so choose on pricing, region, and which dashboard you prefer.

Why does my deploy crash on startup?

Almost always because the container doesn't listen on the platform's injected $PORT. Read $PORT from the environment instead of hardcoding a port.

How does the agent keep its memory after a redeploy?

Point the checkpointer at managed Postgres rather than in-container storage, and inject the connection string as a secret so it persists across deploys.

Do I need to change my agent code to deploy?

No. Only the container's port binding, the Postgres URL, and two platform settings (request timeout and instance limits) need attention.

Written by

Sukhveer Kaur, Software Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.
