Deploy AI Agent to Cloud Run or Fly.io (Python 2026)

Artificial Intelligence
June 11, 2026
6 min read
Intermediate
📚 Part of the series: Agentic AI in Python: Zero to Production
Table of Contents

  1. What We're Building
  2. Prerequisites and the One-Command Goal
  3. Step 1 — Make the Container Listen on $PORT
  4. Step 2 — Provision Managed Postgres and Secrets
  5. Step 3 — Deploy AI Agent Code and Tune Two Settings
  6. Testing It and the Errors You'll Actually Hit
  7. What to Build Next
  8. Conclusion

Series: Agentic AI in Python: Zero to Production

This is the managed-deployment upgrade I promised in Part 2: how to deploy AI agent code to a host that runs itself.

  • Part 2: Wrapped the agent in FastAPI, Dockerized it, and ran it on a VPS (/build-agentic-ai-app-python-part-2/)
  • Part 3: Turned one agent into a supervisor-and-workers team (/build-agentic-ai-app-python-part-3/)

If you’re starting here, you need the Dockerized agent from Part 2 — a working Dockerfile and api.py.

In Part 2 you had a working agent, but every redeploy meant SSHing into a VPS, copying a tarball, and running docker load by hand. I did that for three weeks before I got tired of being the deploy pipeline. The fix is to deploy AI agent code to a managed host instead. By the end of this post you’ll run one command and get a live HTTPS URL back: no server to babysit, no SSH, and your agent’s memory intact across deploys.

I’ll show you both hosts I actually use for this: Google Cloud Run and Fly.io. They take different paths to the same place. You hand them a container, they hand you a URL. Pick one — the agent code does not change. The only real work is three small adjustments most tutorials skip, and they’re exactly where the first deploy fails.

What We’re Building

We’re taking the exact container from Part 2 and moving it onto a managed container platform: a host that runs your Docker image for you and gives you a public URL, instead of a Linux box you administer yourself. The agent keeps calling Claude and its tools. Its conversation memory keeps living in Postgres — just a managed Postgres now, not one you install by hand.

Figure: Managed deployment architecture. A developer pushes code or an image to Cloud Run or Fly.io, which runs the FastAPI and LangGraph agent container, calls the Claude LLM and Tavily tools, persists memory in managed Postgres, and returns a public HTTPS URL.

The diagram shows what changes and what doesn’t. The agent, FastAPI, and Postgres are identical to Part 2 — only the host underneath swaps from a self-managed VPS to a managed platform. You’re changing your deployment workflow, not rewriting your app.

Prerequisites and the One-Command Goal

Before anything else, make sure the Part 2 container builds cleanly and you have the right CLI installed. Here’s the checklist:

  • The Part 2 project — a working Dockerfile, api.py, and requirements.txt. Confirm docker build -t ai-agent . succeeds locally first. If it won’t build on your laptop, it won’t build in the cloud.
  • An API key for your model: ANTHROPIC_API_KEY, or your provider’s key. You’ll set this as a platform secret and never bake it into the image.
  • One CLI, your choice — the gcloud CLI for Cloud Run (install guide) or flyctl for Fly.io (install guide).

Figure: Five-step flowchart for deploying the agent to a managed host. Bind to the PORT environment variable, provision managed Postgres, set secrets, push to deploy with one command, then test the live URL.

That flowchart is the whole post. To show you where we’re headed, here is the single command that replaces the entire Part 2 deploy ritual — on Cloud Run:

bash
gcloud run deploy ai-agent --source . --region asia-south1 --allow-unauthenticated

Or the Fly.io equivalent: run fly launch once to scaffold the app, then fly deploy to ship every change:

bash
fly launch # generates fly.toml, once
fly deploy # ships every change after that

One line, and the platform builds, uploads, and serves your container. The catch is what you do before that line. That’s steps 1 to 3.

Step 1 — Make the Container Listen on $PORT

Here’s the change that breaks more first deploys than anything else. Managed platforms tell your app which port to use through a PORT environment variable, and your container must listen on it. Part 2 hardcoded port 8000. Cloud Run injects PORT=8080 by default. Listen on anything else and the platform reports “the container failed to start and listen on the port”, one of the most commonly hit Cloud Run deploy errors.

Change the last line of your Part 2 Dockerfile so uvicorn reads the variable:

dockerfile
# Dockerfile: replace the hardcoded CMD from Part 2 with this.
# sh -c expands ${PORT}; exec makes uvicorn PID 1 so it receives SIGTERM on shutdown.
CMD ["sh", "-c", "exec uvicorn api:app --host 0.0.0.0 --port ${PORT:-8080}"]

The ${PORT:-8080} syntax means “use PORT if the platform set it, otherwise default to 8080.” I wrap the command in sh -c because the JSON-array form of CMD won’t expand environment variables on its own. That detail cost me a confusing twenty minutes the first time. The container ran fine on my laptop and failed instantly in the cloud.
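If you’d rather keep the port fallback in Python than in a shell wrapper, the same logic can be sketched in a small entrypoint. This is a minimal sketch, not the Part 2 code: resolve_port and main are hypothetical names, and it assumes the FastAPI app is importable as api:app, as in the CMD line above.

```python
import os


def resolve_port(env=None, default=8080):
    """Return the port the platform asked for, or the default if PORT is unset."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on junk such as "abc"
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def main():
    # Imported lazily so this module can be imported without uvicorn installed.
    import uvicorn

    # 0.0.0.0 makes the server reachable from outside the container.
    uvicorn.run("api:app", host="0.0.0.0", port=resolve_port())
```

Calling main() at the bottom of a file such as serve.py (a hypothetical filename) and using CMD ["python", "serve.py"] would then do the same job as the sh -c wrapper. Either approach works; the point is that the port comes from the environment.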

Common mistake: Leaving --port 8000 in the Dockerfile and getting Default STARTUP TCP probe failed on Cloud Run. The fix is the ${PORT:-8080} line above. Always bind to the variable, never a hardcoded number.

Step 2 — Provision Managed Postgres and Secrets

Your agent’s memory cannot live inside the container. Managed platforms throw the container away and rebuild it on every deploy and every scale event. A local SQLite file or a container-local Postgres would vanish. You need a database that outlives the container, plus a safe place for your API keys.

On Cloud Run, the fastest managed Postgres options are Cloud SQL, or a serverless provider like Neon or Supabase whose free tier is plenty for an agent. Grab the connection string. Then pass your secrets in at deploy time instead of in the image:

bash
gcloud run deploy ai-agent --source . --region asia-south1 \
--allow-unauthenticated \
--set-env-vars "DATABASE_URL=postgresql://USER:PASS@HOST:5432/db?sslmode=require" \
--set-env-vars "ANTHROPIC_API_KEY=sk-ant-..."

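On Fly.io, the database is one command and an attach. The attach writes the DATABASE_URL secret for you automatically. One caveat: the fly postgres commands create a Postgres cluster that runs as a Fly app you operate yourself, which Fly’s own docs describe as not managed. Fly offers a separate Managed Postgres product if you want the platform to run the database for you.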

bash
fly postgres create --name ai-agent-db
fly postgres attach ai-agent-db # sets DATABASE_URL as a secret
fly secrets set ANTHROPIC_API_KEY=sk-ant-...

Note the ?sslmode=require on the Cloud Run database URL. Most managed Postgres providers reject unencrypted connections. psycopg2 uses libpq, whose default sslmode=prefer tries SSL first but can silently fall back to plaintext, so set require explicitly: the connection is then either encrypted or fails with a clear SSL error. For real projects, graduate from --set-env-vars to Google Secret Manager (--set-secrets) so keys stay out of your shell history and out of the service’s plain-text environment configuration.
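On the application side, it helps to fail fast when a secret is missing and to guarantee the SSL parameter regardless of how the URL was pasted. The following is a minimal sketch under assumptions: load_settings and ensure_sslmode are hypothetical helper names, not part of the Part 2 code, and the two variable names are the ones used in the commands above.

```python
import os
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# The two settings this post passes to the platform as env vars / secrets.
REQUIRED_VARS = ("DATABASE_URL", "ANTHROPIC_API_KEY")


def ensure_sslmode(url, mode="require"):
    """Add sslmode=<mode> to a Postgres URL unless it already sets sslmode."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.setdefault("sslmode", mode)  # an explicit sslmode is left untouched
    new_query = urlencode(query, quote_via=quote)
    return urlunsplit(parts._replace(query=new_query))


def load_settings(env=None):
    """Read required settings, raising at startup if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {
        "database_url": ensure_sslmode(env["DATABASE_URL"]),
        "anthropic_api_key": env["ANTHROPIC_API_KEY"],
    }
```

Calling load_settings() once at import time in api.py would turn a missing secret into an immediate startup failure instead of an error on the first chat request. The helper deliberately leaves an existing sslmode alone, which matters when a platform-generated URL already sets one.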

Step 3 — Deploy AI Agent Code and Tune Two Settings

With the port fixed and secrets ready, the deploy itself is the easy part. But two default settings will bite an LLM-backed agent specifically, so set them now. Whether you deploy AI agent code to Cloud Run or Fly.io, the same two ideas apply.

On Cloud Run, deploy from source and set the request timeout and instance limits explicitly:

bash
gcloud run deploy ai-agent --source . --region asia-south1 \
--allow-unauthenticated \
--timeout 300 \
--min-instances 1 \
--max-instances 5

--timeout 300 gives the agent five minutes per request. That matches Cloud Run’s default, but setting it explicitly documents the budget and makes it easy to raise later (the maximum is 3600 seconds). A multi-tool agent run can take 15–20 seconds, and I’ve seen complex chains push past a minute. --min-instances 1 keeps one instance warm, so users don’t hit a cold start: the multi-second delay while a scaled-to-zero platform boots a fresh container. That warm instance costs a few dollars a month. For a portfolio demo I’d skip it; for anything user-facing I’d keep it.
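A platform timeout ends the request abruptly, so it can help to enforce a slightly smaller budget inside the app and return a clean error instead. Here is a minimal sketch, assuming the agent call is a coroutine; run_with_budget and the 10-second headroom are illustrative choices, not something from Part 2.

```python
import asyncio

PLATFORM_TIMEOUT_S = 300                   # mirrors --timeout 300 above
AGENT_BUDGET_S = PLATFORM_TIMEOUT_S - 10   # assumed headroom to send a reply


async def run_with_budget(make_call, budget_s=AGENT_BUDGET_S):
    """Await make_call(), giving up after budget_s seconds with a structured error."""
    try:
        result = await asyncio.wait_for(make_call(), timeout=budget_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner coroutine at this point.
        return {"ok": False, "error": f"agent run exceeded {budget_s}s budget"}
    return {"ok": True, "result": result}
```

Inside an async FastAPI handler this would wrap the agent invocation, so a runaway tool chain returns a JSON error the client can handle. Note that cancellation only works for cooperative async code; a blocking call would not be interrupted by this wrapper.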

--max-instances 5 is the one people forget. Cloud Run scales out under load, and every instance opens its own pool of Postgres connections. Scale to 50 instances against a free-tier database capped near 20 connections and the whole thing falls over. Cap max-instances, or put a connection pooler (PgBouncer, or Neon’s pooled URL) in front of the database, to stay under the limit.
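The arithmetic behind that cap is worth writing down. This sketch uses the 20-connection limit from the paragraph above; the pool size of 4 per instance and the helper names are assumptions for illustration, so substitute your own pool settings.

```python
def peak_connections(max_instances, pool_size):
    """Worst case: every instance has filled its connection pool."""
    return max_instances * pool_size


def max_safe_instances(db_connection_limit, pool_size, reserved=0):
    """Largest instance cap that keeps peak connections under the database limit.

    reserved covers slots you want left free, e.g. for migrations or psql.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    usable = db_connection_limit - reserved
    return max(usable // pool_size, 0)


# 50 instances x 4 connections each is far beyond a 20-connection database.
print(peak_connections(50, 4))                 # 200
# With a pool of 4, five instances exactly fill the 20 slots.
print(max_safe_instances(20, 4))               # 5
# Keep 3 slots free for admin work and the safe cap drops to 4.
print(max_safe_instances(20, 4, reserved=3))   # 4
```

With those assumed numbers, --max-instances 5 is the upper bound rather than a comfortable setting, which is the argument for a pooler once traffic grows.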

On Fly.io, the same two ideas live in fly.toml:

toml
# fly.toml
[http_service]
internal_port = 8080
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1 # keep one warm, same idea as Cloud Run min-instances

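Then run fly deploy. internal_port must match the port the container listens on, which is the 8080 default from Step 1 unless PORT is set in your environment. Fly runs each instance as a Firecracker microVM, a lightweight virtual machine that typically boots in well under a second. Its cold starts are gentler than Cloud Run’s, but min_machines_running = 1 still pays off for a chat agent. For the connection cap, Fly only starts and stops Machines that already exist, so the Machine count you set (for example with fly scale count) is your ceiling.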

Testing It and the Errors You’ll Actually Hit

Once the platform prints your URL, verify three things before you call it done:

  1. Health check: curl https://YOUR-URL/health should return {"status":"ok"}. If this fails, the container never started. Check the port binding from Step 1.
  2. A real chat turn: curl -X POST https://YOUR-URL/chat -H "Content-Type: application/json" -d '{"message":"What is Cloud Run?","session_id":"prod-1"}'. A JSON reply means the model key and the database both connected.
  3. Memory survives a redeploy: send a message, redeploy, then ask a follow-up with the same session_id that references the first. The agent should remember, because state lives in managed Postgres, not the container.
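The first two checks are easy to script so you can rerun them after every deploy. This is a minimal sketch using only the standard library; it assumes the /health and /chat routes and payload shown above, and the function names are hypothetical. The shape of the /chat reply isn’t specified here, so the script only confirms that it parses as JSON.

```python
import json
import urllib.request


def build_chat_request(base_url, message, session_id):
    """Build the same POST that the curl command in check 2 sends."""
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout_s=60):
    """Send the request and decode the JSON body; raises on HTTP errors."""
    with urllib.request.urlopen(request_or_url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test(base_url):
    """Run checks 1 and 2 against a deployed URL and return the chat reply."""
    health = fetch_json(base_url.rstrip("/") + "/health")
    if health != {"status": "ok"}:
        raise RuntimeError(f"unexpected health payload: {health!r}")
    return fetch_json(build_chat_request(base_url, "What is Cloud Run?", "prod-1"))
```

Running smoke_test("https://YOUR-URL") after a deploy covers the health check and a real chat turn. The memory check still needs a redeploy between two calls that share a session_id, so it stays a manual step or a two-stage CI job.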

These are the errors I hit moving this exact agent across both platforms:

  • Container failed to start and listen on the port: the Step 1 mistake. The Dockerfile was still on a hardcoded port. Bind to ${PORT:-8080}.
  • psycopg2.OperationalError (connection refused, or an SSL error): connection refused usually means the database’s IP allowlist doesn’t include the platform, or the host and port are wrong. An SSL error usually means ?sslmode=require is missing from the database URL. Add the SSL param; for Cloud SQL, use the Cloud SQL connector instead of a raw IP.
  • 504 after about twenty seconds: something in the path gave up before the tool chain finished. Cloud Run’s own default is 300 seconds, so check whether --timeout was set lower on the service, and whether your client or a proxy in front of it is timing out early.
  • remaining connection slots are reserved: too many instances against a small database. Cap --max-instances or add a pooler.

What to Build Next

The deploy you have now ships in one command and survives restarts. Here’s where I’d invest next, in order:

Add CI/CD. Right now you run the deploy from your laptop. Wire a GitHub Action that runs the same command on every push to main, and you get hands-off deploys when you merge.
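One way to keep the laptop and CI in sync is to pin the deploy command in a small script that both call. This is a sketch under assumptions: the function names are hypothetical, the defaults simply mirror the Step 3 flags, and it assumes the gcloud CLI is installed and authenticated wherever it runs.

```python
import subprocess


def build_deploy_command(
    service="ai-agent",
    region="asia-south1",
    timeout_s=300,
    min_instances=1,
    max_instances=5,
):
    """Return the Step 3 gcloud command as an argument list."""
    return [
        "gcloud", "run", "deploy", service,
        "--source", ".",
        "--region", region,
        "--allow-unauthenticated",
        "--timeout", str(timeout_s),
        "--min-instances", str(min_instances),
        "--max-instances", str(max_instances),
    ]


def deploy(**overrides):
    """Run the deploy; check=True makes a failed deploy fail the CI job too."""
    subprocess.run(build_deploy_command(**overrides), check=True)
```

A CI step would then call deploy() after authenticating, so changing a flag means editing one file instead of remembering to update both a workflow and your shell history.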

Add a custom domain. Both platforms map a domain in a couple of commands and provision the TLS certificate for you. agent.yoursite.com reads better than a generated URL and costs nothing extra.

Lock down access. Drop --allow-unauthenticated and put the agent behind an API key or platform IAM, so only your front end can call it. An open agent endpoint is an open invoice against your model bill.
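The API-key option can be sketched with the standard library alone. The header name x-api-key and the is_authorized helper are assumptions for illustration; the expected key would come from a platform secret, the same way ANTHROPIC_API_KEY does.

```python
import hmac

API_KEY_HEADER = "x-api-key"  # hypothetical header name; pick your own


def is_authorized(headers, expected_key):
    """Return True only if the request carries the expected API key.

    headers is any mapping of header names to values; names are matched
    case-insensitively. An unset expected_key rejects everything.
    """
    if not expected_key:
        return False
    supplied = ""
    for name, value in headers.items():
        if name.lower() == API_KEY_HEADER:
            supplied = value
            break
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

In the FastAPI app this check would sit in a dependency or middleware that returns 401 when it fails. Rejecting requests when the key is unset is a deliberate choice: a missing secret then closes the endpoint instead of opening it.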

Conclusion

You now have a clean way to deploy AI agent code to a managed host. One command ships it, the platform handles the container and the URL, and Postgres keeps its memory alive across every deploy. The three things that matter aren’t platform-specific: bind to $PORT, keep state in managed Postgres, and tune timeout and instance limits for an LLM workload. Get those right and Cloud Run or Fly.io is genuinely a one-line deploy.

A deployed agent is only as useful as what it can do. The natural next step is giving it real, governed tools to call — which is exactly what a production MCP server is for. Next, plug this agent into the authenticated MCP server we built here so it can act on external systems safely.

Which host did you pick — Cloud Run or Fly.io — and what was the first error that stopped your deploy? Tell me in the comments; the most common one becomes its own troubleshooting post.

Catch up on the series: Part 2 — FastAPI, Docker & Deploy · Part 3 — Multi-Agent Systems

Related: What Are AI Agents? Complete Guide for Developers (2026)


Tags: #PythonTutorial #LangGraph #AIForDevelopers

Sukhveer Kaur

Software Developer & AI Engineer
