
Build an Agentic AI App in Python: Multi-Agent Systems (Part 3)

Build multi-agent systems in Python with LangGraph: a supervisor routing search, summarise & fact-check workers, plus loop guards that save real money.


Sukhveer Kaur

Published June 11, 2026 · Updated July 6, 2026

8 min read


🧰 New here? Set up your environment first · ~5 min

Install Python 3.11+ — confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate).
Install the packages this tutorial lists: pip install -U pip <packages>.
Put your LLM API key in a .env file and never commit it.

🟡 Intermediate · ⏱️ 30 min · Stack: Python 3.11+, LangGraph

Series: Agentic AI in Python: Zero to Production. This is Part 3, multi-agent systems. Quick recap of what we covered so far:

Part 1: Built a local LangGraph research agent with tool use and SQLite memory → /build-agentic-ai-app-python-part-1/
Part 2: Wrapped it in FastAPI, Dockerised it, and deployed it to a cloud VPS → /build-agentic-ai-app-python-part-2/

If you’re starting here, you need the working agent.py from Part 1 — the deployment layer from Part 2 is optional for this post.

One agent is predictable. Two agents talking to each other is a negotiation. Five agents is a meeting that never ends. This post is about building multi-agent systems that actually finish tasks — and stop when they’re done.

At the end of Part 2, I said the next post would cover managed deployments. That’s still coming, but the question readers actually asked after Part 2 was different: “I have one working agent — how do I make several of them collaborate without losing control?” So Part 3 answers that.

By the end, you’ll have a supervisor agent coordinating three specialist workers — search, summarise, fact-check — built from scratch in plain LangGraph. You’ll also add the guardrails that keep an agent team from burning your API budget in a loop. Let’s start with why you’d want more than one agent at all.

✅ Before you start

A working single agent from Part 1 (the Part 2 deployment is optional here)
You understand the reason → act → observe loop — if not, build one from scratch first
An LLM API key

🎯 Key takeaways

A specialist with one job and a three-line prompt beats a generalist with a page of instructions — multi-agent systems make that split explicit.
Use the supervisor pattern: one orchestrator routes to workers, and workers communicate only through shared LangGraph state, never directly.
Force the supervisor to return structured output (a fixed schema) so routing is reliable, and always pass it the full message history.
Add two independent stop conditions — your own worker-call counter and LangGraph’s recursion_limit — because the pattern costs ~3x and runaway loops are the most expensive bug.

Why One Agent Isn’t Enough

The single agent from Part 1 hits a ceiling, and you’ve probably already felt it. Pack search, summarisation, and verification instructions into one system prompt and the prompt grows until the agent starts ignoring parts of it. A specialist agent with one job and a three-line prompt consistently beats a generalist with a page of instructions — that’s been true in every system I’ve shipped. Multi-agent systems make that split explicit: each specialist gets its own prompt, and something else handles coordination.

That something is the supervisor pattern (one orchestrator agent that decides which specialist acts next, reads the result, and repeats until the task is done). Here’s the whole architecture we’re building:

[Figure: architecture diagram of the supervisor loop]

The diagram shows the loop that makes this work. Workers never talk to each other directly; they write results into shared state. The supervisor alone decides who goes next or whether to finish.

Prerequisites and the Build Plan

Everything builds on Part 1’s stack, with one version note: this tutorial uses LangGraph 1.2 (1.2.4 was current when I wrote this in June 2026). Quick checklist before you start:

Python 3.11+ and the virtual environment from Part 1
pip install -U langgraph langchain-anthropic tavily-python python-dotenv (upgrading matters; the 1.x API is what we use below, and workers.py imports dotenv to load the .env file)
API keys in .env — ANTHROPIC_API_KEY and TAVILY_API_KEY, same as Part 1
A spending limit on your LLM account — seriously. Runaway loops are the most expensive bug class in multi-agent systems, and we’ll guard against them in code too.

If you don’t have the Part 1 code, build it first — it’s a 30-minute read and this post reuses its patterns.

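[Figure: build plan flowchart]

That flowchart is the exact sequence of the next three steps: define the shared state and the workers, write the supervisor, then wire the graph. Step 1 is quick; the supervisor in step 2 and the loop guards in step 3 are where the interesting decisions live.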

Step 1: Define the Shared State and the Workers

Multi-agent systems live or die on one question: what do the agents share? In LangGraph, the answer is explicit — a typed state dict that every node reads and writes. We extend Part 1’s state with two fields the supervisor needs:

python

# team_state.py
import operator
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage
 
class TeamState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    next_worker: str      # the supervisor's routing decision
    worker_calls: int     # how many turns we've used — our loop guard

worker_calls looks boring. It’s the most important field in this file — it’s the hard limit that stops a runaway team, and we’ll enforce it in step 3.
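
To make the difference between the two kinds of field concrete, here is a minimal plain-Python sketch of how a reducer annotation changes the merge. It is an illustration under stated assumptions, not LangGraph's internal code: apply_update is a hypothetical helper, and messages are plain strings instead of BaseMessage objects so the snippet runs without any packages installed.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class TeamState(TypedDict):
    messages: Annotated[list[str], operator.add]  # reducer: append
    next_worker: str                              # no reducer: overwrite
    worker_calls: int                             # no reducer: overwrite


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical helper: merge one node's partial update into the state."""
    hints = get_type_hints(TeamState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__.
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers:
            merged[key] = reducers[0](state[key], value)  # combine old and new
        else:
            merged[key] = value                           # last write wins
    return merged


state = {"messages": ["user: question"], "next_worker": "", "worker_calls": 0}
state = apply_update(state, {"messages": ["search: results"]})
state = apply_update(state, {"next_worker": "summarise", "worker_calls": 1})
print(state)
```

The messages list grows with every update, while next_worker and worker_calls simply take the newest value. That is why a worker can return only its own new message and still leave the earlier history intact.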

Now the three workers. Each one is just a function with a narrow job and a narrow prompt:

python

# workers.py
import os
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
from tavily import TavilyClient
 
load_dotenv()
llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
 
def search_worker(state):
    query = state["messages"][0].content   # the original user request
    results = tavily.search(query=query, max_results=3)
    text = "\n\n".join(r["content"] for r in results["results"])
    return {"messages": [HumanMessage(content=f"SEARCH RESULTS:\n{text}", name="search")]}
 
def summarise_worker(state):
    response = llm.invoke([
        SystemMessage(content="Condense the findings so far into 5 bullet "
                              "points. Keep every factual claim traceable to "
                              "the search results."),
        *state["messages"],
    ])
    response.name = "summarise"
    return {"messages": [response]}
 
def factcheck_worker(state):
    response = llm.invoke([
        SystemMessage(content="Compare the summary against the raw search "
                              "results. Reply VERIFIED if every claim is "
                              "supported, otherwise list the corrections."),
        *state["messages"],
    ])
    response.name = "factcheck"
    return {"messages": [response]}

Each worker returns a message tagged with its name. That tag is how the supervisor (and you, while debugging) can tell who said what in the shared history.
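
As a small illustration of what the name tag buys you, the sketch below pulls the most recent message from a given worker out of the history. Msg is a stand-in dataclass I'm assuming here so the snippet runs on its own; with real LangChain messages you would read the same name and content attributes. latest_from is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Stand-in for a LangChain message: only the fields this sketch needs."""
    content: str
    name: Optional[str] = None


def latest_from(messages: list, worker: str):
    """Return the newest message tagged with `worker`, or None if absent."""
    for msg in reversed(messages):
        if msg.name == worker:
            return msg
    return None


history = [
    Msg("What changed in LangGraph 1.x?"),
    Msg("SEARCH RESULTS: ...", name="search"),
    Msg("- first draft", name="summarise"),
    Msg("Correction: ...", name="factcheck"),
    Msg("- revised draft", name="summarise"),
    Msg("VERIFIED", name="factcheck"),
]

summary = latest_from(history, "summarise")
print(summary.content)
```

This is handy when the last message in the history is the fact-check verdict but the thing you want to show the user is the latest summary.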

Step 2: Write the Supervisor

The supervisor is the piece most tutorials hide inside a framework. Written plainly, it’s an LLM call that returns one word: the name of the next worker, or FINISH. Structured output (forcing the LLM to reply in a fixed schema instead of free text) is what makes the router reliable. A free-text “I think we should search next!” reply would break your graph.

python

# supervisor.py
from typing import Literal
from pydantic import BaseModel
from langchain_core.messages import SystemMessage
from workers import llm
 
class Route(BaseModel):
    next: Literal["search", "summarise", "factcheck", "FINISH"]
 
SUPERVISOR_PROMPT = """You manage three workers:
- search: finds raw information on the web
- summarise: condenses findings into bullet points
- factcheck: verifies the summary against the raw results
 
Decide who acts next. Typical order: search -> summarise -> factcheck -> FINISH.
Only choose FINISH after factcheck has run at least once."""
 
router = llm.with_structured_output(Route)
 
def supervisor(state):
    decision = router.invoke(
        [SystemMessage(content=SUPERVISOR_PROMPT), *state["messages"]]
    )
    return {"next_worker": decision.next,
            "worker_calls": state["worker_calls"] + 1}

This is where most people get stuck — not the code, the prompt. My first supervisor prompt didn’t include the “only FINISH after factcheck” rule, and the team happily skipped verification on easy questions. The supervisor does exactly what the prompt allows, nothing more.
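
If you would rather not rely on the prompt alone for that rule, one option is to check it in code as well. The sketch below is an assumption of mine, not part of the original build: enforce_finish_rule is a hypothetical helper, and it is slightly stricter than the prompt, because it redirects an early FINISH to the first worker in the typical order that has not run yet.

```python
ORDER = ("search", "summarise", "factcheck")  # the typical order from the prompt


def enforce_finish_rule(decision: str, names_seen: list) -> str:
    """Hypothetical guard: only accept FINISH once every worker has run."""
    if decision != "FINISH":
        return decision              # normal routing passes through untouched
    for worker in ORDER:
        if worker not in names_seen:
            return worker            # send the team back to the missing step
    return "FINISH"


# The supervisor tries to finish right after search: redirected to summarise.
print(enforce_finish_rule("FINISH", ["search"]))
# Fact-check has not run yet: redirected to factcheck.
print(enforce_finish_rule("FINISH", ["search", "summarise"]))
# Everything has run: FINISH is accepted.
print(enforce_finish_rule("FINISH", ["search", "summarise", "factcheck"]))
```

Inside the supervisor node, names_seen could be built from the shared history with something like [m.name for m in state["messages"] if m.name]. The worker-call budget still applies on top, so a redirect can never extend the run indefinitely.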

Common mistake: Letting the supervisor see only the last message instead of the full history. It then re-sends workers to do jobs that are already done. Always pass *state["messages"] — the whole conversation — into the routing call.

Step 3: Wire the Graph (and Add the Loop Guards)

Now the part that earns its place in production: wiring everything together with two independent stop conditions. The supervisor thinks it knows when to finish; the route function doesn’t take its word for it and enforces a hard cap with a counter.

python

# team.py
from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage
from team_state import TeamState
from workers import search_worker, summarise_worker, factcheck_worker
from supervisor import supervisor
 
MAX_WORKER_CALLS = 8   # hard budget — tune for your task
 
def route(state):
    if state["worker_calls"] >= MAX_WORKER_CALLS:
        return END                      # never trust the supervisor alone
    if state["next_worker"] == "FINISH":
        return END
    return state["next_worker"]
 
graph = StateGraph(TeamState)
graph.add_node("supervisor", supervisor)
graph.add_node("search", search_worker)
graph.add_node("summarise", summarise_worker)
graph.add_node("factcheck", factcheck_worker)
 
graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route)
for worker in ("search", "summarise", "factcheck"):
    graph.add_edge(worker, "supervisor")   # every worker reports back
 
team = graph.compile()
 
if __name__ == "__main__":
    result = team.invoke(
        {"messages": [HumanMessage(content="What changed for multi-agent apps in LangGraph 1.x?")],
         "next_worker": "", "worker_calls": 0},
        config={"recursion_limit": 25},
    )
    print(result["messages"][-1].content)

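Run python team.py and watch the sequence: supervisor → search → supervisor → summarise → supervisor → factcheck → supervisor → FINISH. On my machine the full run takes 9–14 seconds and six LLM calls: four supervisor routing calls plus one each for summarise and factcheck (search calls Tavily, not the LLM). A single-agent version of the same task takes one or two. That roughly 3x cost is the real price of the supervisor pattern, and it’s worth paying only when the task genuinely needs separate specialists. Note that the last message in the history is the fact-check verdict, so on a clean run the script prints that verdict rather than the summary.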

The belt-and-braces stopping logic deserves one more sentence. MAX_WORKER_CALLS is our own counter; recursion_limit is LangGraph’s built-in cap on total graph steps. Keep both — the first gives a clean early exit, the second catches bugs in your own routing code.
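
To see the counter guard do its job, here is a small simulation that needs no LangGraph install. It reuses the route logic from team.py against a deliberately broken supervisor that never says FINISH. The END constant, stuck_supervisor, and run are stand-ins I'm assuming for the sketch; the real graph runs the same loop through LangGraph.

```python
MAX_WORKER_CALLS = 8
END = "__end__"  # stand-in for langgraph.graph.END in this simulation


def stuck_supervisor(state: dict) -> dict:
    """A broken supervisor that ping-pongs forever and never picks FINISH."""
    nxt = "summarise" if state["worker_calls"] % 2 == 0 else "factcheck"
    return {"next_worker": nxt, "worker_calls": state["worker_calls"] + 1}


def route(state: dict) -> str:
    if state["worker_calls"] >= MAX_WORKER_CALLS:
        return END                      # budget spent: stop regardless
    if state["next_worker"] == "FINISH":
        return END
    return state["next_worker"]


def run(supervisor) -> list:
    """Drive the supervisor/worker loop and record every node that runs."""
    state = {"next_worker": "", "worker_calls": 0}
    trace = []
    while True:
        state.update(supervisor(state))
        trace.append("supervisor")
        target = route(state)
        if target == END:
            return trace
        trace.append(target)            # the worker runs, then reports back


trace = run(stuck_supervisor)
print(len(trace), trace.count("supervisor"))
```

Even with a supervisor that would loop forever, the run stops after 8 supervisor turns and 7 worker runs, 15 node executions in total. Assuming each node execution counts as one graph step, that is comfortably under recursion_limit=25, so the counter gives the clean exit and the recursion limit stays in reserve for bugs in the routing code itself.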

How CrewAI and LangGraph’s Prebuilt Supervisor Compare

You now understand the pattern well enough to evaluate the frameworks that package it. The honest comparison:

Criterion	Hand-rolled (this post)	langgraph-supervisor	CrewAI
Control over routing	Total	High	Medium
Lines to first demo	~90	~25	~20
Debuggability	Every step visible	Good	Harder — more abstraction
Best for	Production, learning	Quick LangGraph teams	Role-based crews, fast prototypes

The langgraph-supervisor library generates almost exactly the graph you just built. Notably, LangChain’s own multi-agent guidance now leans toward building supervisors directly rather than reaching for the wrapper, because owning the routing logic gives you control over what context each agent sees.

CrewAI (1.14 as of writing) is more opinionated: you declare roles and tasks, and it handles orchestration. I reach for CrewAI for demos and hand-rolled LangGraph for anything that has to run unattended. When something misroutes at 2 AM, I want the routing function to be 6 lines of my own Python.

⚠️ Add loop guards first

Multi-agent graphs can loop forever if a supervisor keeps re-delegating. Set a hard step or recursion limit before you run it — an unbounded agent loop burns tokens (and money) fast.

When Multi-Agent Systems Go Wrong

This pattern has sharp edges, and pretending otherwise would be selling you something. The three failure modes I’ve personally hit:

Ping-pong loops. The supervisor bounces between summarise and factcheck forever because factcheck keeps suggesting small corrections. My first three-agent team did exactly this — six extra round trips before I killed it. The counter guard above is the fix, plus a factcheck prompt that says “VERIFIED” rather than nitpicking.
Cost explosions. Every supervisor turn is a full LLM call over the entire message history, so each turn costs more than the last and total input tokens grow roughly quadratically with the number of turns. There are documented cases of looping agent systems burning hundreds of dollars on tasks worth a few cents. Budgets are not optional.
The multi-agent trap. The most common failure isn’t technical — it’s using three agents where one would do. If your task is linear and fits one prompt, a single agent is cheaper, faster, and easier to debug. Reach for a team only when the specialists genuinely conflict inside one prompt.
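
The cost-explosion point is easier to feel with numbers. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes every message adds the same number of tokens (500 is a made-up figure) and counts only the input tokens the supervisor re-reads on each turn.

```python
def total_input_tokens(turns: int, tokens_per_message: int = 500) -> int:
    """Input tokens read by the supervisor across `turns` routing calls.

    Assumes the history holds i equally sized messages at turn i, so the
    supervisor re-reads everything that came before on every call.
    """
    return sum(i * tokens_per_message for i in range(1, turns + 1))


for turns in (4, 8, 16):
    print(turns, total_input_tokens(turns))
```

Under those assumptions, 4 turns read 5,000 tokens, 8 turns read 18,000, and 16 turns read 68,000. Doubling the number of turns nearly quadruples the bill, which is why a ping-pong loop gets expensive much faster than its turn count suggests.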

What to Build Next

Three extensions, in the order I’d build them:

Parallel workers: search and an extra “news search” worker don’t depend on each other, and LangGraph can fan out to both and join the results. Run concurrently, the two searches take about as long as the slower one, which roughly halves the wall-clock time of the search phase.

Per-worker models — the supervisor and factcheck need a strong model; the summariser doesn’t. I swapped my summariser to a cheaper, faster model (Claude Haiku). That cut per-run cost by about a third, with no quality loss I could measure.

Persistent team memory — right now the team forgets everything between runs. Attaching the Postgres checkpointer from Part 2 works as-is, but agent memory deserves its own post — that’s exactly what Part 4 covers.
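
For the first extension, parallel workers, the underlying idea is a plain fan-out and join. The sketch below shows that idea with the standard library's ThreadPoolExecutor rather than LangGraph's own fan-out mechanism, so treat it as a picture of the concept, not the graph wiring. web_search, news_search, and fan_out are hypothetical stand-ins that return canned strings instead of calling a search API.

```python
from concurrent.futures import ThreadPoolExecutor


def web_search(query: str) -> str:
    """Stand-in for the existing search worker."""
    return f"web results for {query!r}"


def news_search(query: str) -> str:
    """Stand-in for the extra news search worker."""
    return f"news results for {query!r}"


def fan_out(query: str, workers: list) -> list:
    """Run independent workers concurrently and join their results in order."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, query) for worker in workers]
        return [future.result() for future in futures]  # waits for all of them


results = fan_out("LangGraph 1.x", [web_search, news_search])
print(results)
```

Because the results come back in the same order as the workers list, the joined output is deterministic even though the calls overlap in time.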

Conclusion

You’ve built a real multi-agent system: a supervisor routing three specialist workers through shared state, with two independent guards that guarantee it terminates. More importantly, you’ve built it from scratch. When you do adopt CrewAI or langgraph-supervisor, you’ll know precisely what the framework is doing under the hood — and where to look when it misbehaves.

Quick recap of the rules that matter: one job per worker, all communication through shared state, structured output for routing, and a hard counter the supervisor can’t override.

What’s the first team you’d assemble — and which worker would you trust the least? Tell me in the comments; the most-mentioned use case shapes how I frame Part 4: giving your agent team memory that survives restarts.

The full series — Agentic AI in Python: Zero to Production:

Part 1 — Tools, StateGraph & Memory
Part 2 — FastAPI, Docker & Deploy
Part 3 — Multi-Agent Systems — you’re here
Part 4 — AI Agent Memory
Part 5 — MCP Client & Real Tools
Part 6 — Observability & Evals

🧭 Where to go from here

Need one agent first? Part 1 builds the single agent these workers are based on.
Next in this series: Part 4 — durable agent memory.
Comparing frameworks? LangGraph vs CrewAI vs AutoGen weighs the multi-agent options.

Frequently asked questions

When should I use multiple agents instead of one?

Only when the task genuinely needs separate specialists that conflict inside one prompt. If it's linear and fits one prompt, a single agent is cheaper, faster, and easier to debug.

How do I stop a multi-agent system from looping forever?

Use a hard worker-call counter you enforce in code plus LangGraph's recursion_limit. Never trust the supervisor's FINISH decision alone.

How do workers share information?

They write tagged messages into shared LangGraph state. The supervisor reads that state and decides who acts next — workers never call each other directly.

Should I use CrewAI or hand-rolled LangGraph?

CrewAI is faster for role-based demos. Hand-rolled LangGraph gives total control for unattended production, where you want the routing logic to be your own few lines of Python.


Written by

Sukhveer Kaur, Software Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.
