Build an Agentic AI App in Python: Multi-Agent Systems (Part 3)

Technology & Innovation
June 11, 2026
6 min read
Intermediate
📚 Part of the series: Agentic AI in Python: Zero to Production
[Figure: supervisor AI agent orchestrating search, summarise, and fact-check worker agents (multi-agent systems in Python).]
Table of Contents
01 Why One Agent Isn't Enough
02 Prerequisites and the Build Plan
03 Step 1 — Define the Shared State and the Workers
04 Step 2 — Write the Supervisor
05 Step 3 — Wire the Graph (and Add the Loop Guards)
06 How CrewAI and LangGraph's Prebuilt Supervisor Compare
07 When Multi-Agent Systems Go Wrong
08 What to Build Next
09 Conclusion

Series: Agentic AI in Python, Zero to Production. This is Part 3: multi-agent systems. Quick recap of what we covered so far:

  • Part 1: Built a local LangGraph research agent with tool use and SQLite memory → /build-agentic-ai-app-python-part-1/
  • Part 2: Wrapped it in FastAPI, Dockerised it, and deployed it to a cloud VPS → /build-agentic-ai-app-python-part-2/

If you’re starting here, you need the working agent.py from Part 1 — the deployment layer from Part 2 is optional for this post.

One agent is predictable. Two agents talking to each other is a negotiation. Five agents is a meeting that never ends. This post is about building multi-agent systems that actually finish tasks — and stop when they’re done.

At the end of Part 2, I said the next post would cover managed deployments. That’s still coming, but the question readers actually asked after Part 2 was different: “I have one working agent — how do I make several of them collaborate without losing control?” So Part 3 answers that.

By the end, you’ll have a supervisor agent coordinating three specialist workers — search, summarise, fact-check — built from scratch in plain LangGraph. You’ll also add the guardrails that keep an agent team from burning your API budget in a loop. Let’s start with why you’d want more than one agent at all.

Why One Agent Isn’t Enough

The single agent from Part 1 hits a ceiling, and you’ve probably already felt it. Pack search, summarisation, and verification instructions into one system prompt and the prompt grows until the agent starts ignoring parts of it. A specialist agent with one job and a three-line prompt consistently beats a generalist with a page of instructions — that’s been true in every system I’ve shipped. Multi-agent systems make that split explicit: each specialist gets its own prompt, and something else handles coordination.

That something is the supervisor pattern (one orchestrator agent that decides which specialist acts next, reads the result, and repeats until the task is done). Here’s the whole architecture we’re building:

[Figure: supervisor and worker multi-agent architecture. A user request flows to a supervisor agent that delegates to search, summarise, and fact-check workers, which write results into shared LangGraph state until the supervisor returns a final answer.]

The diagram shows the loop that makes this work. Workers never talk to each other directly — they write results into shared state. The supervisor alone decides who goes next or whether to finish.
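
To make that loop concrete before any framework enters the picture, here is a minimal, framework-free sketch. Everything in it (`run_team`, the stub workers, the fixed routing order) is a hypothetical stand-in rather than the LangGraph code built later in this post; the real supervisor is an LLM call, not a lookup.

```python
# Framework-free sketch of the supervisor loop (illustrative stand-ins only).

def search(state):
    return "raw results"

def summarise(state):
    # Reads the previous worker's output from shared state.
    return "summary of: " + state["log"][-1][1]

def factcheck(state):
    return "VERIFIED"

WORKERS = {"search": search, "summarise": summarise, "factcheck": factcheck}

def supervisor(state):
    """Pick the next worker based on what is already in shared state."""
    done = [name for name, _ in state["log"]]
    for name in ("search", "summarise", "factcheck"):
        if name not in done:
            return name
    return "FINISH"

def run_team(request, max_turns=8):
    state = {"request": request, "log": []}  # shared state: workers only write here
    for _ in range(max_turns):               # hard cap, independent of the supervisor
        choice = supervisor(state)
        if choice == "FINISH":
            break
        state["log"].append((choice, WORKERS[choice](state)))
    return state

final = run_team("What is the supervisor pattern?")
print([name for name, _ in final["log"]])  # ['search', 'summarise', 'factcheck']
```

Note that no worker ever calls another worker. Each one reads the shared log and appends to it, and only `supervisor` chooses what runs next.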

Prerequisites and the Build Plan

Everything builds on Part 1’s stack, with one version note: this tutorial uses LangGraph 1.2 (1.2.4 was current when I wrote this in June 2026). Quick checklist before you start:

  • Python 3.11+ and the virtual environment from Part 1
  • pip install -U langgraph langchain-anthropic tavily-python — upgrading matters; the 1.x API is what we use below
  • API keys in .env: ANTHROPIC_API_KEY and TAVILY_API_KEY, same as Part 1
  • A spending limit on your LLM account — seriously. Runaway loops are the most expensive bug class in multi-agent systems, and we’ll guard against them in code too.

If you don’t have the Part 1 code, build it first — it’s a 30-minute read and this post reuses its patterns.

[Figure: five-step flowchart for building a multi-agent system in Python with LangGraph. Define shared state, build three worker agents, write the supervisor router, wire the graph with loop guards, then run and inspect each turn.]

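That flowchart maps onto the next three sections: Step 1 covers the shared state and the workers, Step 2 the supervisor, and Step 3 the wiring, the loop guards, and the first run. The state and the workers are quick; the supervisor is where the interesting decisions live.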

Step 1 — Define the Shared State and the Workers

Multi-agent systems live or die on one question: what do the agents share? In LangGraph, the answer is explicit — a typed state dict that every node reads and writes. We extend Part 1’s state with two fields the supervisor needs:

python
# team_state.py
import operator
from typing import Annotated, TypedDict

from langchain_core.messages import BaseMessage


class TeamState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    next_worker: str   # the supervisor's routing decision
    worker_calls: int  # how many turns we've used (our loop guard)

worker_calls looks boring. It’s the most important field in this file — it’s the hard limit that stops a runaway team, and we’ll enforce it in step 3.
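
The `operator.add` annotation on `messages` is what lets each node return only its new messages: LangGraph combines a node’s update with the existing value using that reducer, while fields without a reducer are simply overwritten. The sketch below imitates that merge in plain Python. `apply_update` and `REDUCERS` are hypothetical names for illustration, not LangGraph functions.

```python
import operator

# Fields with a reducer are combined; everything else is replaced.
REDUCERS = {"messages": operator.add}

def apply_update(state, update):
    """Imitate how a node's partial update is folded into shared state."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS:
            merged[key] = REDUCERS[key](state[key], value)  # list + list
        else:
            merged[key] = value                             # plain overwrite
    return merged

state = {"messages": ["user question"], "next_worker": "", "worker_calls": 0}
state = apply_update(state, {"messages": ["SEARCH RESULTS: ..."]})
state = apply_update(state, {"next_worker": "summarise", "worker_calls": 1})

print(state["messages"])      # ['user question', 'SEARCH RESULTS: ...']
print(state["worker_calls"])  # 1
```

This is also why the workers return a one-element list rather than the whole history: returning the full list would append a second copy of it.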

Now the three workers. Each one is just a function with a narrow job and a narrow prompt:

python
# workers.py
import os

from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
from tavily import TavilyClient

load_dotenv()

llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))


def search_worker(state):
    query = state["messages"][0].content  # the original user request
    results = tavily.search(query=query, max_results=3)
    text = "\n\n".join(r["content"] for r in results["results"])
    return {"messages": [HumanMessage(content=f"SEARCH RESULTS:\n{text}", name="search")]}


def summarise_worker(state):
    response = llm.invoke([
        SystemMessage(content="Condense the findings so far into 5 bullet "
                              "points. Keep every factual claim traceable to "
                              "the search results."),
        *state["messages"],
    ])
    response.name = "summarise"
    return {"messages": [response]}


def factcheck_worker(state):
    response = llm.invoke([
        SystemMessage(content="Compare the summary against the raw search "
                              "results. Reply VERIFIED if every claim is "
                              "supported, otherwise list the corrections."),
        *state["messages"],
    ])
    response.name = "factcheck"
    return {"messages": [response]}

Each worker returns a message tagged with its name. That tag is how the supervisor (and you, while debugging) can tell who said what in the shared history.
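
Here is a small sketch of how that tag can be used while debugging. The `Msg` dataclass is a stand-in that carries only the two attributes the helper reads (`content` and `name`), and `transcript` is a hypothetical helper, not part of the tutorial’s files. It assumes message content is a plain string.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Msg:
    """Stand-in for a LangChain message: only the two attributes we read."""
    content: str
    name: Optional[str] = None

def transcript(messages, width=60):
    """One line per message, prefixed with the worker that wrote it."""
    lines = []
    for m in messages:
        speaker = m.name or "user"  # untagged messages come from the user
        first_line = m.content.splitlines()[0] if m.content else ""
        lines.append(f"[{speaker}] {first_line[:width]}")
    return lines

history = [
    Msg("What changed in LangGraph 1.x?"),
    Msg("SEARCH RESULTS:\n...", name="search"),
    Msg("- point one\n- point two", name="summarise"),
    Msg("VERIFIED", name="factcheck"),
]
for line in transcript(history):
    print(line)
# [user] What changed in LangGraph 1.x?
# [search] SEARCH RESULTS:
# [summarise] - point one
# [factcheck] VERIFIED
```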

Step 2 — Write the Supervisor

The supervisor is the piece most tutorials hide inside a framework. Written plainly, it’s an LLM call that returns one word: the name of the next worker, or FINISH. Structured output (forcing the LLM to reply in a fixed schema instead of free text) is what makes the router reliable. A free-text “I think we should search next!” reply would break your graph.

python
# supervisor.py
from typing import Literal

from pydantic import BaseModel
from langchain_core.messages import SystemMessage

from workers import llm


class Route(BaseModel):
    next: Literal["search", "summarise", "factcheck", "FINISH"]


SUPERVISOR_PROMPT = """You manage three workers:
- search: finds raw information on the web
- summarise: condenses findings into bullet points
- factcheck: verifies the summary against the raw results
Decide who acts next. Typical order: search -> summarise -> factcheck -> FINISH.
Only choose FINISH after factcheck has run at least once."""

router = llm.with_structured_output(Route)


def supervisor(state):
    decision = router.invoke(
        [SystemMessage(content=SUPERVISOR_PROMPT), *state["messages"]]
    )
    return {"next_worker": decision.next,
            "worker_calls": state["worker_calls"] + 1}

This is where most people get stuck — not the code, the prompt. My first supervisor prompt didn’t include the “only FINISH after factcheck” rule, and the team happily skipped verification on easy questions. The supervisor does exactly what the prompt allows, nothing more.
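
A prompt rule is a request, not a guarantee. If skipping verification is unacceptable for your task, the same rule can also be checked in code. The sketch below shows one hypothetical way to do that: `enforce_factcheck` is not part of the tutorial’s files, and it assumes messages expose a `name` attribute the way the workers above set it.

```python
from types import SimpleNamespace

def enforce_factcheck(decision, messages):
    """Override a premature FINISH if no factcheck message exists yet."""
    has_factcheck = any(getattr(m, "name", None) == "factcheck" for m in messages)
    if decision == "FINISH" and not has_factcheck:
        return "factcheck"
    return decision

# Stand-in messages; real ones would be LangChain message objects.
history = [
    SimpleNamespace(name=None),         # the user's request
    SimpleNamespace(name="search"),
    SimpleNamespace(name="summarise"),
]

print(enforce_factcheck("FINISH", history))  # factcheck
history.append(SimpleNamespace(name="factcheck"))
print(enforce_factcheck("FINISH", history))  # FINISH
```

Inside `supervisor`, this would wrap `decision.next` before it is written to `next_worker`. The turn counter from Step 3 still applies, so the override cannot create an endless loop.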

Common mistake: Letting the supervisor see only the last message instead of the full history. It then re-sends workers to do jobs that are already done. Always pass *state["messages"] — the whole conversation — into the routing call.

Step 3 — Wire the Graph (and Add the Loop Guards)

Now the part that earns its place in production: wiring everything together with two independent stop conditions. The supervisor thinks it knows when to finish; the route function enforces a hard ceiling with a counter, whatever the supervisor decides.

python
# team.py
from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage

from team_state import TeamState
from workers import search_worker, summarise_worker, factcheck_worker
from supervisor import supervisor

MAX_WORKER_CALLS = 8  # hard budget; tune for your task


def route(state):
    if state["worker_calls"] >= MAX_WORKER_CALLS:
        return END  # never trust the supervisor alone
    if state["next_worker"] == "FINISH":
        return END
    return state["next_worker"]


graph = StateGraph(TeamState)
graph.add_node("supervisor", supervisor)
graph.add_node("search", search_worker)
graph.add_node("summarise", summarise_worker)
graph.add_node("factcheck", factcheck_worker)

graph.set_entry_point("supervisor")
graph.add_conditional_edges("supervisor", route)
for worker in ("search", "summarise", "factcheck"):
    graph.add_edge(worker, "supervisor")  # every worker reports back

team = graph.compile()

if __name__ == "__main__":
    result = team.invoke(
        {"messages": [HumanMessage(content="What changed for multi-agent apps in LangGraph 1.x?")],
         "next_worker": "", "worker_calls": 0},
        config={"recursion_limit": 25},
    )
    print(result["messages"][-1].content)

Run python team.py and watch the sequence: supervisor → search → supervisor → summarise → supervisor → factcheck → supervisor → FINISH. On my machine the full run takes 9–14 seconds and six LLM calls (four supervisor routing calls plus one each for summarise and factcheck; search calls Tavily, not the LLM). A single-agent version of the same task takes one or two. That roughly 3x cost is the real price of the supervisor pattern, and it’s worth paying only when the task genuinely needs separate specialists.

The belt-and-braces stopping logic deserves one more sentence. MAX_WORKER_CALLS is our own counter; recursion_limit is LangGraph’s built-in cap on total graph steps. Keep both — the first gives a clean early exit, the second catches bugs in your own routing code.
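
To see what the counter actually allows, here is a dry run of the same `route` logic against a worst-case supervisor that never says FINISH. No LLM or LangGraph is involved: `END` is a string stand-in for `langgraph.graph.END`, and `simulate_stubborn_supervisor` is a hypothetical helper written for this check.

```python
MAX_WORKER_CALLS = 8
END = "__end__"  # stand-in for langgraph.graph.END

def route(state):
    if state["worker_calls"] >= MAX_WORKER_CALLS:
        return END
    if state["next_worker"] == "FINISH":
        return END
    return state["next_worker"]

def simulate_stubborn_supervisor():
    """Supervisor that ping-pongs between two workers and never finishes."""
    state = {"next_worker": "", "worker_calls": 0}
    supervisor_turns = worker_runs = 0
    while True:
        # Supervisor node: pick a worker and bump the counter.
        state["next_worker"] = "summarise" if supervisor_turns % 2 == 0 else "factcheck"
        state["worker_calls"] += 1
        supervisor_turns += 1
        if route(state) == END:
            break
        worker_runs += 1  # the worker node runs, then reports back
    return supervisor_turns, worker_runs

turns, runs = simulate_stubborn_supervisor()
print(turns, runs, turns + runs)  # 8 7 15
```

Because the counter is incremented inside the supervisor, it counts supervisor turns: a budget of 8 permits at most 7 worker runs. Assuming each node execution counts as one step toward `recursion_limit`, the worst case of 15 steps stays under the limit of 25, so the counter is the guard that fires first.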

How CrewAI and LangGraph’s Prebuilt Supervisor Compare

You now understand the pattern well enough to evaluate the frameworks that package it. The honest comparison:

| | Hand-rolled (this post) | langgraph-supervisor | CrewAI |
| --- | --- | --- | --- |
| Control over routing | Total | High | Medium |
| Lines to first demo | ~90 | ~25 | ~20 |
| Debuggability | Every step visible | Good | Harder (more abstraction) |
| Best for | Production, learning | Quick LangGraph teams | Role-based crews, fast prototypes |

The langgraph-supervisor library generates almost exactly the graph you just built. Notably, LangChain’s own multi-agent guidance now leans toward building supervisors directly rather than reaching for the wrapper, because owning the routing logic gives you control over what context each agent sees.

CrewAI (1.14 as of writing) is more opinionated: you declare roles and tasks, and it handles orchestration. I reach for CrewAI for demos and hand-rolled LangGraph for anything that has to run unattended. When something misroutes at 2 AM, I want the routing function to be 6 lines of my own Python.

When Multi-Agent Systems Go Wrong

This pattern has sharp edges, and pretending otherwise would be selling you something. The three failure modes I’ve personally hit:

  • Ping-pong loops. The supervisor bounces between summarise and factcheck forever because factcheck keeps suggesting small corrections. My first three-agent team did exactly this — six extra round trips before I killed it. The counter guard above is the fix, plus a factcheck prompt that says “VERIFIED” rather than nitpicking.
  • Cost explosions. Every supervisor turn is a full LLM call over the entire message history, so cost grows quadratically as the conversation lengthens. There are documented cases of looping agent systems burning hundreds of dollars on tasks worth a few cents. Budgets are not optional.
  • The multi-agent trap. The most common failure isn’t technical — it’s using three agents where one would do. If your task is linear and fits one prompt, a single agent is cheaper, faster, and easier to debug. Reach for a team only when the specialists genuinely conflict inside one prompt.
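
The quadratic growth behind the cost-explosion bullet is easy to check with arithmetic. If the history starts at `base` tokens and each worker reply adds `per_message` tokens, the supervisor reads `base + t * per_message` tokens on turn `t`, and the total over `n` turns is `n * base + per_message * n * (n - 1) / 2`. The token figures below are illustrative assumptions, not measurements.

```python
def supervisor_input_tokens(turns, base=500, per_message=400):
    """Total tokens the supervisor reads across `turns` routing calls.

    Assumes the history starts at `base` tokens and every worker reply
    adds `per_message` tokens before the next routing call.
    """
    return sum(base + t * per_message for t in range(turns))

for turns in (4, 8, 16):
    print(turns, supervisor_input_tokens(turns))
# 4 4400
# 8 15200
# 16 56000
```

Doubling the number of turns multiplies the supervisor’s total input by roughly 3.5x here, approaching 4x as runs get longer, and that excludes what the workers themselves read.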

What to Build Next

Three extensions, in the order I’d build them:

Parallel workers — search and an extra “news search” worker don’t depend on each other, and LangGraph can fan out to both and join the results. That cuts wall-clock time roughly in half for research tasks.
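
The idea of fan-out and join can be sketched without LangGraph. The example below uses the standard library’s `ThreadPoolExecutor` as a stand-in for the graph’s parallel step; `web_search`, `news_search`, and `fan_out` are hypothetical names, and the `sleep` calls simulate network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def web_search(query):
    time.sleep(0.2)  # pretend network latency
    return f"web results for {query!r}"

def news_search(query):
    time.sleep(0.2)
    return f"news results for {query!r}"

def fan_out(query, workers):
    """Run independent workers concurrently and join results in a fixed order."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, query) for worker in workers]
        return [future.result() for future in futures]

start = time.perf_counter()
results = fan_out("LangGraph 1.x", [web_search, news_search])
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s for two 0.2s searches")
```

In LangGraph itself, the equivalent is to add edges from one node to both search nodes so they run in the same step; the `operator.add` reducer on `messages` is what joins their outputs.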

Per-worker models — the supervisor and factcheck need a strong model; the summariser doesn’t. I swapped my summariser to a cheaper, faster model (Claude Haiku). That cut per-run cost by about a third, with no quality loss I could measure.
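
One way to keep that choice in a single place is a small lookup table. This is a sketch under assumptions: the model IDs are placeholders to be replaced with real ones, and `model_for` is a hypothetical helper, not part of the tutorial’s files.

```python
# Model IDs here are placeholders, not recommendations: substitute real ones.
DEFAULT_MODEL = "strong-model-id"
WORKER_MODELS = {
    "summarise": "cheap-model-id",  # condensing text tolerates a smaller model
}

def model_for(worker):
    """Return the model ID for a worker, falling back to the strong default."""
    return WORKER_MODELS.get(worker, DEFAULT_MODEL)

for name in ("supervisor", "summarise", "factcheck"):
    print(name, "->", model_for(name))
# supervisor -> strong-model-id
# summarise -> cheap-model-id
# factcheck -> strong-model-id
```

Each worker would then build its own client, for example `ChatAnthropic(model=model_for("summarise"), temperature=0)`, instead of importing the single shared `llm`.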

Persistent team memory — right now the team forgets everything between runs. Attaching the Postgres checkpointer from Part 2 works as-is, but agent memory deserves its own post — that’s exactly what Part 4 covers.

Conclusion

You’ve built a real multi-agent system: a supervisor routing three specialist workers through shared state, with two independent guards that guarantee it terminates. More importantly, you’ve built it from scratch. When you do adopt CrewAI or langgraph-supervisor, you’ll know precisely what the framework is doing under the hood — and where to look when it misbehaves.

Quick recap of the rules that matter: one job per worker, all communication through shared state, structured output for routing, and a hard counter the supervisor can’t override.

What’s the first team you’d assemble — and which worker would you trust the least? Tell me in the comments; the most-mentioned use case shapes how I frame Part 4: giving your agent team memory that survives restarts.

Catch up on the series: Part 1 — Tools, StateGraph & Memory · Part 2 — FastAPI, Docker & Deploy · Subscribe to the RSS feed to catch Part 4 when it lands.

Related: What Are AI Agents? Complete Guide for Developers (2026)


Tags

#MultiAgentSystems #AgenticAI #LangGraph #PythonTutorial #CrewAI #AIForDevelopers

Sukhveer Kaur

Software Developer & AI Engineer
