Series: Agentic AI in Python — Zero to Production
This is Part 5 — building an MCP client so your agent can use real tools. The story so far:
- Part 1: A local LangGraph agent with tools and a SQLite checkpointer
- Part 3: A supervisor + workers multi-agent team
- Part 4: Long-term memory across threads
New here? You’ll need Part 1’s agent.py and the MCP server from the build pillar; this post wires the two together.
By the end of Part 4 your agent could remember. By the end of this one it can act — calling real tools from the MCP server we built earlier, with authentication, straight from the app you’ve been growing since Part 1. That’s the gap an MCP client closes: it’s the piece on the agent’s side that discovers a server’s tools and lets the model call them.
Here’s the test that matters. Ask your current agent “how much disk space is left on the ops box?” and it will cheerfully make up a number, because it has no way to look. We’re going to give it that way. We’ll connect it to the authenticated FastMCP server from the build pillar, let the model pick tools on its own, and harden the connection so a slow or rude server can’t take your agent down. Three short steps. First, why your agent needs a client at all.
Why Your Agent Needs an MCP Client
An agent without tools is a very expensive autocomplete. It can reason about your disk usage, your database, your GitHub issues — but it can’t touch any of them. The whole point of the Model Context Protocol (MCP — a standard way for agents to talk to tools) is to split that problem in two: a server exposes tools, and a client lets your agent consume them.
The diagram shows the split. Your agent (everything from Parts 1–4) talks to a thin MCP client, the client opens an authenticated connection to the server, and the server’s tools reach your real data. The client is the only new thing you’re adding — the server already exists, and the agent barely changes. You wrote the hard half (the server) back in the build pillar; this is the easy half that makes it pay off.
I’ll be honest about why this part waited until Part 5. Tools are tempting to add first, but an agent that can act before it can remember repeats itself and forgets what it already tried. Memory came first on purpose. Now the order pays off.
Step 1 — Connect the Agent to the MCP Server
Before any code, here’s the short prerequisite list — everything assumes the server is already running from the build pillar.
- The Part 1 agent.py (a compiled LangGraph graph)
- The FastMCP Ops Server from the build pillar, reachable at http://localhost:8000/mcp
- pip install -U langchain-mcp-adapters (I used 0.1.x, current in June 2026)
- A valid Bearer token for the server (the OAuth 2.1 token your server validates)
The flowchart is the whole plan. The connection itself is a single object. MultiServerMCPClient takes a dict of named servers — each with a transport, a URL, and (for our authenticated server) the headers to send on every request:
# mcp_client.py: connect to the Ops Server
import asyncio
import os

from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient({
    "ops": {
        "transport": "streamable_http",
        "url": "http://localhost:8000/mcp",
        "headers": {
            "Authorization": f"Bearer {os.environ['OPS_MCP_TOKEN']}",
        },
    }
})

async def main():
    # await only works inside a coroutine, so discovery lives in main()
    tools = await client.get_tools()  # discovers every tool the server exposes
    print([t.name for t in tools])    # ['disk_usage', ...]

if __name__ == "__main__":
    asyncio.run(main())
Two things to notice. The transport must be streamable_http, not stdio — your server runs over HTTP (that was the whole point of making it production-ready), so a stdio client will never reach it. And get_tools() is where discovery happens: the client asks the server what it offers and hands you back LangChain-compatible tool objects, schemas and all. You never hand-write a tool definition.
The token lives in an environment variable, never in the file. The header is attached to every request the client makes, which is exactly what the server’s JWTVerifier expects.
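A bare os.environ[...] lookup fails with a KeyError that doesn't say what to do next. One option is a small helper that fails early with a readable message. This is a minimal sketch: bearer_headers is a hypothetical name, not part of langchain-mcp-adapters, and it assumes the token is exported as OPS_MCP_TOKEN like in the config above.

```python
import os


def bearer_headers(env_var: str = "OPS_MCP_TOKEN") -> dict:
    """Build the Authorization header from an environment variable.

    Hypothetical helper for illustration: raises a readable error if the
    variable is missing or blank instead of sending an empty token.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export a Bearer token first.")
    return {"Authorization": f"Bearer {token}"}
```

The return value can be passed as the "headers" entry in the server config, so the token still never appears in the file.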
Step 2 — Let the Agent Choose and Call Tools
Discovery gives you tools; now the model has to actually use them. The good news is that the tools from get_tools() are ordinary LangChain tools, so they drop straight into the agent you already have. For a fresh LangGraph agent, that’s one line:
# agent_with_tools.py: LangGraph
import asyncio

from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent

from mcp_client import client  # the MultiServerMCPClient from Step 1

async def main():
    model = init_chat_model("anthropic:claude-sonnet-4-5")
    tools = await client.get_tools()
    agent = create_react_agent(model, tools)
    result = await agent.ainvoke({
        "messages": [{"role": "user",
                      "content": "How much disk is left on the ops box?"}]
    })
    print(result["messages"][-1].content)

if __name__ == "__main__":
    asyncio.run(main())
create_react_agent wires up the loop: the model sees the tool schemas, decides disk_usage is the right call, the client executes it against your server, and the result comes back into the conversation. You don’t route anything by hand — the model chooses.
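To see the shape of that loop without a model or a server, here is a toy version in plain Python. It is not how create_react_agent is implemented: fake_model, TOOLS, and the "42 GB free" value are all made up for illustration, and a real model decides which tool to call from the schemas rather than from a hard-coded rule.

```python
from typing import Callable, Dict, List

# Stand-in tool registry; in the real agent these come from client.get_tools().
TOOLS: Dict[str, Callable[[dict], str]] = {
    "disk_usage": lambda args: "42 GB free on /",  # made-up value for the demo
}


def fake_model(messages: List[dict]) -> dict:
    """Pretend LLM: request disk_usage once, then answer from the tool result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"The ops box reports: {last['content']}"}
    return {"role": "assistant", "content": "",
            "tool_call": {"name": "disk_usage", "args": {}}}


def run_loop(question: str, max_steps: int = 5) -> List[dict]:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:  # no tool requested: this is the final answer
            break
        result = TOOLS[call["name"]](call["args"])
        # The tool result goes back into the conversation as its own message.
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages
```

Calling run_loop("How much disk is left on the ops box?") produces four messages in order: user, assistant (tool request), tool (result), assistant (final answer). The max_steps cap is an assumption added here so a misbehaving model can't loop forever.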
If you’re using Pydantic AI (the typed framework from this companion tutorial), the same server plugs in as a toolset:
# agent_with_tools.py: Pydantic AI
import asyncio
import os  # needed for the token lookup below

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStreamableHTTP

ops = MCPServerStreamableHTTP(
    "http://localhost:8000/mcp",
    headers={"Authorization": f"Bearer {os.environ['OPS_MCP_TOKEN']}"},
)
agent = Agent("anthropic:claude-sonnet-4-5", toolsets=[ops])

async def main():
    async with agent:  # opens/closes the MCP connection
        result = await agent.run("How much disk is left on the ops box?")
    print(result.output)

if __name__ == "__main__":
    asyncio.run(main())
I prefer Pydantic AI’s async with agent: here because it ties the connection’s lifecycle to the run — open on entry, closed on exit, no leaked sockets. (Heads-up for the future: Pydantic is migrating these classes to a unified MCPToolset built on the FastMCP client, so check the docs if you’re on a newer version.) Either way, the agent that could only think about your ops box can now read it.
Step 3 — Errors, Timeouts, and Tool-Call Guardrails
This is the section the happy-path tutorials skip, and it’s the one that decides whether your agent survives contact with production. A remote tool call is a network call: it can be slow, it can fail, and the tool can return something nasty. Wrap every tool call with a timeout and a fallback, or one stuck server will hang your whole agent.
# guarded tool execution
import asyncio

async def call_with_timeout(tool, args, seconds=10):
    try:
        return await asyncio.wait_for(tool.ainvoke(args), timeout=seconds)
    except asyncio.TimeoutError:
        return "Tool timed out: tell the user the ops box is unreachable."
    except Exception as e:  # auth, transport, server errors
        return f"Tool failed: {e}. Do not retry automatically."
The returned string matters more than it looks. When a tool fails, you don’t crash — you hand the model a plain-language message it can relay to the user, so the agent degrades gracefully instead of dying. In my testing, a healthy disk_usage call round-trips in roughly 200–400ms; I set the timeout at 10 seconds so a genuinely stuck server trips it long before a user gives up.
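You can exercise the guard without a live server. The sketch below repeats the guard's logic so it runs standalone and drives it with FakeTool, a hypothetical stand-in that only mimics the ainvoke() method the guard relies on. The delays, the return value, and the error text are invented for the demo.

```python
import asyncio
from typing import Optional


async def call_with_timeout(tool, args, seconds=10):
    # Same logic as the guard above, repeated so this sketch runs on its own.
    try:
        return await asyncio.wait_for(tool.ainvoke(args), timeout=seconds)
    except asyncio.TimeoutError:
        return "Tool timed out: tell the user the ops box is unreachable."
    except Exception as e:
        return f"Tool failed: {e}. Do not retry automatically."


class FakeTool:
    """Hypothetical stand-in for a remote tool: optional delay, optional error."""

    def __init__(self, delay: float = 0.0, error: Optional[Exception] = None):
        self.delay = delay
        self.error = error

    async def ainvoke(self, args: dict) -> str:
        await asyncio.sleep(self.delay)
        if self.error is not None:
            raise self.error
        return "42 GB free"  # made-up value


async def demo() -> list:
    return [
        await call_with_timeout(FakeTool(), {}, seconds=1),            # healthy
        await call_with_timeout(FakeTool(delay=0.5), {}, seconds=0.05),  # too slow
        await call_with_timeout(
            FakeTool(error=ConnectionError("401 Unauthorized")), {}, seconds=1
        ),                                                               # failing
    ]
```

Running asyncio.run(demo()) gives one string per case: the tool's value, the timeout message, and the failure message. In none of the three cases does an exception escape to the caller.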
The second guardrail is about trust. Anything a tool returns becomes part of the model’s context, so a compromised or buggy server can try a prompt injection (text crafted to hijack the model’s instructions). Two cheap defences go a long way:
- Least privilege on the token — the Bearer token should grant only the scopes the agent truly needs, so a hijacked tool call can’t do much.
- Treat tool output as data, not instructions — keep tool results in clearly labelled tool messages, and never paste them into the system prompt.
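The second defence can be sketched in a few lines. This assumes plain role/content message dicts rather than any particular framework's message classes. as_tool_message and build_messages are hypothetical helpers, and the max_chars cap is an extra precaution assumed here, not something the server requires.

```python
from typing import List


def as_tool_message(tool_name: str, output: str, max_chars: int = 2000) -> dict:
    """Wrap raw tool output in a labelled tool message (hypothetical helper).

    The text is capped in length and stays in the 'tool' role, so it is
    never merged into the system prompt.
    """
    text = str(output)
    if len(text) > max_chars:
        text = text[:max_chars] + " [truncated]"
    return {"role": "tool", "name": tool_name, "content": text}


def build_messages(system_prompt: str, user_question: str,
                   tool_messages: List[dict]) -> List[dict]:
    # The system prompt passes through unchanged; tool output only ever
    # appears after it, in its own labelled messages.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
        *tool_messages,
    ]
```

Even if a tool returns "Ignore previous instructions...", that text arrives as the content of a tool message. Labelling doesn't make injection impossible, but it keeps the boundary between instructions and data explicit.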
Common Mistakes I Hit Wiring This Up
Three mistakes ate most of my debugging time on this part, and they’re the same three I see in every “my agent can’t call the tool” thread.
Common mistake: using stdio transport against an HTTP server. The client connects to nothing and get_tools() returns an empty list with no error. Match the transport to how the server actually runs: streamable_http for the deployed Ops Server.
The other two are quieter. A missing or expired token shows up as a 401 buried in the client logs, not as a friendly Python error, so if get_tools() is empty, check auth before you touch anything else. The last one is a missing await: MultiServerMCPClient is async, so calling get_tools() without await (or from a sync function) hands you a coroutine that never runs, and nothing raises. If your tools list prints as a <coroutine object>, that’s the bug.
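The coroutine symptom is easy to reproduce in isolation. In this sketch get_tools is a local stand-in for any async discovery call, not the real client method, and the tool name is illustrative.

```python
import asyncio
import inspect
from typing import List, Tuple


async def get_tools() -> List[str]:
    """Stand-in for an async discovery call."""
    return ["disk_usage"]


async def main() -> Tuple[bool, List[str]]:
    forgotten = get_tools()        # missing await: a coroutine object, not a list
    is_coro = inspect.iscoroutine(forgotten)
    forgotten.close()              # tidy up so Python does not warn "never awaited"
    tools = await get_tools()      # correct: the coroutine actually runs
    return is_coro, tools
```

asyncio.run(main()) shows the difference: the un-awaited call is a coroutine object and the awaited one is the list. A quick inspect.iscoroutine(tools) check in your own code catches the mistake before the agent is built with an unusable tools value.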
Testing It End to End
The test that proves it works is the intro question, now with a real answer. Start the Ops Server, export OPS_MCP_TOKEN, then run the agent and ask “How much disk is left on the ops box?” A working setup does three visible things: get_tools() prints ['disk_usage', ...], the agent’s trace shows a disk_usage tool call, and the final answer contains a real number from your machine — not an invented one.
Then break it on purpose to trust your guardrails: stop the server and ask again. You should get “the ops box is unreachable,” not a stack trace. If you get the graceful message, your timeout and fallback are doing their job and you’re production-ready.
What’s Next — and the Series Wrap
Your agent now has the full set: tools (this part), memory (Part 4), a multi-agent structure (Part 3), and a deployable FastAPI service (Part 2). It can think, remember, and act on real systems with authentication. That’s a genuinely useful agent, not a demo.
The honest next frontier is knowing whether it’s any good. Right now you find out it broke when a user tells you. Part 6 closes that gap with observability and evals — tracing every tool call and run, then scoring the agent against a fixed test set so you catch regressions before your users do. I’d build that before adding more tools: capability without measurement is how agents quietly rot.
So, a question to shape Part 6 — and drop your answer in the comments: what would you most want to see about your agent in production: every tool call it makes, the cost per run, or a pass/fail score against your own test cases? Tell me which, and I’ll lead with it.
Catch up on the series: Part 1 — Tools, StateGraph & Memory · Part 3 — Multi-Agent Systems · Part 4 — AI Agent Memory
Related: What Is an MCP Server? A Complete Guide for Developers








