Series: AI Agents from Scratch in Python
This is Part 2. So far, Part 0 covered the Python you need to read agent code, and Part 1 made your first LLM call. Here we give the model a way to act. If you can make a basic call, you’re ready.
In Part 1 your model could talk, but it couldn’t do anything — it only returned text. Tool calling in Python is the step that changes that: it lets a model use the functions you write, so it can check the weather, query a database, or send an email. It is also the single idea every AI agent is built on.
Here is the part that confused me at first, and the part most tutorials skip. A language model cannot run code. So how does it “call” your function? It doesn’t — it asks, and your code answers. Once that clicks, the whole thing stops feeling like magic. Let’s build it from scratch in about forty lines.
What Tool Calling in Python Actually Is
The mechanism is a polite negotiation. You tell the model which functions exist; it replies with a structured request — a tool call — naming the function and the arguments it wants; your code runs the real function and hands the result back; and the model uses that result to write its final answer.
The model proposes; your code disposes. That separation is the whole safety story of tool calling — the model never touches your system directly, so you always decide what actually runs. Keep that picture in mind as we wire it up.
When I first used tool calling, I expected the SDK to run my function for me. It doesn’t, and that surprised me — until I saw that the gap is the point. Because your code sits in the middle, a model can never quietly delete a file or spend money without your program agreeing to it first. Every serious agent framework is built on this exact hand-off, so learning it raw means the frameworks hold no mysteries later.
Define a Tool and Parse the Call in Python
A tool is just a normal Python function. Here is one that pretends to fetch weather:
```python
def get_weather(city: str) -> str:
    # In real code this would call a weather API.
    return f"It's 34°C and sunny in {city}."
```
The model can’t see your function, so you describe it in a schema the model understands — a dictionary with the name, a plain-English description, and the inputs written as JSON Schema (a standard way to describe the shape of data):
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
```
The description matters more than beginners expect — it is how the model decides when to reach for this tool, so write it like a hint to a new teammate. Now pass the tools into the same call you learned in Part 1 and read what comes back:
```python
from openai import OpenAI
import json

client = OpenAI()
messages = [{"role": "user", "content": "What's the weather in Jaipur?"}]

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=messages,
    tools=tools,
)

reply = response.choices[0].message
tool_call = reply.tool_calls[0]
print(tool_call.function.name)  # "get_weather"
args = json.loads(tool_call.function.arguments)  # {"city": "Jaipur"}
```
Notice the model did not answer the question. It returned a tool_call instead, with function.arguments as a JSON string — so you json.loads it into a Python dict before using it. Forgetting that string-to-dict step is the first mistake everyone makes.
The model won’t always want a tool, though. Ask it “hello” and it just replies normally: reply.tool_calls is None and reply.content holds the answer. So in real code you check whether a tool was requested before reaching for it. I always branch on if reply.tool_calls: first, because indexing reply.tool_calls[0] when there is no tool call throws an error on the very first friendly message.
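Here is a minimal sketch of that branch, assuming only that the message object exposes tool_calls and content the way the snippet above uses them. describe_reply is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real SDK responses so the example runs without an API key.

```python
from types import SimpleNamespace


def describe_reply(reply) -> str:
    """Decide what the app should do next with one assistant message."""
    # tool_calls is None (or empty) when the model simply answered in text.
    if reply.tool_calls:
        names = [call.function.name for call in reply.tool_calls]
        return "run tools: " + ", ".join(names)
    return "show text: " + (reply.content or "")


# Offline stand-ins shaped like the SDK's message objects (assumption, not real responses).
chatty = SimpleNamespace(content="Hello! How can I help?", tool_calls=None)
tooly = SimpleNamespace(
    content=None,
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="get_weather", arguments='{"city": "Jaipur"}'),
        )
    ],
)

print(describe_reply(chatty))  # show text: Hello! How can I help?
print(describe_reply(tooly))   # run tools: get_weather
```

Checking truthiness rather than comparing against None covers both an absent list and an empty one.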
Execute the Function and Return the Result
Now your code does the part the model can’t: it runs the real function and sends the output back. The order here is exact, and getting it wrong is the most common error in tool calling.
```python
result = get_weather(**args)  # run YOUR function with the model's args

messages.append(reply)  # 1. the assistant's tool-call message
messages.append({       # 2. the matching tool result
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result,
})

final = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)  # a natural-language answer built from the tool result
```
Here reply is the assistant’s message object, and the SDK lets you append it to messages directly; if you prefer to store plain dictionaries, use reply.model_dump().
Common mistake: You must append the assistant’s tool-call message before the tool result, and the tool_call_id on the result must match the one the model sent. Skip the assistant message or mismatch the id, and the API rejects the request. The model needs to see its own question and your answer, paired and in order.
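To catch this before the API does, you can check the ordering locally. This is a sketch under two assumptions: the history is stored as plain dictionaries (for example via reply.model_dump()), and check_tool_pairing is a hypothetical helper that approximates the rule described above, not the API’s own validation logic.

```python
def check_tool_pairing(messages: list[dict]) -> list[str]:
    """Return problems in assistant/tool message pairing; an empty list means it looks fine."""
    problems = []
    pending = set()  # ids the assistant requested that have no tool result yet
    for i, msg in enumerate(messages):
        if msg["role"] == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.remove(call_id)
            else:
                problems.append(f"message {i}: tool result {call_id!r} has no matching tool call")
            continue
        # Any non-tool message closes the previous batch of tool calls.
        if pending:
            problems.append(f"message {i}: unanswered tool calls {sorted(pending)}")
        pending = {call["id"] for call in msg.get("tool_calls") or []}
    if pending:
        problems.append(f"end: unanswered tool calls {sorted(pending)}")
    return problems


good = [
    {"role": "user", "content": "What's the weather in Jaipur?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Jaipur"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "It's 34°C and sunny in Jaipur."},
]
bad = [good[0], good[2]]  # the assistant's tool-call message was skipped

print(check_tool_pairing(good))  # []
print(check_tool_pairing(bad))   # ["message 1: tool result 'call_1' has no matching tool call"]
```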
The **args syntax unpacks the dictionary into keyword arguments, so get_weather(**{"city": "Jaipur"}) becomes get_weather(city="Jaipur"). That one line is the bridge from the model’s JSON to your real Python function.
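You can watch the whole bridge offline: the model’s arguments arrive as a JSON string, json.loads turns them into a dict, and ** spreads that dict into keyword arguments. The get_weather below is the same stand-in tool as before. The last lines show what happens when the model invents an argument name that the function does not accept.

```python
import json


def get_weather(city: str) -> str:
    # Stand-in tool; no real weather API is called.
    return f"It's 34°C and sunny in {city}."


raw_arguments = '{"city": "Jaipur"}'  # function.arguments is a JSON *string*
args = json.loads(raw_arguments)       # now a dict: {"city": "Jaipur"}
print(get_weather(**args))             # same as get_weather(city="Jaipur")

# A wrong key from the model fails at the call site with a TypeError.
try:
    get_weather(**{"town": "Jaipur"})
except TypeError as error:
    print("Bad arguments:", error)
```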
Why call the model a second time at all? Because the first response was only a request — the model hasn’t seen the weather yet. The second call shows it the tool result so it can turn "It's 34°C and sunny in Jaipur." into a natural reply. I think of it as two turns of one conversation: the model asks, your code answers, and only then does the model speak to the user.
Multiple Tools and Letting the Model Pick
One tool is a demo; real agents have several, and the model chooses. The clean pattern is a lookup table that maps each tool name to the actual function, then a small loop that runs whatever the model asked for:
```python
def get_time(city: str) -> str:
    return f"It's 3:45 PM in {city}."

TOOLS = {"get_weather": get_weather, "get_time": get_time}

# Assumes messages.append(reply) already ran, as in the previous step.
for call in reply.tool_calls:  # the model may request several
    try:
        fn = TOOLS[call.function.name]  # look up the real function
        args = json.loads(call.function.arguments)
        output = fn(**args)
    except Exception as error:
        output = f"Tool failed: {error}"  # never let one tool crash the agent
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": output,
    })
```
I always wrap the call in try/except, because a tool that raises an exception should hand the model an error string, not crash your program. With this dispatch table you can add a fifth or tenth tool without touching the loop — you just register the function and describe it. Claude works the same way with slightly different shapes, which I show side by side in the next section. The OpenAI function-calling guide and Anthropic tool-use docs are the references for the exact shapes.
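To see the dispatch loop run without an API key, here is a self-contained sketch where hand-built stand-ins replace reply.tool_calls. fake_call and run_tool_calls are hypothetical helpers for illustration. The lookup and the json.loads sit inside the try, so a tool name the model invents or malformed JSON also becomes an error string instead of a crash.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    return f"It's 34°C and sunny in {city}."


def get_time(city: str) -> str:
    return f"It's 3:45 PM in {city}."


TOOLS = {"get_weather": get_weather, "get_time": get_time}


def fake_call(call_id: str, name: str, arguments: dict):
    """Build an object shaped like one entry of reply.tool_calls (offline stand-in)."""
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments)),
    )


def run_tool_calls(tool_calls) -> list[dict]:
    """Run each requested tool and return one tool message per call, in order."""
    results = []
    for call in tool_calls:
        try:
            fn = TOOLS[call.function.name]              # KeyError if the tool is not registered
            args = json.loads(call.function.arguments)  # JSONDecodeError on malformed JSON
            output = fn(**args)                         # TypeError on wrong argument names
        except Exception as error:
            output = f"Tool failed: {error!r}"
        results.append({"role": "tool", "tool_call_id": call.id, "content": output})
    return results


calls = [
    fake_call("call_1", "get_weather", {"city": "Jaipur"}),
    fake_call("call_2", "get_time", {"city": "Jaipur"}),
    fake_call("call_3", "get_stock_price", {"ticker": "ACME"}),  # not in TOOLS
]
for message in run_tool_calls(calls):
    print(message["tool_call_id"], "->", message["content"])
```

Every call gets exactly one tool message with its own id, even the failing one, which keeps the pairing the API expects intact.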
One honest caveat: the model fills the arguments, and it can get them wrong — a missing field, a number sent as text. For anything that matters, validate the parsed args before you run the function, which is where Pydantic earns its place. Parallel calls are worth knowing about too: from a single prompt the model can return several tool_calls at once — weather and time together — which is why the loop iterates over reply.tool_calls instead of grabbing index zero. The same loop handles one tool or five without changing a line.
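Pydantic is the natural fit for that validation. To keep this example dependency-free, here is a hand-rolled stand-in instead: it checks the model’s argument string against a tool’s parameters schema, covering only the small JSON Schema subset this article uses (types, properties, required). arg_problems is a hypothetical helper, and it is deliberately stricter than JSON Schema’s default by rejecting unknown keys, since get_weather(**args) would fail on them anyway.

```python
import json

# JSON Schema type names mapped to Python types (a tiny subset, not a full validator).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def arg_problems(raw_arguments: str, parameters: dict) -> list[str]:
    """Return problems with the model's arguments; an empty list means safe to call."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as error:
        return [f"arguments are not valid JSON: {error}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    properties = parameters.get("properties", {})
    problems = [
        f"missing required argument {name!r}"
        for name in parameters.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected argument {name!r}")
            continue
        type_name = properties[name].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # type not covered by this small checker
        # bool is a subclass of int in Python, so true/false must not pass as a number.
        wrong_bool = isinstance(value, bool) and type_name != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"argument {name!r} should be {type_name}, got {type(value).__name__}")
    return problems


weather_params = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(arg_problems('{"city": "Jaipur"}', weather_params))  # []
print(arg_problems('{"city": 42}', weather_params))        # ["argument 'city' should be string, got int"]
print(arg_problems('{}', weather_params))                  # ["missing required argument 'city'"]
```

Returning the problems as strings, rather than raising, means one option is to send them back as the tool message content so the model can retry with corrected arguments.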
The Same Idea in Claude (Anthropic)
Because Part 1 used both SDKs, here is the Anthropic version so Claude users aren’t left guessing. The negotiation is identical; only three shapes change.
```python
from anthropic import Anthropic

client = Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {  # note: input_schema, no "function" wrapper
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=300,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Jaipur?"}],
)

for block in msg.content:  # Claude replies in content blocks
    if block.type == "tool_use":
        result = get_weather(**block.input)  # block.input is already a dict
        tool_result = {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            }],
        }
        # To continue: append {"role": "assistant", "content": msg.content},
        # then tool_result, and call client.messages.create again.
```
Three differences to remember: tools use input_schema instead of the type/function wrapper, the arguments arrive as block.input — already a dict, so no json.loads, and you return the answer as a tool_result block inside a user message, not a tool role. Learn one SDK and the other is a ten-minute switch.
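If you keep one set of tool definitions and target both SDKs, a small converter covers the first and third differences. This is a sketch assuming your OpenAI-style definitions look like the tools list from earlier; to_anthropic_tools and tool_results_message are hypothetical helper names, not SDK functions. Putting every result from one turn into a single user message follows the pattern in Anthropic’s tool-use docs for parallel calls, which are the place to confirm the exact rules.

```python
def to_anthropic_tools(openai_tools: list[dict]) -> list[dict]:
    """Convert OpenAI-style tool definitions into Anthropic-style ones."""
    converted = []
    for tool in openai_tools:
        fn = tool["function"]  # OpenAI nests the definition under "function"
        converted.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            "input_schema": fn["parameters"],  # same JSON Schema, different key
        })
    return converted


def tool_results_message(results: dict[str, str]) -> dict:
    """Wrap tool outputs, keyed by tool_use id, in one user message of tool_result blocks."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": use_id, "content": output}
            for use_id, output in results.items()
        ],
    }


openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

print(to_anthropic_tools(openai_tools))
print(tool_results_message({"toolu_example": "It's 34°C and sunny in Jaipur."}))
```

The ids here are placeholders; in a real exchange the keys come from each block.id the model sent.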
Tool Calling vs Just Asking for JSON
You might wonder why not simply ask the model for JSON, like the JSON mode from Part 1, and run things yourself. For one fixed task, you can. Tool calling wins when the model should decide whether to act, pick between several functions, and fill the arguments from a vague request. JSON mode gives you structured output; tool calling gives the model agency over your code. That difference — the model choosing to act, not just to format — is precisely what separates a chatbot from an agent, and it is why this one mechanism shows up in every framework you will ever touch.
Conclusion
You just built the core of every AI agent: describe your functions as tools, let the model request one, run it yourself, and feed the result back. The model proposes and your code disposes — that loop is the whole game. You did it in about forty lines, with no framework in sight.
But notice we ran the tool once by hand. A real agent repeats this — call a tool, see the result, decide the next step — until the job is done. That loop is exactly what Part 3 builds.
What is the first real tool you’d give your agent — a database query, a calculator, an email sender? Tell me in the comments. Next in this series, Part 3 wraps this tool call in the agent loop so the model can take several steps on its own.
Read next: Build an Agentic AI App in Python (Part 1). It shows these same tools running inside a full, deployed agent.