
Tool Calling in Python: Make an LLM Use Your Functions

Tool calling in Python from scratch: define a tool, parse the model's request, run your function, and return the result — with real, current OpenAI code.


Sukhveer Kaur

Published June 18, 2026 · Updated July 6, 2026

7 min read


🧰 New here? Set up your environment first · ~5 min

Install Python 3.11+ — confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
Install the packages this tutorial uses: pip install -U pip openai anthropic (you only need the SDK for the provider you plan to call).
Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

Series: AI Agents from Scratch in Python. This is Part 2. So far: Part 0 covered the Python you need to read agent code, and Part 1 made your first LLM call. Here we give the model a way to act. If you can make a basic call, you’re ready.

In Part 1 your model could talk, but it couldn’t do anything — it only returned text. Tool calling in Python is the step that changes that: it lets a model use the functions you write, so it can check the weather, query a database, or send an email. It is also the single idea every AI agent is built on.

Here is the part that confused me at first, and the part most tutorials skip. A language model cannot run code. So how does it “call” your function? It doesn’t — it asks, and your code answers. Once that clicks, the whole thing stops feeling like magic. Let’s build it from scratch in about forty lines.

🟢 Beginner · ⏱️ 18 min · Stack: Python 3.10+, openai or anthropic SDK

✅ Before you start

You can make a basic LLM call and read the reply — that’s Part 1
Comfortable reading dicts, functions, and json.loads — the Part 0 primer covers them
An OpenAI or Anthropic API key

🎯 Key takeaways

Tool calling doesn’t let the model run code — it returns a structured request (“call this function with these args”) that your Python executes.
The flow is: define the tool’s schema, parse the model’s call, run the function, and feed the result back into the conversation.
Give the model several tools and let it pick; a clear description per tool is what makes it choose correctly.
Tool calling differs from “just ask for JSON”: the model decides whether and which function to call, not just how to format an answer.

What Tool Calling in Python Actually Is#

The mechanism is a polite negotiation. You tell the model which functions exist; it replies with a structured request — a tool call — naming the function and the arguments it wants; your code runs the real function and hands the result back; and the model uses that result to write its final answer.

The model proposes; your code disposes. That separation is the whole safety story of tool calling — the model never touches your system directly, so you always decide what actually runs. Keep that picture in mind as we wire it up.

When I first used tool calling, I expected the SDK to run my function for me. It doesn’t, and that surprised me — until I saw that the gap is the point. Because your code sits in the middle, a model can never quietly delete a file or spend money without your program agreeing to it first. Every serious agent framework is built on this exact hand-off, so learning it raw means the frameworks hold no mysteries later.
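
Before touching a real SDK, it can help to watch the four steps run with no network at all. The sketch below swaps the LLM for a hypothetical fake_model function that I made up for illustration: it asks for a tool the first time and answers once it sees a tool result. The "tool_request" key is also my own invention, not any provider's format; only the order of the steps is the point.

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup.
    return f"It's 34°C and sunny in {city}."

def fake_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM: requests a tool first, answers once it has the result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"Here's the forecast: {last['content']}"}
    return {
        "role": "assistant",
        "tool_request": {"name": "get_weather", "arguments": json.dumps({"city": "Jaipur"})},
    }

messages = [{"role": "user", "content": "What's the weather in Jaipur?"}]

request = fake_model(messages)                           # 1. the model proposes a call
messages.append(request)
args = json.loads(request["tool_request"]["arguments"])  # arguments arrive as a JSON string
result = get_weather(**args)                             # 2. your code runs the function
messages.append({"role": "tool", "content": result})     # 3. the result goes back
answer = fake_model(messages)                            # 4. the model writes the final answer
print(answer["content"])
```

Nothing in fake_model ever calls get_weather. Only your code does, which is the hand-off the rest of this part wires up against a real API.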

Define a Tool and Parse the Call in Python#

A tool is just a normal Python function. Here is one that pretends to fetch weather:

python

def get_weather(city: str) -> str:
    # In real code this would call a weather API.
    return f"It's 34°C and sunny in {city}."

The model can’t see your function, so you describe it in a schema the model understands — a dictionary with the name, a plain-English description, and the inputs written as JSON Schema (a standard way to describe the shape of data):

python

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
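
Because the schema is written by hand, it can drift away from the function it describes. Here is a minimal sketch of a sanity check, assuming the OpenAI-style schema shape shown above. schema_matches_function is a hypothetical helper of my own, not part of any SDK, and it only compares parameter names, not types.

```python
import inspect

def get_weather(city: str) -> str:
    return f"It's 34°C and sunny in {city}."

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def schema_matches_function(tool: dict, fn) -> bool:
    """Hypothetical helper: True if the schema's fields line up with fn's parameters."""
    params = inspect.signature(fn).parameters
    spec = tool["function"]["parameters"]
    # Parameters without a default value are the ones the schema should mark as required.
    no_default = {name for name, p in params.items() if p.default is inspect.Parameter.empty}
    same_names = set(spec["properties"]) == set(params)
    same_required = set(spec.get("required", [])) == no_default
    return same_names and same_required

print(schema_matches_function(weather_tool, get_weather))
```

Running a check like this in a test is one way to notice a renamed parameter before the model does.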

The description matters more than beginners expect — it is how the model decides when to reach for this tool, so write it like a hint to a new teammate. Now pass the tools into the same call you learned in Part 1 and read what comes back:

python

from openai import OpenAI
import json
 
client = OpenAI()
messages = [{"role": "user", "content": "What's the weather in Jaipur?"}]
 
response = client.chat.completions.create(
    model="gpt-5.4-mini", messages=messages, tools=tools,
)
reply = response.choices[0].message
tool_call = reply.tool_calls[0]
 
print(tool_call.function.name)                   # "get_weather"
args = json.loads(tool_call.function.arguments)  # {"city": "Jaipur"}

Notice the model did not answer the question. It returned a tool_call instead, with function.arguments as a JSON string — so you json.loads it into a Python dict before using it. Forgetting that string-to-dict step is the first mistake everyone makes.

The model won’t always want a tool, though. Ask it “hello” and it just replies normally: reply.tool_calls is None and reply.content holds the answer. So in real code you check whether a tool was requested before reaching for it. I always branch on if reply.tool_calls: first, because indexing reply.tool_calls[0] when there isn’t one raises a TypeError on the very first friendly message.
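
To make that branch concrete without spending an API call, here is a small sketch. The greeting and weather objects are hypothetical stand-ins built with SimpleNamespace, shaped like the SDK's message objects as far as this tutorial uses them, and handle_reply is a name I picked for illustration.

```python
import json
from types import SimpleNamespace

def handle_reply(reply) -> str:
    """Describe what the reply asks for, checking for a tool call first."""
    if reply.tool_calls:                      # falsy means a plain text answer
        call = reply.tool_calls[0]
        args = json.loads(call.function.arguments)
        return f"tool requested: {call.function.name}({args})"
    return f"plain answer: {reply.content}"

# Hypothetical stand-ins for the two kinds of reply.
greeting = SimpleNamespace(content="Hello! How can I help?", tool_calls=None)
weather = SimpleNamespace(
    content=None,
    tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "Jaipur"}'),
    )],
)

print(handle_reply(greeting))   # plain answer: Hello! How can I help?
print(handle_reply(weather))    # tool requested: get_weather({'city': 'Jaipur'})
```

The same truthiness check covers both a missing value and an empty list, so the branch stays safe either way.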

Execute the Function and Return the Result#

Now your code does the part the model can’t: it runs the real function and sends the output back. The order here is exact, and getting it wrong is the most common error in tool calling.

python

result = get_weather(**args)        # run YOUR function with the model's args
 
messages.append(reply)              # 1. the assistant's tool-call message
messages.append({                   # 2. the matching tool result
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result,
})
 
final = client.chat.completions.create(
    model="gpt-5.4-mini", messages=messages, tools=tools,
)
print(final.choices[0].message.content)   # a natural-language reply built from the tool result

Here reply is the assistant’s message object, and the SDK lets you append it to messages directly; if you prefer to store plain dictionaries, use reply.model_dump().

Common mistake: You must append the assistant’s tool-call message before the tool result, and the tool_call_id on the result must match the one the model sent. Skip the assistant message or mismatch the id, and the API rejects the request. The model needs to see its own question and your answer, paired and in order.

The **args syntax unpacks the dictionary into keyword arguments, so get_weather(**{"city": "Jaipur"}) becomes get_weather(city="Jaipur"). That one line is the bridge from the model’s JSON to your real Python function.
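
A quick way to convince yourself of that unpacking, and to see what happens when the model invents an argument your function doesn't take. This is plain Python with no SDK involved; the extra "units" key is a made-up example of a stray argument.

```python
def get_weather(city: str) -> str:
    return f"It's 34°C and sunny in {city}."

args = {"city": "Jaipur"}
same = get_weather(**args) == get_weather(city="Jaipur")
print(same)   # True

# A key the function doesn't accept fails loudly instead of being ignored.
rejected = None
try:
    get_weather(**{"city": "Jaipur", "units": "metric"})
except TypeError as error:
    rejected = str(error)
print(f"Rejected: {rejected}")
```

That TypeError is one reason the tool call is worth wrapping in error handling once the model is filling in the arguments.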

Why call the model a second time at all? Because the first response was only a request — the model hasn’t seen the weather yet. The second call shows it the tool result so it can turn "It's 34°C and sunny in Jaipur." into a natural reply. I think of it as two turns of one conversation: the model asks, your code answers, and only then does the model speak to the user.
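
The pairing rule from the common-mistake note is easy to check locally before you send anything. The sketch below assumes you store messages as plain dictionaries (the reply.model_dump() route), and unpaired_tool_results is a hypothetical helper of mine. It is a simplified local check, not a copy of the API's own validation.

```python
def unpaired_tool_results(messages: list[dict]) -> list[str]:
    """Hypothetical checker: ids of tool results with no earlier assistant tool call."""
    requested: set[str] = set()
    problems: list[str] = []
    for message in messages:
        if message["role"] == "assistant":
            for call in message.get("tool_calls") or []:
                requested.add(call["id"])
        elif message["role"] == "tool" and message["tool_call_id"] not in requested:
            problems.append(message["tool_call_id"])
    return problems

user_turn = {"role": "user", "content": "What's the weather in Jaipur?"}
assistant_turn = {
    "role": "assistant", "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_weather", "arguments": '{"city": "Jaipur"}'}}],
}
tool_turn = {"role": "tool", "tool_call_id": "call_1", "content": "It's 34°C and sunny in Jaipur."}

good = [user_turn, assistant_turn, tool_turn]
bad = [user_turn, tool_turn]             # the assistant's tool-call message was skipped

print(unpaired_tool_results(good))   # []
print(unpaired_tool_results(bad))    # ['call_1']
```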

Multiple Tools and Letting the Model Pick#

One tool is a demo; real agents have several, and the model chooses. The clean pattern is a lookup table that maps each tool name to the actual function, then a small loop that runs whatever the model asked for:

python

def get_time(city: str) -> str:
    return f"It's 3:45 PM in {city}."
 
TOOLS = {"get_weather": get_weather, "get_time": get_time}
 
for call in reply.tool_calls:                 # the model may request several
    try:
        fn = TOOLS[call.function.name]        # look up the real function
        args = json.loads(call.function.arguments)
        output = fn(**args)
    except Exception as error:                # unknown tool, bad JSON, or a failing tool
        output = f"Tool failed: {error!r}"    # never let one tool crash the agent
    messages.append({
        "role": "tool", "tool_call_id": call.id, "content": output,
    })

I always wrap the call in try/except, because a tool that raises an exception should hand the model an error string, not crash your program. With this dispatch table you can add a fifth or tenth tool without touching the loop — you just register the function and describe it. Claude works the same way with slightly different shapes, which I show side by side in the next section. The OpenAI function-calling guide and Anthropic tool-use docs are the references for the exact shapes.

One honest caveat: the model fills the arguments, and it can get them wrong — a missing field, a number sent as text. For anything that matters, validate the parsed args before you run the function, which is where Pydantic earns its place. Parallel calls are worth knowing about too: from a single prompt the model can return several tool_calls at once — weather and time together — which is why the loop iterates over reply.tool_calls instead of grabbing index zero. The same loop handles one tool or five without changing a line.
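
To show what validating the parsed args can look like, here is a hand-rolled sketch that uses only the standard library. It is a stand-in for the idea, not for Pydantic itself: validate_args and JSON_TYPES are hypothetical names, and the check only covers flat schemas with a few basic types.

```python
import json

# Maps JSON Schema type names to the Python types json.loads produces.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the args fit the schema."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected field: {name}")
            continue
        expected = JSON_TYPES.get(properties[name].get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"{name} should be {properties[name]['type']}")
    return problems

weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

print(validate_args(weather_schema, json.loads('{"city": "Jaipur"}')))  # []
print(validate_args(weather_schema, json.loads('{"city": 302001}')))    # ['city should be string']
print(validate_args(weather_schema, json.loads('{}')))                  # ['missing required field: city']
```

If the list comes back non-empty, one reasonable choice is to send those problems to the model as the tool result instead of running the function, so it can correct itself on the next turn.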

The Same Idea in Claude (Anthropic)#

Because Part 1 used both SDKs, here is the Anthropic version so Claude users aren’t left guessing. The negotiation is identical; only three shapes change.

python

from anthropic import Anthropic
 
client = Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {                      # note: input_schema, no "function" wrapper
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
 
msg = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=300, tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Jaipur?"}],
)
 
for block in msg.content:                   # Claude replies in content blocks
    if block.type == "tool_use":
        result = get_weather(**block.input)  # block.input is already a dict
        tool_result = {"role": "user", "content": [{
            "type": "tool_result", "tool_use_id": block.id, "content": result,
        }]}

Three differences to remember: tools use input_schema instead of the type/function wrapper; the arguments arrive as block.input, which is already a dict, so no json.loads; and you return the answer as a tool_result block inside a user message, not a tool role. Learn one SDK and the other is a ten-minute switch.
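
The loop above builds the tool_result but stops before the second request, so here is a sketch of how the follow-up messages list could be assembled. To keep it runnable offline, msg is a hypothetical stand-in built with SimpleNamespace and shaped like the object client.messages.create returns; the ids and text are made up.

```python
from types import SimpleNamespace

def get_weather(city: str) -> str:
    return f"It's 34°C and sunny in {city}."

# Hypothetical stand-in for a Claude response that contains a tool_use block.
msg = SimpleNamespace(content=[
    SimpleNamespace(type="text", text="Let me check that."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather", input={"city": "Jaipur"}),
])

messages = [{"role": "user", "content": "What's the weather in Jaipur?"}]
messages.append({"role": "assistant", "content": msg.content})   # Claude's own turn goes first

results = []
for block in msg.content:
    if block.type == "tool_use":
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,                 # must match the tool_use block's id
            "content": get_weather(**block.input),   # block.input is already a dict
        })
messages.append({"role": "user", "content": results})   # every result in one user message

print([m["role"] for m in messages])   # ['user', 'assistant', 'user']
```

With a real response, you would pass this messages list and the same tools to client.messages.create again to get the final answer, the same second call as in the OpenAI flow.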

📌 Note

The model never runs your function — it only asks you to. Tool calling is a structured request; your code decides whether and how to execute it. That gap is exactly where validation and guardrails belong.

Tool Calling vs Just Asking for JSON#

You might wonder why not simply ask the model for JSON, like the JSON mode from Part 1, and run things yourself. For one fixed task, you can. Tool calling wins when the model should decide whether to act, pick between several functions, and fill the arguments from a vague request. JSON mode gives you structured output; tool calling gives the model agency over your code. That difference — the model choosing to act, not just to format — is precisely what separates a chatbot from an agent, and it is why this one mechanism shows up in every framework you will ever touch.

Conclusion#

You just built the core of every AI agent: describe your functions as tools, let the model request one, run it yourself, and feed the result back. The model proposes and your code disposes — that loop is the whole game. You did it in about forty lines, with no framework in sight.

But notice we ran the tool once by hand. A real agent repeats this — call a tool, see the result, decide the next step — until the job is done. That loop is exactly what Part 3 builds.

What is the first real tool you’d give your agent: a database query, a calculator, an email sender? Tell me in the comments.

Read next: AI Agent Loop in Python: Build a ReAct Agent From Scratch. It wraps this exact tool call in a loop so the model can take several steps on its own.

🧭 Where to go from here

Missed the basics? Part 1 makes the first LLM call; the Part 0 primer covers the Python.
Next in this series: Part 3 — the agent loop, which runs tools until the job is done.
Want safe arguments? Validate tool inputs with the Pydantic AI tutorial.

Frequently asked questions

What is tool calling (or function calling) in an LLM?

It is the mechanism that lets a model use your code. The model never runs anything itself — it returns a structured request naming a function and its arguments, and your program runs that function and sends the result back. Tool calling and function calling are two names for the same thing.

Does the LLM actually run my Python function?

No. The model only proposes the call as JSON. Your code decides whether to run it, executes the real function, and returns the output to the model. That separation is what keeps you in control of what actually happens.

Why do I get an error after sending the tool result?

Almost always because the assistant's tool-call message was not added to the messages list before the tool result, or the tool_call_id does not match. The model needs to see its own request and the matching answer, in order.

Is tool calling the same in OpenAI and Anthropic?

The idea is identical, the shapes differ. OpenAI returns tool_calls and you reply with a tool role message; Anthropic returns tool_use blocks and you reply with a tool_result block. Learn one and the other takes minutes.


