
Call an LLM in Python: The First Building Block of an Agent

Call an LLM in Python with real OpenAI and Anthropic code: system vs user prompts, temperature, JSON output, and why one call isn't yet an agent.


Sukhveer Kaur

Published June 18, 2026 · Updated July 6, 2026

6 min read


🧰 New here? Set up your environment first · ~5 min

Install Python 3.10+ and confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
Install the packages this tutorial lists: pip install -U pip <packages>.
Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

Series: AI Agents from Scratch in Python. This is Part 1. If the Python in the examples (dicts, os.getenv, f-strings) looks unfamiliar, the optional Part 0 primer explains exactly what you need in ten minutes. Comfortable with Python? You’re in the right place.

Most “build an AI agent” tutorials start by installing a framework. That hides the one thing you actually need to understand first: a single call to a model. Before LangChain, before agents, before any of the buzzwords, an agent is built on the ability to call an LLM in Python and read its reply. Master that one move and the rest of this series clicks into place.

In this part you will make your first real call — twice, once with OpenAI and once with Anthropic’s Claude — and learn the few knobs that control what comes back. If you are still hazy on what an agent even is, my guide to what AI agents actually are gives the big picture; here we get our hands on the keyboard.

🟢 Beginner · ⏱️ 15 min · Stack: Python 3.10+, openai or anthropic SDK, one API key

✅ Before you start

You can read basic Python (dicts, functions, f-strings) — if not, the Part 0 primer covers it in ten minutes
An API key from OpenAI or Anthropic, with a small billing limit set
Python 3.10+ and pip — one SDK install, no framework

🎯 Key takeaways

An LLM call is one stateless HTTP request: you send messages, you get text back — nothing is remembered between calls.
Control the output with temperature (randomness), max tokens (length), and asking for JSON when you need structure.
The same code shape works across providers — swap the OpenAI client for Anthropic with minimal change.
One call is not an agent: there’s no loop, no tools, no memory yet — that’s what the rest of the series adds.

What an LLM Call Actually Is#

At its core, calling a large language model (an LLM — the kind of model behind ChatGPT and Claude) is simple: you send text in, and you get text back. You send a prompt (your instructions and question), the model predicts a response one token (a chunk of a word, roughly four characters) at a time, and you read the result. There is no hidden session and no memory.

Tokens matter for two practical reasons. Providers bill you per token, and every model has a maximum number it can process in one call. For learning, the numbers are tiny — a short question and its answer together run well under a hundred tokens, costing a fraction of a cent. You only start thinking hard about tokens later, when conversations grow long.
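To make the billing arithmetic concrete, here is a rough sketch built on the four-characters-per-token rule of thumb above. The function names and the price are hypothetical placeholders, not real rates. Real token counts come from the provider's tokenizer, and real pricing usually charges input and output tokens differently.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(prompt: str, reply: str, usd_per_million_tokens: float) -> float:
    """Estimated cost of one call at whatever price you pass in."""
    total_tokens = estimate_tokens(prompt) + estimate_tokens(reply)
    return total_tokens * usd_per_million_tokens / 1_000_000


prompt = "Name one thing to do in Jaipur."
reply = "Visit the Amber Fort at sunrise, before the crowds arrive."
HYPOTHETICAL_PRICE = 1.00  # USD per million tokens, made up for illustration

print(estimate_tokens(prompt) + estimate_tokens(reply), "tokens (estimated)")
print(f"${estimate_cost(prompt, reply, HYPOTHETICAL_PRICE):.6f}")
```

Treat the result as an order-of-magnitude check only; the usage figures a provider returns with each response are the numbers that count.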

The whole shape is a single round trip. You hand the model a messages list, it returns a response object, and the reply text lives one or two attributes deep inside that object. Everything else in this post is detail on top of that round trip.
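Under the SDK, that round trip is an HTTPS POST with a JSON body. The sketch below builds such a request by hand with the standard library, without sending it, just to show what travels over the wire. The endpoint and headers follow OpenAI's public Chat Completions API; build_chat_request is a hypothetical helper name, and in practice you would let the SDK do this for you.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the HTTP request behind one chat call."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    api_key="sk-placeholder",  # placeholder only; never hard-code a real key
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Name one thing to do in Jaipur."}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing in the request refers to an earlier call, which is the statelessness described above in concrete form.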

Call an LLM in Python: Your First Request#

Let’s make it real. First install the SDKs — the official Python libraries each provider ships:

bash

pip install openai anthropic

Before any call works, you need an API key — a secret string that authorises your requests and ties usage to your account. Create one in the provider’s console (OpenAI or Anthropic), then store it as an environment variable so it never lands in your code or on GitHub:

bash

export OPENAI_API_KEY="sk-..."   # macOS / Linux, current terminal

On Windows, or to make it stick between sessions, keep the key in a .env file and load it with python-dotenv — the Part 0 primer walks through that setup. With the key in place, the SDK finds it on its own.
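A missing key usually shows up as an authentication error deep inside the first call. One optional safeguard is to check for it up front. This is a minimal sketch using only the standard library; require_api_key is a hypothetical helper, not part of either SDK.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it to a .env file."
        )
    return key


try:
    require_api_key()
    print("Key found.")
except RuntimeError as err:
    print(err)
```

Calling it once at the top of a script turns a confusing failure later on into a one-line explanation.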

Now the OpenAI version. It reads your key from the OPENAI_API_KEY environment variable, so you never write the key in code:

python

from openai import OpenAI
 
client = OpenAI()  # picks up OPENAI_API_KEY from the environment
 
response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a concise travel guide."},
        {"role": "user", "content": "Name one thing to do in Jaipur."},
    ],
)
print(response.choices[0].message.content)

Two roles are doing the work here. The system message sets who the assistant is and how it should behave; the user message is the actual request. The reply is buried at response.choices[0].message.content — a path worth memorising, because you will type it constantly.

The same idea in Anthropic’s SDK looks slightly different:

python

from anthropic import Anthropic
 
client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment
 
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,                      # required by Anthropic, optional for OpenAI
    system="You are a concise travel guide.",
    messages=[
        {"role": "user", "content": "Name one thing to do in Jaipur."},
    ],
)
print(message.content[0].text)

Common mistake: Two things trip up beginners switching between the SDKs. Anthropic takes the system prompt as a separate system parameter, not as a message in the list, and it requires max_tokens on every call. Put a system-role message in the list or leave out max_tokens, and you will hit an error that the OpenAI code never throws.
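If you keep one OpenAI-style messages list and want to reuse it with Anthropic, a small adapter can do the split for you. This is a sketch under that assumption; to_anthropic_kwargs is a hypothetical helper, not something either SDK provides.

```python
def to_anthropic_kwargs(messages: list[dict], max_tokens: int = 200) -> dict:
    """Split OpenAI-style messages into Anthropic's system + messages shape."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    kwargs = {"max_tokens": max_tokens, "messages": chat}
    if system_parts:
        # Anthropic takes the system prompt as its own parameter.
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs


openai_style = [
    {"role": "system", "content": "You are a concise travel guide."},
    {"role": "user", "content": "Name one thing to do in Jaipur."},
]
kwargs = to_anthropic_kwargs(openai_style)
print(kwargs)
# With a real client: client.messages.create(model="claude-sonnet-4-6", **kwargs)
```

The helper always sets max_tokens, so both of the mistakes described above are handled in one place.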

A quick word on model names. I used gpt-5.4-mini and claude-sonnet-4-6 because they are cheap and fast — ideal for learning, where you make many calls. Model names change every few months, so treat these as placeholders: check the provider’s model list and drop in the current small model. The code around the name stays exactly the same, which is part of why learning the call itself matters more than memorising any one model.
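Because model names go stale, one option is to keep the name out of the call site altogether. The sketch below reads it from an environment variable; LLM_MODEL is a made-up variable name for illustration, and the default is just the placeholder used in this post.

```python
import os

DEFAULT_MODEL = "gpt-5.4-mini"  # placeholder from this post; swap in a current model


def pick_model() -> str:
    """Read the model name from LLM_MODEL (hypothetical variable), else the default."""
    return os.environ.get("LLM_MODEL", DEFAULT_MODEL)


print(pick_model())
# With a real client: client.chat.completions.create(model=pick_model(), messages=[...])
```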

Controlling the Output: Temperature, Tokens, and JSON#

A raw call gives you a sensible default, but three settings let you steer it. The first time I shipped a feature on top of an LLM, getting these right mattered more than the prompt itself.

python

import json

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "List 3 packing items for Jaipur as JSON."}],
    temperature=0.2,                         # 0 = focused, up to 2 = more random
    max_tokens=100,                          # caps the length of the reply
    response_format={"type": "json_object"},  # ask for valid JSON back
)
# The reply arrives as a JSON string; parse it into a Python dict.
items = json.loads(response.choices[0].message.content)
print(items)

Temperature controls randomness: I keep it near 0.2 for anything I need to parse, and push it higher only for brainstorming. The scale runs from 0 (the model picks the most likely words every time) up to 2 (loose and surprising). max_tokens caps how long the reply can be, which protects you from runaway cost on a stray long answer. OpenAI's API reference now marks max_tokens as deprecated in favour of max_completion_tokens, so if a newer model rejects the parameter, switch to that name. And JSON mode (response_format) makes the model return valid JSON you can load straight into a Python dict with json.loads. That is the bridge between an LLM and the rest of your code.
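To see why a low temperature gives focused output, it helps to look at the textbook mechanism: the model's raw scores are divided by the temperature before being turned into probabilities. The sketch below is a toy illustration with made-up scores for three candidate tokens. Providers do not publish their exact sampling code, so take this as the standard idea rather than any vendor's implementation.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        # Treat 0 as "always pick the top score" to avoid dividing by zero.
        top = max(logits)
        winners = [1.0 if x == top else 0.0 for x in logits]
        return [w / sum(winners) for w in winners]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability spreads from the top candidate to the others, which is where the variety in brainstorming-style output comes from.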

One caveat I learned the hard way: JSON mode guarantees valid JSON syntax, not the fields you asked for. The model can still omit a key or invent one, and a reply that gets cut off at the max_tokens limit may not parse at all. For anything that matters, validate the result before you trust it, which is exactly where Pydantic earns its place. When you need a guarantee on the shape, providers now offer a stricter option (response_format with a json_schema and strict: true); json_object is the simplest way to meet the idea first. We lean on this JSON bridge heavily in Part 2, when the model starts calling your functions.
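Here is what "validate before you trust it" can look like with only the standard library. This is a hand-rolled stand-in for the Pydantic approach mentioned above, not Pydantic itself, and it assumes the prompt asked for an object with an "items" list of strings. That key name is a choice made for this example.

```python
import json


def parse_packing_list(raw: str) -> list[str]:
    """Parse a model reply and check it has the shape we asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad syntax
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    items = data.get("items")
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError('expected "items" to be a list of strings')
    return items


good = '{"items": ["sunscreen", "water bottle", "scarf"]}'
bad = '{"packing": "sunscreen"}'  # valid JSON, wrong shape

print(parse_packing_list(good))
try:
    parse_packing_list(bad)
except ValueError as err:
    print("rejected:", err)
```

Both failure modes surface as ValueError, so the calling code needs only one except clause to decide whether to retry the call.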

🔑 Key point

An LLM call works like a stateless function: text in, text out, no memory of the last call. Everything that makes something an “agent” (memory, tools, looping) is scaffolding you add around this stateless call.

Why One Call Isn’t an Agent#

Here is the honest limit of everything above: a single call is stateless, so the model forgets the previous message the instant it replies. Ask a follow-up in a new call and it has no idea what you were talking about. It also can’t take any action in the world — it only returns text.

You can prove the forgetting in two calls: tell it “My name is Asha” in one request, then ask “What is my name?” in a fresh one. The second reply is a polite shrug. The only way the model “remembers” is if you resend the earlier messages every time, appending each new turn to the messages list yourself. Doing that by hand works but stays crude; Part 4 replaces it with real memory, and Part 3 adds the loop that lets the model take several steps toward a goal.
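The resend-everything pattern is short enough to sketch without an API key. Below, fake_model is a scripted stand-in for a real LLM call (an assumption made so the example runs offline), and Conversation is a hypothetical wrapper that appends each turn and passes the whole list back in.

```python
def fake_model(messages: list[dict]) -> str:
    """Stand-in for a real LLM call: it sees only the messages passed in."""
    question = messages[-1]["content"]
    if "What is my name" not in question:
        return "Nice to meet you."
    for message in messages:
        content = message["content"]
        if message["role"] == "user" and content.startswith("My name is "):
            name = content.removeprefix("My name is ").rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


class Conversation:
    """Keeps the running message list and resends all of it on every turn."""

    def __init__(self, model=fake_model):
        self.model = model
        self.messages: list[dict] = []

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.model(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Two separate calls: the second one has no history to draw on.
fake_model([{"role": "user", "content": "My name is Asha."}])
print(fake_model([{"role": "user", "content": "What is my name?"}]))
# → I don't know your name.

# One conversation: the earlier turn is resent, so the answer is available.
chat = Conversation()
chat.ask("My name is Asha.")
print(chat.ask("What is my name?"))
# → Your name is Asha.
```

To try it against a real provider, pass a function that sends the messages list to the SDK and returns the reply text as the model argument.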

An agent is what you get when you wrap this atom in three things: a loop so it can take more than one step, tools so it can act, and memory so it remembers. That is the exact path this series walks — and it all sits on the call you just made.
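As a preview of that shape, here is a deliberately tiny sketch of a loop, one tool, and a growing message list. Everything in it is hypothetical: scripted_model stands in for a real LLM, get_weather returns canned text, and the dict format for decisions is invented for this example. Real tool calling uses the provider's own format, which Part 2 covers.

```python
def get_weather(city: str) -> str:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return f"Sunny and 31 C in {city}"


TOOLS = {"get_weather": get_weather}


def scripted_model(messages: list[dict]) -> dict:
    """Scripted stand-in for an LLM: request a tool once, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "get_weather", "argument": "Jaipur"}
    return {"answer": f"Pack light clothes: {tool_results[-1]['content']}."}


def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]  # memory: the growing list
    for _ in range(max_steps):                      # loop: more than one step
        decision = scripted_model(messages)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["argument"])  # tools: act
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."


print(run_agent("What should I pack for Jaipur today?"))
# → Pack light clothes: Sunny and 31 C in Jaipur.
```

The max_steps cap is there so that a model which keeps requesting tools cannot loop forever.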

Conclusion#

You now have the one move everything else depends on: send a messages list, read the reply from the response object, and steer it with temperature, max_tokens, and JSON mode. The OpenAI and Anthropic SDKs differ only in small details, and you have seen both. For the full parameter list, the OpenAI API docs and the Anthropic API docs are the references to keep open.

What is the first thing you want your agent to actually do once it can act? Tell me in the comments — it helps me tune the upcoming parts. Next in this series, Part 2 turns this one-way call into tool calling, where the model asks your Python functions to run.

Read next: Build an Agentic AI App in Python (Part 1). It puts these calls inside a full, running agent so you can see where the atom fits.

🧭 Where to go from here

Need the Python first? The Part 0 primer explains every symbol you’ll meet.
Next in this series: Part 2 — tool calling, where the model asks your functions to run.
Want the full app instead? Build an Agentic AI App in Python (Part 1).

Frequently asked questions

What do I need to call an LLM in Python?

Python 3.10 or newer, one SDK installed with pip (openai or anthropic), and an API key from that provider stored in an environment variable. That is the whole setup: no GPU, no machine-learning libraries, no framework.

Is calling OpenAI or Anthropic free?

Both are paid per token, but the amounts are tiny for learning. A small model like gpt-5.4-mini costs a fraction of a cent per short call. New accounts sometimes come with starting credit, but that varies, so check each provider's billing page. Set a billing limit so you never get surprised.

What is the difference between the system and user message?

The system message sets the assistant's role and rules; the user message is the actual request. OpenAI puts both in the messages list, while Anthropic takes the system prompt as a separate parameter — a small but real difference between the two SDKs.

Why does my LLM forget the previous question?

Because a single call is stateless — the model only sees what you send in that one request. To make it remember, you resend the earlier messages each time, which is exactly how memory is built later in this series.


