Series: AI Agents from Scratch in Python
This is Part 1. If the Python in the examples (dicts, os.getenv, f-strings) looks unfamiliar, the optional Part 0 primer explains exactly what you need in ten minutes. Comfortable with Python? You’re in the right place.
Every “build an AI agent” tutorial starts by installing a framework. That hides the one thing you actually need to understand first: a single call to a model. Before LangChain, before agents, before any of the buzzwords, an agent is built on the ability to call an LLM in Python and read its reply. Master that one move and the rest of this series clicks into place.
In this part you will make your first real call — twice, once with OpenAI and once with Anthropic’s Claude — and learn the few knobs that control what comes back. If you are still hazy on what an agent even is, my guide to what AI agents actually are gives the big picture; here we get our hands on the keyboard.
What an LLM Call Actually Is
At its core, calling a large language model (an LLM — the kind of model behind ChatGPT and Claude) is simple: you send text in, and you get text back. You send a prompt (your instructions and question), the model predicts a response one token (a chunk of a word, roughly four characters) at a time, and you read the result. There is no hidden session and no memory.
Tokens matter for two practical reasons. Providers bill you per token, and every model has a maximum number it can process in one call. For learning, the numbers are tiny — a short question and its answer together run well under a hundred tokens, costing a fraction of a cent. You only start thinking hard about tokens later, when conversations grow long.
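To make the token arithmetic concrete, here is a rough sketch built on the four-characters-per-token rule of thumb. The price constant is a made-up placeholder, not a real rate, and the sample reply is invented; exact counts come from the provider's tokenizer and real prices from its pricing page.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb only, not an exact tokenizer

# Hypothetical flat price for illustration; check your provider's pricing page.
ASSUMED_USD_PER_MILLION_TOKENS = 1.00


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a piece of text uses."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost_usd(prompt: str, reply: str) -> float:
    """Estimate the cost of one round trip at the assumed flat rate."""
    total = estimate_tokens(prompt) + estimate_tokens(reply)
    return total * ASSUMED_USD_PER_MILLION_TOKENS / 1_000_000


prompt = "Name one thing to do in Jaipur."
reply = "Visit the Amber Fort at sunrise."  # invented sample reply

print("prompt tokens (approx):", estimate_tokens(prompt))
print("reply tokens (approx):", estimate_tokens(reply))
print(f"estimated cost: ${estimate_cost_usd(prompt, reply):.6f}")
```

Treat the result as an order-of-magnitude check. It is good enough to see that a learning session costs very little, and not precise enough for billing.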
The diagram shows the whole shape. You hand the model a messages list, it returns a response object, and the reply text lives one or two attributes deep inside that object. Everything else in this post is detail on top of that single round trip.
Call an LLM in Python: Your First Request
Let’s make it real. First install the SDKs — the official Python libraries each provider ships:
pip install openai anthropic
Before any call works, you need an API key — a secret string that authorises your requests and ties usage to your account. Create one in the provider’s console (OpenAI or Anthropic), then store it as an environment variable so it never lands in your code or on GitHub:
export OPENAI_API_KEY="sk-..." # macOS / Linux, current terminal
export ANTHROPIC_API_KEY="sk-ant-..." # same idea, needed for the Anthropic example
On Windows, or to make it stick between sessions, keep the key in a .env file and load it with python-dotenv — the Part 0 primer walks through that setup. With the key in place, the SDK finds it on its own.
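Before making a call, it can help to confirm the key is actually visible to Python. The sketch below uses only os.getenv from the standard library; the helper name require_api_key is my own, not part of either SDK.

```python
import os


def require_api_key(name: str) -> str:
    """Return the named environment variable, or raise a readable error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your terminal or add it to a .env file."
        )
    return value


# Report which keys are present without ever printing the secret itself.
for key_name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    try:
        require_api_key(key_name)
        print(f"{key_name}: found")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is friendlier than the authentication error you would otherwise get from the first request.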
Now the OpenAI version. It reads your key from the OPENAI_API_KEY environment variable, so you never write the key in code:
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a concise travel guide."},
        {"role": "user", "content": "Name one thing to do in Jaipur."},
    ],
)

print(response.choices[0].message.content)
Two roles are doing the work here. The system message sets who the assistant is and how it should behave; the user message is the actual request. The reply is buried at response.choices[0].message.content — a path worth memorising, because you will type it constantly.
The same idea in Anthropic’s SDK looks slightly different:
from anthropic import Anthropic

client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,  # required by Anthropic, optional for OpenAI
    system="You are a concise travel guide.",
    messages=[
        {"role": "user", "content": "Name one thing to do in Jaipur."},
    ],
)

print(message.content[0].text)
Common mistake: Two things trip up beginners switching between the SDKs. Anthropic takes the system prompt as a separate system parameter, not as a message in the list, and it requires max_tokens on every call. Put a system message in the list, or omit max_tokens, and you will hit an error that the OpenAI code never throws.
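If you already have an OpenAI-style messages list, one way to reuse it with Anthropic is to pull the system message out first. This is a minimal sketch; split_system_prompt is a hypothetical helper of my own, not something either SDK provides, and it assumes every message's content is a plain string.

```python
def split_system_prompt(messages):
    """Separate system messages from the rest.

    Returns (system_text, other_messages), shaped for Anthropic's
    `system=` and `messages=` parameters.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return "\n".join(system_parts), others


openai_style = [
    {"role": "system", "content": "You are a concise travel guide."},
    {"role": "user", "content": "Name one thing to do in Jaipur."},
]

system_text, anthropic_messages = split_system_prompt(openai_style)
print(system_text)
print(anthropic_messages)

# The two values would then be passed along these lines:
# client.messages.create(model="claude-sonnet-4-6", max_tokens=200,
#                        system=system_text, messages=anthropic_messages)
```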
A quick word on model names. I used gpt-5.4-mini and claude-sonnet-4-6 because they are cheap and fast — ideal for learning, where you make many calls. Model names change every few months, so treat these as placeholders: check the provider’s model list and drop in the current small model. The code around the name stays exactly the same, which is part of why learning the call itself matters more than memorising any one model.
Controlling the Output: Temperature, Tokens, and JSON
A raw call gives you a sensible default, but three settings let you steer it. The first time I shipped a feature on top of an LLM, getting these right mattered more than the prompt itself.
import json

from openai import OpenAI

client = OpenAI()  # same client setup as the first example

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "List 3 packing items for Jaipur as JSON."}],
    temperature=0.2,  # 0 = focused, up to 2 = more random
    max_tokens=100,  # caps the length of the reply
    response_format={"type": "json_object"},  # ask for valid JSON back
)

items = json.loads(response.choices[0].message.content)
print(items)
Temperature controls randomness: I keep it near 0.2 for anything I need to parse, and push it higher only for brainstorming. The scale runs from 0 (the model picks the most likely words every time) up to 2 (loose and surprising). max_tokens caps how long the reply can be, which protects you from runaway cost on a stray long answer. And JSON mode (response_format) makes the model return valid JSON you can load straight into a Python dict with json.loads — the bridge between an LLM and the rest of your code.
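For intuition about what the temperature number does, here is a toy illustration of the textbook approach: divide each candidate token's score by the temperature, then turn the scores into probabilities with a softmax. The scores are invented, and this is an assumption about the general technique, not a description of any provider's actual sampling code.

```python
import math


def next_token_probs(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Temperature 0 is treated as "always pick the top-scoring token".
        best = max(scores, key=scores.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in scores}
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


toy_scores = {"fort": 2.0, "palace": 1.5, "bazaar": 0.5}  # invented numbers

for t in (0, 0.2, 1.0, 2.0):
    probs = next_token_probs(toy_scores, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

As the temperature rises, the probabilities move closer together, which is why higher settings feel looser and lower ones feel repeatable.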
One caveat I learned the hard way: JSON mode guarantees valid JSON syntax, not the fields you asked for. The model can still omit a key or invent one. For anything that matters, validate the result before you trust it — which is exactly where Pydantic earns its place. When you need a guarantee on the shape, providers now offer a stricter option (response_format with a json_schema and strict: true); json_object is the simplest way to meet the idea first. We lean on this JSON bridge heavily in Part 2, when the model starts calling your functions.
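Here is what "validate before you trust it" can look like with only the standard library. The expected shape, a dict with an "items" key holding a list of strings, is an assumption I chose for this example; Pydantic does the same job with less code once your shapes grow.

```python
import json


def parse_packing_list(raw: str) -> list[str]:
    """Parse a reply expected to look like {"items": ["...", ...]}."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict) or "items" not in data:
        raise ValueError("reply is missing the 'items' key")
    items = data["items"]
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("'items' must be a list of strings")
    return items


good = '{"items": ["sunscreen", "hat", "water bottle"]}'
bad = '{"packing": ["sunscreen"]}'  # valid JSON, wrong shape

print(parse_packing_list(good))
try:
    parse_packing_list(bad)
except ValueError as err:
    print("rejected:", err)
```

Asking for the exact key names in the prompt makes the happy path more likely, but the check is what keeps a surprise from reaching the rest of your code.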
Why One Call Isn’t an Agent
Here is the honest limit of everything above: a single call is stateless, so the model forgets the previous message the instant it replies. Ask a follow-up in a new call and it has no idea what you were talking about. It also can’t take any action in the world — it only returns text.
You can prove the forgetting in two calls: tell it “My name is Asha” in one request, then ask “What is my name?” in a fresh one. The second reply is a polite shrug. The only way the model “remembers” is if you resend the earlier messages every time, appending each new turn to the messages list yourself. Doing that by hand works but stays crude; Part 4 replaces it with real memory, and Part 3 adds the loop that lets the model take several steps toward a goal.
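The resend-everything approach can be sketched in a few lines. To keep it runnable offline, the model call is replaced by a stand-in function; ask and fake_send are names I made up for illustration, and a real send function would wrap one of the SDK calls from earlier.

```python
def ask(history, user_text, send):
    """Append the user turn, call the model, then append and return its reply.

    `send` is any function that takes the full messages list and returns
    the reply text, for example a thin wrapper around an SDK call.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages):
    """Stand-in for a real model call so the sketch runs without an API key."""
    return f"(reply after seeing {len(messages)} messages)"


history = [{"role": "system", "content": "You are a concise travel guide."}]
ask(history, "My name is Asha.", fake_send)
ask(history, "What is my name?", fake_send)

for message in history:
    print(message["role"], "->", message["content"])

# A real `send` for OpenAI might look like this:
# def openai_send(messages):
#     response = client.chat.completions.create(model="gpt-5.4-mini", messages=messages)
#     return response.choices[0].message.content
```

The second call sees the first exchange only because the whole list travels with it. That also means every turn costs more tokens than the last.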
An agent is what you get when you wrap this atom in three things: a loop so it can take more than one step, tools so it can act, and memory so it remembers. That is the exact path this series walks — and it all sits on the call you just made.
Conclusion
You now have the one move everything else depends on: send a messages list, read the reply from the response object, and steer it with temperature, max_tokens, and JSON mode. The OpenAI and Anthropic SDKs differ only in small details, and you have seen both. For the full parameter list, the OpenAI API docs and the Anthropic API docs are the references to keep open.
What is the first thing you want your agent to actually do once it can act? Tell me in the comments — it helps me tune the upcoming parts. Next in this series, Part 2 turns this one-way call into tool calling, where the model asks your Python functions to run.
Read next: Build an Agentic AI App in Python (Part 1). It puts these calls inside a full, running agent so you can see where the atom fits.








