InfoWok
Beginner

Pytest and CI: A Primer for Testing AI Agents (2026)

A pytest and CI primer: write a first test, run it, and wire it into GitHub Actions so every pull request is checked automatically — explained for beginners.

Sukhveer Kaur
Published June 22, 2026
3 min read
On this page
  • What pytest is
  • What CI is
  • Wiring pytest into GitHub Actions
  • Why agents need pytest and CI
  • Quick recap
  • Frequently Asked Questions
  • Conclusion

The last chapters of an agent series — evals, regression gates — assume you already know pytest and CI. If testing has always been the thing you meant to learn later, “wire the eval suite into CI” reads like two unknowns stacked on each other. This pytest and CI primer covers both from zero, so that final step becomes approachable.

The good news: a test is just a function with an assert, and CI is just a server running that function for you on every change. Neither is the deep skill it’s made out to be. Get the basics here and the testing parts of any agent guide stop being a gap.

🟢 Beginner · ⏱️ 11 min read · Stack: Python 3.10+, pytest, a GitHub repo
Before you start
  • You can write and run a Python function — new to that? The Python for AI agents primer covers it
  • A GitHub repo if you want to try the CI part (optional for the pytest half)
🎯 Key takeaways
  • A pytest test is a function named test_... with an assert — run pytest and it finds and runs them all.
  • CI runs your tests automatically on every push or pull request, so you never rely on remembering.
  • GitHub Actions wires it up with one YAML file under .github/workflows.
  • Agents need this because a prompt tweak can silently break tool-calling — tests plus CI catch it before it ships.

What pytest is

pytest is Python’s go-to testing tool, and its appeal is how little it asks of you. A test is an ordinary function whose name starts with test_ and that contains a plain assert. Run the pytest command and it collects every such function from files named test_*.py or *_test.py, runs each one, and reports which passed and which failed (pytest docs).

```python
# test_math.py
def add(a, b):
    return a + b


def test_add():
    assert add(2, 3) == 5  # passes
    assert add(-1, 1) == 0
```

```bash
pip install pytest
pytest  # finds test_*.py files and runs every test_*() function in them
```

That’s a complete test suite. No classes, no setup ceremony — a test is a function that asserts something is true. When an assertion fails, pytest shows you exactly which one and what the values were, which is most of what makes it pleasant to use.

🔑 The whole idea of a test
A test pins down expected behaviour: "given this input, I expect this output." If a future change breaks that expectation, the test fails loudly, so you find out before your users do.
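Here is that idea as a small sketch: the expectations are written down once, and the same check that passes against today’s code fails as soon as an edit changes the output. clean_tool_name and its “tweaked” variant are hypothetical helpers invented for illustration; check is a plain helper the test calls, so both versions can be tried against identical expectations.

```python
# Pinned expectations: given this input, expect this output.
CASES = [
    ("  search ", "search"),
    ("Search", "search"),
    ("get_weather", "get_weather"),
]


def clean_tool_name(raw: str) -> str:
    """Hypothetical helper: normalise a tool name emitted by a model."""
    return raw.strip().lower()


def tweaked_clean_tool_name(raw: str) -> str:
    """Hypothetical later edit that 'simplified' the helper and dropped .lower()."""
    return raw.strip()


def check(fn) -> None:
    """Assert that fn meets every pinned expectation in CASES."""
    for raw, expected in CASES:
        got = fn(raw)
        assert got == expected, f"{raw!r} gave {got!r}, expected {expected!r}"


def test_clean_tool_name():
    # The test pytest would collect: pins the current implementation.
    check(clean_tool_name)


if __name__ == "__main__":
    test_clean_tool_name()  # current version: passes silently
    try:
        check(tweaked_clean_tool_name)  # same expectations, edited code
    except AssertionError as exc:
        print("regression caught:", exc)
```

Run directly, the current version passes silently and the tweaked one prints regression caught: 'Search' gave 'Search', expected 'search'. Under pytest only test_clean_tool_name is collected, so an edit like the tweak, applied to clean_tool_name itself, would turn that test red.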

What CI is

CI (continuous integration) means a server runs your tests automatically whenever code changes — typically on every push or pull request. The value is removing the human step: you can forget to run tests, but the machine never does. If the tests fail, CI flags the build red, and you can block the change from merging.

That’s the entire concept. CI doesn’t write tests or change your code; it just runs your existing tests in a clean environment, every time, and reports the result where everyone can see it. The most common free way to do this for a GitHub project is GitHub Actions.

Wiring pytest into GitHub Actions

GitHub Actions runs steps defined in a YAML file under .github/workflows. A minimal Python test workflow checks out the code, sets up Python, installs dependencies, and runs pytest:

```yaml
# .github/workflows/tests.yml
name: tests
on: [push, pull_request]  # run on every push and PR
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt pytest
      - run: pytest  # a failing test fails the job
```

Read it as a checklist the server follows: get the code, install Python, install dependencies, run pytest (GitHub Actions for Python). The install step assumes the repo has a requirements.txt; if yours doesn’t, drop the -r requirements.txt part and install pytest alone. The on: [push, pull_request] line is what makes it automatic. Once this file is in your repo, every change runs your tests without anyone lifting a finger, and a failing test turns the PR’s check red.
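Why does a failing test fail the job? A run step counts as failed when its command exits with a non-zero status, and pytest exits with status 1 when any test fails (0 when all pass). The sketch below demonstrates that contract with the standard library only: it writes a throwaway script, runs it in a fresh interpreter the way a CI step runs a command, and reads the exit code. Plain python stands in for pytest so the example runs without pytest installed (an uncaught AssertionError also ends the process with status 1), and run_like_ci is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

PASSING = "assert 2 + 3 == 5\n"
FAILING = "assert 2 + 2 == 5, 'arithmetic regression'\n"


def run_like_ci(script_body: str) -> int:
    """Run a throwaway script in a fresh interpreter and return its exit code.

    A CI step is judged the same way: exit code 0 means the step passes,
    anything else marks it as failed (a red check).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(script_body)
        completed = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,  # keep the traceback out of our own output
            text=True,
        )
    return completed.returncode


if __name__ == "__main__":
    for label, body in [("passing", PASSING), ("failing", FAILING)]:
        code = run_like_ci(body)
        status = "green" if code == 0 else "red"
        print(f"{label}: exit code {code} -> {status}")
```

Swap the interpreter call for pytest and the logic is the same: the workflow never inspects your test results directly, it only sees whether the command exited cleanly.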

💡 Make the check required
A red check is only a suggestion until you make it required. In the repo's branch protection settings, mark the test job from this workflow as a required status check, and GitHub will block the merge button until it passes. That's the difference between "we have tests" and "broken code can't merge."

Why agents need pytest and CI

Here’s the agent-specific reason this matters. Agent behaviour is fragile in a sneaky way: a small prompt edit or a dependency bump can quietly break tool-calling, and nothing errors — you just get worse answers, discovered when a user complains. A test pins the behaviour you care about, and CI runs it on every change, so a regression fails the build instead of reaching production.
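Here is what pinning that behaviour can look like, as a minimal sketch. It assumes the agent asks the model to reply with a JSON tool call shaped like {"name": ..., "arguments": {...}}; TOOLS, fake_model, parse_tool_call and the get_weather tool are hypothetical stand-ins, and a canned reply replaces the real model call so the test is fast and deterministic.

```python
import json

# Hypothetical tool registry: tool name -> names of required arguments.
TOOLS = {"get_weather": {"city"}}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call: always returns the same canned reply."""
    return '{"name": "get_weather", "arguments": {"city": "Paris"}}'


def parse_tool_call(raw: str) -> dict:
    """Parse a model reply as a tool call; raise ValueError if it isn't usable."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {raw!r}") from exc
    if not isinstance(call, dict):
        raise ValueError("reply must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = TOOLS[name] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return call


def test_weather_question_calls_weather_tool():
    call = parse_tool_call(fake_model("What's the weather in Paris?"))
    assert call["name"] == "get_weather"
    assert call["arguments"]["city"] == "Paris"


def test_prose_reply_is_rejected():
    # If a prompt edit makes the model answer in prose, parsing must fail loudly.
    try:
        parse_tool_call("Sure! The weather in Paris is sunny.")
    except ValueError:
        return
    raise AssertionError("a prose reply was accepted as a tool call")
```

With pytest installed, the second test is more idiomatically written with pytest.raises(ValueError) as a context manager; the try/except form just keeps the sketch dependency-free.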

This is exactly the foundation the observability and evals tutorial and the evals-in-CI guide build on — they turn an agent eval suite into a pytest test and gate it with a GitHub Actions workflow just like the one above. Learn pytest and CI here, and those advanced chapters read as a natural extension, not a cliff.
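As a rough preview of that idea (a sketch, not the code from those guides), an eval suite can be one pytest test that scores the agent on a list of cases and asserts that the pass rate clears a bar. EvalCase, EVAL_CASES, stub_agent and the 0.8 threshold are all hypothetical; a real suite would call your actual agent, with cases drawn from real usage.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected_tool: str


# Hypothetical eval set: which tool should the agent pick for each question?
EVAL_CASES = [
    EvalCase("What's the weather in Oslo?", "get_weather"),
    EvalCase("Convert 10 USD to EUR", "convert_currency"),
    EvalCase("Weather forecast for Lima", "get_weather"),
    EvalCase("How much is 5 GBP in JPY?", "convert_currency"),
]

PASS_THRESHOLD = 0.8  # hypothetical bar; tune it to your own suite


def stub_agent(question: str) -> str:
    """Stand-in for a real agent: picks a tool name with a keyword rule."""
    return "get_weather" if "weather" in question.lower() else "convert_currency"


def pass_rate(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases where the agent chose the expected tool."""
    hits = sum(agent(case.question) == case.expected_tool for case in cases)
    return hits / len(cases)


def test_tool_selection_eval():
    rate = pass_rate(stub_agent, EVAL_CASES)
    assert rate >= PASS_THRESHOLD, (
        f"tool-selection pass rate {rate:.0%} is below {PASS_THRESHOLD:.0%}"
    )
```

Because the eval is just a test_ function, a workflow like the one earlier in this primer runs it unchanged: if a prompt edit drags the pass rate below the threshold, pytest exits non-zero and the PR check goes red.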

Quick recap

The whole primer, in five lines:

  • A pytest test is a test_... function with an assert; pytest runs them all.
  • CI runs your tests automatically on every push or pull request.
  • GitHub Actions wires it up with one YAML file under .github/workflows.
  • A required status check blocks merges when tests fail.
  • Agents need it because prompt or dependency changes can silently break behaviour.

Frequently Asked Questions

What is pytest? Python’s popular testing tool — write a test_... function with an assert, run pytest, and it finds and runs every test, reporting pass or fail.

What is CI? A server that runs your tests automatically on every change and reports the result; make the check required and a failing run blocks the merge, so you don’t rely on remembering.

How do I run pytest in GitHub Actions? Add a workflow YAML under .github/workflows that checks out code, sets up Python, installs deps, and runs pytest on push/PR.

Why do agents need it? A prompt tweak can silently break tool-calling; tests plus CI catch regressions before they ship.

Conclusion

Pytest is just functions that assert, and CI is just a server running them for you on every change; neither is the specialist skill it’s dressed up as. Write one test_ function, add a short workflow file, and make the check required, and you’ve turned “I hope it still works” into “the build is green, so everything we test still works.” That’s the exact groundwork the evals-in-CI chapter of any agent series stands on.

What’s the first behaviour you’d lock down with a test — a tool call, an output shape, a refusal? Tell me in the comments.


References

  1. pytest — Get started
  2. Python testing with pytest (official docs)
  3. GitHub Actions — Building and testing Python
  4. GitHub Actions — Workflow syntax

Tags

#PythonForAI #pytest #GitHubActions #CICD #AIAgents #AIForDevelopers



