The last chapters of an agent series — evals, regression gates — assume you already know pytest and CI. If testing has always been the thing you meant to learn later, “wire the eval suite into CI” reads like two unknowns stacked on each other. This pytest and CI primer covers both from zero, so that final step becomes approachable.
The good news: a test is just a function with an assert, and CI is just a server running that function for you on every change. Neither is the deep skill it’s made out to be. Get the basics here and the testing parts of any agent guide stop being a gap.
- You can write and run a Python function — new to that? The Python for AI agents primer covers it
- A GitHub repo if you want to try the CI part (optional for the pytest half)
- A pytest test is a function named
test_...with anassert— runpytestand it finds and runs them all. - CI runs your tests automatically on every push or pull request, so you never rely on remembering.
- GitHub Actions wires it up with one YAML file under
.github/workflows. - Agents need this because a prompt tweak can silently break tool-calling — tests plus CI catch it before it ships.
What pytest is
pytest is Python’s go-to testing tool, and its appeal is how little it asks of you. A test is an ordinary function whose name starts with test_, containing a plain assert. Run the pytest command and it discovers every such function, runs it, and reports what passed (pytest docs).
# test_math.pydef add(a, b):return a + bdef test_add():assert add(2, 3) == 5 # passassert add(-1, 1) == 0
pip install pytestpytest # finds test_*.py, runs every test_*()
That’s a complete test suite. No classes, no setup ceremony — a test is a function that asserts something is true. When an assertion fails, pytest shows you exactly which one and what the values were, which is most of what makes it pleasant to use.
What CI is
CI (continuous integration) means a server runs your tests automatically whenever code changes — typically on every push or pull request. The value is removing the human step: you can forget to run tests, but the machine never does. If the tests fail, CI flags the build red, and you can block the change from merging.
That’s the entire concept. CI doesn’t write tests or change your code; it just runs your existing tests in a clean environment, every time, and reports the result where everyone can see it. The most common free way to do this for a GitHub project is GitHub Actions.
Wiring pytest into GitHub Actions
GitHub Actions runs steps defined in a YAML file under .github/workflows. A minimal Python test workflow checks out the code, sets up Python, installs dependencies, and runs pytest:
# .github/workflows/tests.ymlname: testson: [push, pull_request] # run on every push and PRjobs:test:runs-on: ubuntu-lateststeps:- uses: actions/checkout@v4- uses: actions/setup-python@v5with:python-version: "3.11"- run: pip install -r requirements.txt pytest- run: pytest # a failing test fails the job
Read it as a checklist the server follows: get the code, install Python, install deps, run pytest (GitHub Actions for Python). The on: [push, pull_request] line is what makes it automatic. Once this is in your repo, every change runs your tests without anyone lifting a finger — and a failing test turns the PR’s check red.
Why agents need pytest and CI
Here’s the agent-specific reason this matters. Agent behaviour is fragile in a sneaky way: a small prompt edit or a dependency bump can quietly break tool-calling, and nothing errors — you just get worse answers, discovered when a user complains. A test pins the behaviour you care about, and CI runs it on every change, so a regression fails the build instead of reaching production.
This is exactly the foundation the observability and evals tutorial and the evals-in-CI guide build on — they turn an agent eval suite into a pytest test and gate it with a GitHub Actions workflow just like the one above. Learn pytest and CI here, and those advanced chapters read as a natural extension, not a cliff.
Quick recap
The whole primer, in five lines:
- A pytest test is a
test_...function with anassert;pytestruns them all. - CI runs your tests automatically on every push or pull request.
- GitHub Actions wires it up with one YAML file under
.github/workflows. - A required status check blocks merges when tests fail.
- Agents need it because prompt or dependency changes can silently break behaviour.
Frequently Asked Questions
What is pytest? Python’s popular testing tool — write a test_... function with an assert, run pytest, and it finds and runs every test, reporting pass or fail.
What is CI? A server that runs your tests automatically on every change, blocking a merge if they fail, so you don’t rely on remembering.
How do I run pytest in GitHub Actions? Add a workflow YAML under .github/workflows that checks out code, sets up Python, installs deps, and runs pytest on push/PR.
Why do agents need it? A prompt tweak can silently break tool-calling; tests plus CI catch regressions before they ship.
Conclusion
Pytest is just functions that assert, and CI is just a server running them for you on every change — neither is the specialist skill it’s dressed up as. Write one test_ function, add a ten-line workflow, and make the check required, and you’ve turned “I hope it still works” into “the build is green, so it does.” That’s the exact groundwork the evals-in-CI chapter of any agent series stands on.
What’s the first behaviour you’d lock down with a test — a tool call, an output shape, a refusal? Tell me in the comments.
- Need the Python basics? The Python for AI agents primer covers functions and running code.
- See it applied to agents: observability and evals scores an agent with a test suite.
- Gate your PRs: evals in CI blocks a merge when the agent’s score drops.