Local AI, Zero Cost — Part 6. Part 4 freed Claude Code from the subscription, and Part 5 wired it to the backend of your choice. This part adds the last piece: Claude Code MCP servers, so your free local stack can reach databases, browsers and docs — without quietly giving up the privacy you built it for.
Your local Claude Code setup can already read files, run commands and edit code. What it can’t do is query your Postgres database, fetch a live doc page, or remember decisions between sessions. That’s what MCP adds, and the setup is one command.
The interesting part is doing it on a local stack. Tool calling is exactly where small models are weakest, tool definitions eat the context you barely have, and one careless server choice can undo the whole privacy story. This guide covers the one command, the three scopes, the models that can actually drive tools — and the trap.
- MCP lives in the client, not the model: Claude Code manages the servers and executes the calls, so every backend from Part 5 — Ollama included — can use MCP tools.
- Local models need help: pick tool-tagged models (14B+ or a strong MoE like a GLM flash variant), keep 64k+ context, and add one server at a time — tool definitions eat context fast.
- Privacy is per-server, not per-model: a local model with a remote MCP server still sends your data out. Fully private means local model and local servers.
- A working Claude Code setup on your chosen backend — Part 5 if you haven’t wired one yet
- Node.js or Python installed (most reference servers launch via npx or uvx)
- Five minutes of MCP background if it’s new to you: What Is an MCP Server?
What MCP Adds to an Already-Agentic CLI
Claude Code ships with serious built-in tools — file reads, edits, shell, search. MCP doesn’t replace those; it adds doors to systems the CLI can’t see. A database server turns “check the schema” into a real query. A fetch server pulls live documentation. A memory server keeps project decisions across sessions. Each door is a small program Claude Code launches and talks to on your behalf.
Bottom line: built-in tools make Claude Code an agent in your repo; MCP servers make it an agent in your whole environment.
Claude Code MCP Setup: One Command, Three Scopes
Adding a server is a single command (official docs). The reference memory server makes a perfect first test — no accounts, no keys, fully local:
    claude mcp add memory -- npx -y @modelcontextprotocol/server-memory

Inside a session, /mcp shows what’s connected. The flag that matters is --scope:
- local (default): registered for you, in this project only.
- project: written to .mcp.json at the repo root, committed to git, shared with the team.
- user: available to you in every project.
When the same server is defined twice, local beats project, which beats user — handy for overriding a team default with your own variant.
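The precedence rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not how Claude Code stores or resolves its configuration internally; the function name, the dictionary layout, and the EXAMPLE_FLAG variable are hypothetical.

```python
from typing import Optional

# Highest priority first: local beats project, which beats user.
SCOPE_PRIORITY = ("local", "project", "user")


def resolve_server(name: str, configs: dict) -> Optional[tuple]:
    """Return (scope, definition) for the winning definition of `name`, or None."""
    for scope in SCOPE_PRIORITY:
        definition = configs.get(scope, {}).get(name)
        if definition is not None:
            return scope, definition
    return None


# Hypothetical setup: the team shares a memory server at project scope,
# and you override it with your own variant at local scope.
configs = {
    "user": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}},
    "project": {
        "memory": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"]}
    },
    "local": {
        "memory": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-memory"],
            "env": {"EXAMPLE_FLAG": "1"},  # made-up variable, for illustration only
        }
    },
}

if __name__ == "__main__":
    print(resolve_server("memory", configs))  # the local definition wins
    print(resolve_server("fetch", configs))   # only defined at user scope
```

Looking up a name that no scope defines returns None, which mirrors a server that simply isn’t configured.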
A good second door is the official fetch server, which lets the model pull live pages — the server runs locally, though the pages it fetches are ordinary web traffic:
    claude mcp add fetch -- uvx mcp-server-fetch

One honest warning for the database door: skip the old reference SQLite server. It’s been moved to the archived repo and carries a publicly disclosed, unpatched SQL-injection flaw. Pick a maintained community SQLite or Postgres server instead, or build your own, which for an internal database is the safest door anyway.
Claude Code MCP configuration is entirely client-side. The CLI launches the servers, hands their tool list to the model, and executes whatever the model calls. That’s why the whole Part 5 backend menu — Ollama, DeepSeek, GLM, Kimi — works with MCP unchanged. The servers never know which model is driving.
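That division of labor can be pictured with a toy loop. The sketch below is not Claude Code’s implementation or the MCP wire format; the registry, the JSON call shape, and the stand-in model function are all hypothetical. It only shows the idea that the model sees descriptions and emits a structured call, while the client owns and runs the tools.

```python
import json
from typing import Callable

# Hypothetical registry: tool name -> (description shown to the model, local callable).
TOOLS: dict = {
    "remember": ("Store a note for later sessions.", lambda note: f"stored: {note}"),
    "word_count": ("Count the words in a text.", lambda text: str(len(text.split()))),
}


def tool_list_for_model() -> list:
    """What the client hands to the model: names and descriptions, never the code."""
    return [{"name": name, "description": desc} for name, (desc, _) in TOOLS.items()]


def execute_call(raw_call: str) -> str:
    """Run a call the model emitted as JSON: {"tool": ..., "arguments": {...}}."""
    call = json.loads(raw_call)
    _, func = TOOLS[call["tool"]]
    return func(**call["arguments"])


def fake_model(_tools: list) -> str:
    """Stand-in for any backend; a real model would choose the tool itself."""
    return json.dumps({"tool": "word_count", "arguments": {"text": "local stack, zero cost"}})


if __name__ == "__main__":
    emitted = fake_model(tool_list_for_model())
    print(execute_call(emitted))
```

Swapping fake_model for a different backend changes nothing on the tool side, which is the point of keeping the configuration in the client.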
The Local-Model Catch: Tools Cost Context and Skill
On Anthropic’s models, you can pile on MCP servers carelessly. On a local stack you can’t, for two reasons.
Tool definitions eat context. Every server you add injects its tool descriptions into the context window. On a 200k-token cloud model that’s noise; on a local model where Ollama’s own guidance is 64k minimum for repository work, three chatty servers can crowd out your actual code.
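A rough way to reason about that cost is to estimate how many tokens each server’s tool definitions occupy. The sketch below assumes about four characters per token, which is a crude heuristic rather than any model’s real tokenizer, and the function names and sample definitions are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: a rough average; real tokenizers vary by model


def estimate_tokens(tool_definitions: list) -> int:
    """Approximate the tokens consumed by a server's serialized tool definitions."""
    serialized = json.dumps(tool_definitions)
    return len(serialized) // CHARS_PER_TOKEN


def budget_report(servers: dict, context_window: int) -> dict:
    """Return each server's estimated share of the context window, in percent."""
    return {
        name: round(100 * estimate_tokens(tools) / context_window, 2)
        for name, tools in servers.items()
    }


if __name__ == "__main__":
    # Made-up definitions: one terse server and one with long descriptions.
    servers = {
        "terse": [{"name": "get", "description": "Fetch a URL."}],
        "chatty": [{"name": f"tool_{i}", "description": "x" * 2000} for i in range(20)],
    }
    print("64k window: ", budget_report(servers, 64_000))
    print("200k window:", budget_report(servers, 200_000))
```

The same definitions take a larger share of a 64k window than of a 200k one, so the servers worth keeping are the ones whose share you can justify.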
Emitting tool calls is a trained skill. The model must produce exactly structured calls, every time. Community testing keeps finding the same line: 7–8B models produce inconsistent tool calls, while 14B-class models and mixture-of-experts coder variants handle them reliably. Stick to models tagged for tools — Llama 3.1 instruct, Qwen coder variants, or a GLM flash model that fits in 16 GB of RAM. Part 1 maps what your hardware can hold.
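What “exactly structured” means in practice can be shown with a small checker. It assumes a simplified, hypothetical call shape (a JSON object with tool and arguments keys) rather than the format any particular model or client actually uses, and it sorts output into the failure modes weaker models tend to fall into.

```python
import json


def classify_output(raw: str, tools: dict) -> str:
    """Classify model output as 'ok', 'narration', 'unknown_tool', or 'bad_arguments'.

    `tools` maps each tool name to the set of argument names it requires.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "narration"  # prose such as "I will now call the fetch tool" lands here
    if not isinstance(call, dict) or "tool" not in call:
        return "narration"  # valid JSON, but not shaped like a call
    if call["tool"] not in tools:
        return "unknown_tool"
    arguments = call.get("arguments", {})
    if not isinstance(arguments, dict) or not tools[call["tool"]].issubset(arguments):
        return "bad_arguments"  # a required argument is missing
    return "ok"


if __name__ == "__main__":
    known = {"fetch": {"url"}}
    samples = [
        '{"tool": "fetch", "arguments": {"url": "https://example.com"}}',
        "I will now call the fetch tool.",
        '{"tool": "search", "arguments": {}}',
        '{"tool": "fetch", "arguments": {}}',
    ]
    for sample in samples:
        print(classify_output(sample, known), "<-", sample)
```

Running a handful of a model’s real outputs through a check like this gives a quick, informal read on whether it is emitting calls or just talking about them.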
Add one server, test it in a session, then add the next. If the model starts narrating tool use instead of doing it, or calls the wrong tool entirely, you’ve hit its ceiling — remove a server or step up a model size before blaming MCP.
Two commands make that experiment loop reversible:
    claude mcp list          # every configured server + its connection status
    claude mcp remove fetch  # take a server back out

Bottom line: on a local stack, treat every MCP server as a context expense that must earn its keep.
The Privacy Trap: Local Model ≠ Local Stack
Here’s the part that undoes careless setups. Running the model locally protects the conversation — your prompts and code never leave the machine. But an MCP server is its own actor. A remote server (GitHub’s hosted MCP, a SaaS connector) receives every query, issue, or page the model asks it for, under that service’s data terms, no matter where the model runs.
- Local model + local servers (memory, a local database, filesystem) — fully private. Nothing leaves.
- Local model + remote server — the conversation stays home; the tool traffic doesn’t.
- Cloud backend + any server — you already accepted the provider’s terms in Part 4; MCP adds the server’s terms on top.
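A quick audit of which row a server belongs to can be sketched as follows. It assumes server definitions shaped like the entries in .mcp.json, where a locally launched server has a command and a remote one has a url; the function name and the example URLs are hypothetical. It is also only a first pass: a server launched on your machine can still make network requests of its own, as the fetch server does by design.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def classify_server(definition: dict) -> str:
    """Return 'local-process' or 'remote' for one server definition."""
    url = definition.get("url")
    if url is None:
        return "local-process"  # launched as a command on this machine
    host = urlparse(url).hostname
    return "local-process" if host in LOCAL_HOSTS else "remote"


if __name__ == "__main__":
    servers = {
        "memory": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"]},
        "hosted": {"type": "http", "url": "https://mcp.example.com/mcp"},  # placeholder URL
        "self-built": {"type": "http", "url": "http://localhost:8080/mcp"},  # placeholder URL
    }
    for name, definition in servers.items():
        print(f"{name}: {classify_server(definition)}")
```

Anything reported as remote receives tool traffic under that service’s terms, regardless of where the model runs.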
If you connect GitHub’s hosted server — our GitHub MCP tutorial covers it — do it knowingly: it’s enormously useful, and it’s remote. And when no server exists for your internal system, build one in Python; a server you wrote and run locally is the most private integration there is.
Bottom line: privacy is decided per MCP server, not by where the model runs.
The Stack at a Glance
| Layer | Fully private choice | What changes if you go remote |
|---|---|---|
| Model | Ollama on localhost | Prompts + code go to the provider |
| Claude Code | always local (it’s your CLI) | — |
| MCP servers | memory, local DB, self-built | Tool traffic goes to each remote service |
| Scope | local for experiments | project shares config with your team via git |
Six parts in, the arc is complete: a model your laptop can run, a coding assistant, private RAG, a free Claude Code, any backend you like — and now Claude Code MCP tools that reach the rest of your environment on your terms. That’s a full agent stack with a $0 line item, and you know exactly where every byte goes.
- Start here: add the memory server with the one-liner above, restart, and test recall across sessions.
- Level up: connect your repos with the GitHub MCP server — knowing it’s the remote row of the table.
- Go deeper: write your own local server for an internal system with our Python MCP server tutorial.

