
Claude Code MCP with Local Models: A Private Agent Stack

Claude Code MCP on a free local stack: add servers with one command, pick tool-capable Ollama models, set scopes right, and dodge the privacy trap.

Sukhveer Kaur
Published July 5, 2026
5 min read
🧰 New here? Set up your environment first · ~5 min
  1. Install Python 3.11+ — confirm with python3 --version.
  2. Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
  3. Install the packages this tutorial lists: pip install -U pip <packages>.
  4. Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

Local AI, Zero Cost — Part 6. Part 4 freed Claude Code from the subscription, and Part 5 wired it to the backend of your choice. This part adds the last piece: Claude Code MCP servers, so your free local stack can reach databases, browsers and docs — without quietly giving up the privacy you built it for.

Your local Claude Code setup can already read files, run commands and edit code. What it can’t do is query your Postgres database, fetch a live doc page, or remember decisions between sessions. That’s what MCP adds, and the setup is one command.

The interesting part is doing it on a local stack. Tool calling is exactly where small models are weakest, tool definitions eat the context you barely have, and one careless server choice can undo the whole privacy story. This guide covers the one command, the three scopes, the models that can actually drive tools — and the trap.

🎯 Key takeaways
  • MCP lives in the client, not the model: Claude Code manages the servers and executes the calls, so every backend from Part 5 — Ollama included — can use MCP tools.
  • Local models need help: pick tool-tagged models (14B+ or a strong MoE like a GLM flash variant), keep 64k+ context, and add one server at a time — tool definitions eat context fast.
  • Privacy is per-server, not per-model: a local model with a remote MCP server still sends your data out. Fully private means local model and local servers.
🟢 Beginner · ⏱️ 15 min · Stack: Claude Code (any backend from Part 5) + Ollama + one or two MCP servers
Before you start
  • A working Claude Code setup on your chosen backend — Part 5 if you haven’t wired one yet
  • Node.js or Python installed (most reference servers launch via npx or uvx)
  • Five minutes of MCP background if it’s new to you: What Is an MCP Server?

What MCP Adds to an Already-Agentic CLI

Claude Code ships with serious built-in tools — file reads, edits, shell, search. MCP doesn’t replace those; it adds doors to systems the CLI can’t see. A database server turns “check the schema” into a real query. A fetch server pulls live documentation. A memory server keeps project decisions across sessions. Each door is a small program Claude Code launches and talks to on your behalf.

Bottom line: built-in tools make Claude Code an agent in your repo; MCP servers make it an agent in your whole environment.

Claude Code MCP Setup: One Command, Three Scopes

Adding a server is a single command, documented in the official Claude Code MCP docs. The reference memory server makes a perfect first test (no accounts, no keys, fully local):

bash
claude mcp add memory -- npx -y @modelcontextprotocol/server-memory

Inside a session, /mcp shows what’s connected. The flag that matters is --scope:

  • local (default) — registered for you, in this project only.
  • project — written to .mcp.json at the repo root, committed to git, shared with the team.
  • user — available to you in every project.

When the same server is defined twice, local beats project, which beats user — handy for overriding a team default with your own variant.
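
To make the precedence rule concrete, here is a minimal Python sketch of that lookup order. It is not Claude Code's actual implementation: `resolve_server`, the `configs` dictionary and the `EXAMPLE_VARIANT` environment variable are hypothetical names used only for illustration.

```python
from typing import Optional

# Precedence described in the text: local beats project, project beats user.
SCOPE_ORDER = ("local", "project", "user")

def resolve_server(name: str, configs: dict[str, dict[str, dict]]) -> Optional[tuple[str, dict]]:
    """Return (scope, definition) from the highest-precedence scope that defines `name`."""
    for scope in SCOPE_ORDER:
        servers = configs.get(scope, {})
        if name in servers:
            return scope, servers[name]
    return None  # no scope defines this server

# Hypothetical configuration: the team shares a fetch server via project scope,
# and you override it locally with your own variant.
configs = {
    "user": {"memory": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"]}},
    "project": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}},
    "local": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"], "env": {"EXAMPLE_VARIANT": "1"}}},
}

print(resolve_server("fetch", configs))   # the local definition wins over the project one
print(resolve_server("memory", configs))  # only defined at user scope
```

The point of the sketch is the loop order: the first scope that defines the name wins, so your local variant shadows the team default without touching the shared file.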

A good second door is the official fetch server, which lets the model pull live pages — the server runs locally, though the pages it fetches are ordinary web traffic:

bash
claude mcp add fetch -- uvx mcp-server-fetch

One honest warning for the database door: skip the old reference SQLite server. It’s been moved to the archived repo and carries a publicly disclosed, unpatched SQL-injection flaw. Pick a maintained community SQLite or Postgres server instead — or build your own, which for an internal database is the safest door anyway.

🔑 Key point

Claude Code MCP configuration is entirely client-side. The CLI launches the servers, hands their tool list to the model, and executes whatever the model calls. That’s why the whole Part 5 backend menu — Ollama, DeepSeek, GLM, Kimi — works with MCP unchanged. The servers never know which model is driving.
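
A toy Python sketch can show why the backend doesn't matter. Everything here is a hypothetical stand-in, not Claude Code's real code: a "server" is a dictionary of plain functions, a "model" is any callable that returns one tool call, and `run_turn` plays the role of the client.

```python
from typing import Callable

# Assumed shape for illustration: {"name": str, "arguments": dict}
ToolCall = dict
Model = Callable[[str, list[str]], ToolCall]

def run_turn(model: Model, servers: dict[str, dict[str, Callable]], prompt: str):
    # 1. The client collects the tool list from every configured server.
    registry = {name: fn for tools in servers.values() for name, fn in tools.items()}
    # 2. The client hands the tool names to whichever model is configured.
    call = model(prompt, sorted(registry))
    # 3. The client executes the call; the server only ever sees the arguments.
    return registry[call["name"]](**call["arguments"])

notes: list[str] = []

def remember(text: str) -> str:
    notes.append(text)
    return f"stored: {text}"

SERVERS = {"memory": {"remember": remember}}

def local_model(prompt: str, tools: list[str]) -> ToolCall:
    return {"name": "remember", "arguments": {"text": prompt}}

def cloud_model(prompt: str, tools: list[str]) -> ToolCall:
    return {"name": "remember", "arguments": {"text": prompt}}

print(run_turn(local_model, SERVERS, "use tabs"))
print(run_turn(cloud_model, SERVERS, "use tabs"))
```

Swapping `local_model` for `cloud_model` changes nothing on the server side, which is the property the key point describes.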

The Local-Model Catch: Tools Cost Context and Skill

On Anthropic’s models, you can pile on MCP servers carelessly. On a local stack you can’t, for two reasons.

Tool definitions eat context. Every server you add injects its tool descriptions into the context window. On a 200k-token cloud model that’s noise; on a local model where Ollama’s own guidance is 64k minimum for repository work, three chatty servers can crowd out your actual code.
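
If you want a rough feel for that cost, a back-of-envelope estimate is enough. The sketch below assumes about four characters per token, which is only a rule of thumb and not a real tokenizer, and it uses a made-up tool definition; `estimate_tokens`, `context_share` and `example_tool` are hypothetical names.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer

def estimate_tokens(tool_def: dict) -> int:
    """Approximate the tokens one tool definition adds to the prompt."""
    return max(1, len(json.dumps(tool_def)) // CHARS_PER_TOKEN)

def context_share(tool_defs: list[dict], context_window: int) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return sum(estimate_tokens(t) for t in tool_defs) / context_window

# Hypothetical tool definition; real servers ship their own schemas.
example_tool = {
    "name": "query",
    "description": "Run a read-only SQL query against the project database.",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}
tools = [example_tool] * 30  # pretend three servers expose ten tools each

for window in (200_000, 64_000):
    print(f"{window:>7} tokens: {context_share(tools, window):.1%} spent on tools")
```

The toy definition is far shorter than what chatty servers tend to advertise, so treat the printed numbers as a floor. Paste in the definitions your own servers expose to get a figure worth acting on.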

Emitting tool calls is a trained skill. The model must produce exactly structured calls, every time. Community testing keeps finding the same line: 7–8B models produce inconsistent tool calls, while 14B-class models and mixture-of-experts coder variants handle them reliably. Stick to models tagged for tools — Llama 3.1 instruct, Qwen coder variants, or a GLM flash model that fits in 16 GB of RAM. Part 1 maps what your hardware can hold.
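
What "exactly structured" means is easier to see with a checker. This is a minimal sketch under stated assumptions: the JSON shape with `name` and `arguments` is a simplification (real wire formats differ by backend), and `check_tool_call` and `TOOLS` are hypothetical names.

```python
import json

def check_tool_call(raw: str, tools: dict[str, set[str]]) -> str:
    """Return "ok" or a short reason the output is not a usable tool call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "not JSON (the model narrated instead of calling)"
    if not isinstance(call, dict) or "name" not in call:
        return "missing tool name"
    if call["name"] not in tools:
        return f"unknown tool: {call['name']}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return "arguments must be an object"
    missing = tools[call["name"]] - set(args)
    if missing:
        return "missing arguments: " + ", ".join(sorted(missing))
    return "ok"

# Hypothetical registry: tool name -> required argument names.
TOOLS = {"fetch": {"url"}}

print(check_tool_call('{"name": "fetch", "arguments": {"url": "https://example.com"}}', TOOLS))
print(check_tool_call("I will now use the fetch tool to get the page.", TOOLS))
print(check_tool_call('{"name": "fetch", "arguments": {}}', TOOLS))
```

Each failure branch corresponds to a way a model can miss: narrating in prose, naming a tool that doesn't exist, or dropping a required argument.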

⚠️ Warning

Add one server, test it in a session, then add the next. If the model starts narrating tool use instead of doing it, or calls the wrong tool entirely, you’ve hit its ceiling — remove a server or step up a model size before blaming MCP.

Two commands make that experiment loop reversible:

bash
claude mcp list            # every configured server + its connection status
claude mcp remove fetch    # take a server back out

Bottom line: on a local stack, treat every MCP server as a context expense that must earn its keep.

The Privacy Trap: Local Model ≠ Local Stack

Here’s the part that undoes careless setups. Running the model locally protects the conversation — your prompts and code never leave the machine. But an MCP server is its own actor. A remote server (GitHub’s hosted MCP, a SaaS connector) receives every query, issue, or page the model asks it for, under that service’s data terms, no matter where the model runs.

  • Local model + local servers (memory, a local database, filesystem) — fully private. Nothing leaves.
  • Local model + remote server — the conversation stays home; the tool traffic doesn’t.
  • Cloud backend + any server — you already accepted the provider’s terms in Part 4; MCP adds the server’s terms on top.
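
The three cases above can be sketched as a small classifier. It rests on an assumption you should verify against your own config: a server definition with a `url` pointing off-machine is treated as remote, and one launched by a local `command` is treated as local. The function names and the `https://mcp.example.com/mcp` address are hypothetical.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_local_url(url: str) -> bool:
    """True when the URL points at this machine."""
    return urlparse(url).hostname in LOCAL_HOSTS

def server_is_remote(definition: dict) -> bool:
    """Assumption: a server reached over a non-local URL is remote."""
    url = definition.get("url")
    return url is not None and not is_local_url(url)

def classify_stack(model_url: str, servers: dict[str, dict]) -> str:
    remote = sorted(name for name, d in servers.items() if server_is_remote(d))
    if not is_local_url(model_url):
        return "cloud backend: provider terms apply, plus each server's terms"
    if remote:
        return "local model, but tool traffic leaves via: " + ", ".join(remote)
    return "fully private: local model and local servers"

servers = {
    "memory": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"]},
    "hosted": {"url": "https://mcp.example.com/mcp"},  # hypothetical remote server
}

print(classify_stack("http://localhost:11434", servers))
print(classify_stack("http://localhost:11434", {"memory": servers["memory"]}))
```

A check like this only looks at where the server runs. A locally launched server can still make network requests of its own, as the fetch server does, so read what each server does before calling the stack private.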

If you connect GitHub’s hosted server — our GitHub MCP tutorial covers it — do it knowingly: it’s enormously useful, and it’s remote. And when no server exists for your internal system, build one in Python; a server you wrote and run locally is the most private integration there is.

Bottom line: privacy is decided per MCP server, not by where the model runs.

The Stack at a Glance

| Layer | Fully private choice | What changes if you go remote |
| --- | --- | --- |
| Model | Ollama on localhost | Prompts + code go to the provider |
| Claude Code | always local (it’s your CLI) | |
| MCP servers | memory, local DB, self-built | Tool traffic goes to each remote service |
| Scope | local for experiments | project shares config with your team via git |

Six parts in, the arc is complete: a model your laptop can run, a coding assistant, private RAG, a free Claude Code, any backend you like — and now Claude Code MCP tools that reach the rest of your environment on your terms. That’s a full agent stack with a $0 line item, and you know exactly where every byte goes.

🧭 Where to go from here
  • Start here: add the memory server with the one-liner above, restart, and test recall across sessions.
  • Level up: connect your repos with the GitHub MCP server — knowing it’s the remote row of the table.
  • Go deeper: write your own local server for an internal system with our Python MCP server tutorial.

Frequently asked questions

Does MCP work when Claude Code runs on a non-Anthropic backend?
Yes. MCP servers are managed by the Claude Code client, not by the model API. Claude Code launches the servers, presents their tools to whichever model you configured, and executes the calls. Any backend from Part 5 — Ollama, DeepSeek, GLM, Kimi — can drive MCP tools, as long as the model handles tool calling well.
Why do small local models struggle with MCP tools?
Two reasons. Tool definitions are injected into the context, so every server you add shrinks the space left for your code. And emitting exact structured tool calls is a trained skill — community testing finds 7–8B models produce inconsistent calls, while 14B+ models and tool-tagged variants behave far better.
Is a local model plus a remote MCP server still private?
No, and this is the trap. The model staying on your machine only protects the conversation. Whatever a remote MCP server touches — issues, queries, page contents — travels to that service under its own terms. Fully private means local model and local servers.
What's the difference between local, project and user scope?
Local scope (default) registers the server for you in the current project only. Project scope writes .mcp.json at the repo root so the team shares it through git. User scope makes it available in all your projects. When the same server appears twice, local beats project, which beats user.
Which MCP server should I add first?
The reference memory server. It's one command, needs no accounts or keys, runs entirely on your machine, and gives you an obvious test — tell Claude Code to remember something, restart the session, and ask for it back.

References

  1. Claude Code — connect to tools via MCP, official docs
  2. Model Context Protocol — official site
  3. modelcontextprotocol/servers — reference MCP servers (GitHub)
  4. Ollama — Claude Code integration, official docs
  5. Ollama — models with tool support
Written by Sukhveer Kaur, Software Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.
