Local AI, Zero Cost — Part 5. Part 4 separated the viral hype from the truth: the CLI is free, the Claude models aren’t, and one URL decides who answers. This part is the hands-on half — how to connect Claude Code to each backend, with exact copy-paste configs.
Part 4 told you which backend fits your budget and privacy needs. This tutorial wires it up. Every config below comes from the provider’s own docs, not a video description box. You get the env vars, the settings.json blocks that make them stick, the model-mapping variables nobody mentions, and the proxy step that NVIDIA NIM and OpenRouter truly require. If you want to connect Claude Code to a new engine without guesswork, this is the page to keep open.
Fifteen minutes from now, claude in your terminal will be talking to whichever engine you picked — and you’ll know how to switch back.
- Three variables do all the work: `ANTHROPIC_BASE_URL` (where requests go), `ANTHROPIC_AUTH_TOKEN` (the provider’s key), and `ANTHROPIC_MODEL` (which model answers). Set them, open a fresh terminal, done.
- Four backends connect directly (Ollama, DeepSeek, GLM and Kimi expose Anthropic-style endpoints); NVIDIA NIM and OpenRouter speak the OpenAI style, so they need a small local translator proxy.
- Put the config in `~/.claude/settings.json` so it survives new terminals, and in a per-project `.claude/settings.json` to give each repo its own backend.
- Claude Code installed (`curl -fsSL https://claude.ai/install.sh | bash`); no subscription is needed for third-party backends
- An account and API key for the provider you chose (Part 4 compares them on cost, privacy and quality)
- Comfort editing a JSON file and pasting terminal commands
## The Three Variables That Do Everything
Claude Code reads its destination from environment variables at launch. That’s the entire mechanism — no forks, no plugins, no patched binaries.
- `ANTHROPIC_BASE_URL`: the server that receives every request. Default: Anthropic. Change it, change the engine.
- `ANTHROPIC_AUTH_TOKEN`: the credential sent to that server. Third-party endpoints read this one. Also set `ANTHROPIC_API_KEY=""` so a leftover Anthropic key can’t interfere.
- `ANTHROPIC_MODEL`: which of the provider’s models answers.

Finer mapping exists too. The `ANTHROPIC_DEFAULT_OPUS_MODEL` / `ANTHROPIC_DEFAULT_SONNET_MODEL` / `ANTHROPIC_DEFAULT_HAIKU_MODEL` trio translates Claude Code’s internal tier names, and `CLAUDE_CODE_SUBAGENT_MODEL` picks a cheaper model for background subagents.
Bottom line: if you understand these three variables, every provider section below is just different values.
Environment variables load when Claude Code starts. Edits never reach an already-open session — close it and open a fresh terminal after every config change. This one habit prevents most “it’s not working” moments.
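To confirm what a fresh terminal would actually send, a small check helps. The sketch below is a hypothetical helper, not a Claude Code command: it only reads the three variables (plus `ANTHROPIC_API_KEY`) from the current shell, so it cannot see values that live only in a settings.json `env` block.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Claude Code): print which backend a
# `claude` launched from this shell would use. Reads shell variables only;
# values from a settings.json "env" block are not visible here.
show_claude_backend() {
  echo "Base URL:   ${ANTHROPIC_BASE_URL:-(unset, Anthropic default)}"
  # Report only whether the token is present; never print the secret.
  if [ -n "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
    echo "Auth token: set"
  else
    echo "Auth token: (unset)"
  fi
  echo "Model:      ${ANTHROPIC_MODEL:-(unset, backend default)}"
  # A non-empty leftover Anthropic key is the classic source of 401s.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "Warning:    ANTHROPIC_API_KEY is non-empty and may interfere"
  fi
}

show_claude_backend
```

Run it in the new terminal before starting `claude`; if the base URL still shows the old engine, the export never reached this shell.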
## Make It Permanent: settings.json
Exports die with the terminal. For a setup you’ll keep, put the same values in ~/.claude/settings.json — Claude Code applies its env block to every session (settings docs):
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-provider-api-key",
    "ANTHROPIC_API_KEY": ""
  }
}
```

Claude Code also reads a per-project `.claude/settings.json`, and project values win. That means one repo can run on GLM while your client work stays on Anthropic, with no juggling of exports.
Bottom line: shell exports are for trying a backend; settings.json is for keeping it.
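For the per-project route, a sketch like the following writes a repo-level `.claude/settings.json` with the same keys. The function name is hypothetical; it overwrites any existing project settings file, and it assumes neither the URL nor the key contains a double quote or backslash, because both are pasted into the JSON verbatim.

```bash
#!/usr/bin/env bash
# Hypothetical helper: give one repository its own backend by writing
# <repo>/.claude/settings.json. Overwrites any existing project settings file.
write_project_backend() {
  local repo_dir="$1" base_url="$2" token="$3"
  mkdir -p "$repo_dir/.claude" || return 1
  # Unquoted EOF delimiter, so $base_url and $token expand into the JSON.
  # Assumes neither value contains a double quote or backslash.
  cat > "$repo_dir/.claude/settings.json" <<EOF
{
  "env": {
    "ANTHROPIC_BASE_URL": "$base_url",
    "ANTHROPIC_AUTH_TOKEN": "$token",
    "ANTHROPIC_API_KEY": ""
  }
}
EOF
}
```

Usage, with `ZAI_API_KEY` standing in for wherever you keep the key: `write_project_backend ~/code/side-project https://api.z.ai/api/anthropic "$ZAI_API_KEY"`. Because the file now holds a secret, keep it out of version control; Claude Code also reads a personal, git-ignored `.claude/settings.local.json`, which may be the better home for a key.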
## Connect Claude Code to Each Provider
Pick your backend and paste. Each block is the provider’s own published config.
### Ollama: local, $0, private
Since v0.14, Ollama speaks the Anthropic API natively on port 11434. Easiest path:
```bash
ollama launch claude
```

Manual equivalent, if you want control:

```bash
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
claude --model qwen3.5
```

Any tool-capable model you’ve pulled works after `--model`. For real repositories, raise the context window to 64k+ in Ollama’s settings; agentic sessions eat context. Part 1 helps you pick a model your RAM can hold.
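Most first-run failures here are a server that isn’t running or a model that was never pulled. A minimal pre-flight sketch, assuming Ollama’s default address and its `/api/tags` endpoint, which lists locally pulled models; the function name is hypothetical and the model match is a crude text search, not real JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before pointing Claude Code at Ollama.
# Assumes the default address; /api/tags lists the models pulled locally.
ollama_ready() {
  local base="${1:-http://localhost:11434}" model="${2:-}" tags
  # -f fails on HTTP errors; --max-time stops the check from hanging.
  if ! tags=$(curl -fsS --max-time 3 "$base/api/tags" 2>/dev/null); then
    echo "Ollama not reachable at $base (is the server running?)" >&2
    return 1
  fi
  # Crude literal match on the JSON response; a prefix like "qwen3" would
  # also match "qwen3.5". Pulled models appear as "name":"<model>:<tag>".
  if [ -n "$model" ] && ! grep -qF "\"name\":\"$model" <<<"$tags"; then
    echo "Model '$model' not found; try: ollama pull $model" >&2
    return 2
  fi
  echo "Ollama ready at $base"
}
```

Usage: `ollama_ready http://localhost:11434 qwen3.5 && claude --model qwen3.5`. If you start the server by hand, Ollama’s FAQ describes raising the default context window with an environment variable, for example `OLLAMA_CONTEXT_LENGTH=65536 ollama serve`.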
### DeepSeek: pay-per-token, cents per session
DeepSeek hosts an Anthropic-compatible endpoint and documents the full Claude Code setup, including tier mapping:
```bash
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN="<your-deepseek-api-key>"
export ANTHROPIC_MODEL="deepseek-v4-pro[1m]"
export ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash
export CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash
```

The last two lines are the money-savers. Quick internal calls and subagents run on the flash model; the pro model handles your actual prompts. DeepSeek also auto-maps Claude tier names (opus-prefixed → v4-pro, haiku/sonnet-prefixed → v4-flash) if you skip the mapping variables.
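If you would rather not leave DeepSeek’s variables exported in your shell, `env` can scope them to a single launch. This is a sketch: the function name and the `DEEPSEEK_API_KEY` variable are assumptions of this example, and `CLAUDE_BIN` exists only so the launcher can be swapped for a harmless command when testing.

```bash
#!/usr/bin/env bash
# Hypothetical one-shot launcher: run a single Claude Code session on DeepSeek.
# `env` passes the variables to that one process; the parent shell is untouched.
# DEEPSEEK_API_KEY is an assumed variable holding your key.
deepseek_claude() {
  if [ -z "${DEEPSEEK_API_KEY:-}" ]; then
    echo "Set DEEPSEEK_API_KEY first" >&2
    return 1
  fi
  # The model ID is quoted because [1m] is a glob pattern to the shell.
  env \
    ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic \
    ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY" \
    ANTHROPIC_API_KEY= \
    ANTHROPIC_MODEL='deepseek-v4-pro[1m]' \
    ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash \
    CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash \
    "${CLAUDE_BIN:-claude}" "$@"
}
```

Any arguments after `deepseek_claude` are passed to `claude` unchanged. The quoting matters in the export version above too: unquoted square brackets are pathname-expansion characters, and zsh, for one, rejects an unmatched pattern outright.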
### GLM (Z.ai): the coding-plan favorite
Z.ai’s GLM Coding Plan is built to plug into Claude Code. The official docs configure it through settings.json:
```json
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-zai-api-key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic"
  }
}
```

Prefer automation? `npx @z_ai/coding-helper` walks through the same setup interactively. Your plan’s GLM model answers by default, so no model variable is needed to start.
### Kimi (Moonshot): long-context specialist
Moonshot exposes an Anthropic-style endpoint and documents Claude Code support directly:
```bash
export ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
export ANTHROPIC_AUTH_TOKEN="<your-moonshot-api-key>"
export ANTHROPIC_MODEL=kimi-k2.7-code
```

Using the China platform instead? Swap the base URL to `https://api.moonshot.cn/anthropic`.
### NVIDIA NIM and OpenRouter: free tiers, via a proxy
Here’s the step the viral videos blur past: these two serve the OpenAI-style API, so Claude Code can’t talk to them directly. The bridge is a small translator running on your machine. It shows Claude Code an Anthropic-style endpoint and forwards each request in the provider’s format. The open-source free-claude-code proxy is built for exactly this (it also covers LM Studio and llama.cpp):
- Clone and configure the proxy: set the provider (NIM or OpenRouter) and your API key in its config file.
- Start it; it prints a localhost address.
- Point Claude Code at the proxy:

```bash
export ANTHROPIC_BASE_URL="http://localhost:<the-proxy-port>"
export ANTHROPIC_API_KEY=""
claude
```

Model choice lives in the proxy’s config, not in Claude Code. The proxy maps Claude’s tier names to whichever NIM or OpenRouter model you set. Our OpenRouter review covers which `:free` models are worth mapping.
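Because the proxy is a separate process, it helps to confirm it is actually listening before launching `claude`. A minimal sketch using bash’s built-in `/dev/tcp` redirection; the function name and default retry count are hypothetical, and the port is whatever the proxy prints at startup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until something accepts TCP connections on
# 127.0.0.1:<port>. Uses bash's /dev/tcp pseudo-device (bash only).
wait_for_proxy() {
  local port="$1" tries="${2:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # The subshell opens a connection and closes it on exit;
    # success means a listener is up on that port.
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "Proxy is listening on port $port"
      return 0
    fi
    sleep 1
  done
  echo "Nothing listening on port $port after $tries attempts" >&2
  return 1
}
```

Usage: `wait_for_proxy 8082 && claude`, with 8082 standing in for your proxy’s real port. A successful check only proves the process is up, not that the provider accepted your key.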
A proxy adds a moving part. If sessions stall or tools misbehave, check the proxy’s console output first — rate-limit errors from the free tier show up there, not in Claude Code.
## When Something Breaks
- Config edits seem ignored: the session was already open. Close every Claude Code window and start a fresh terminal; variables load at launch.
- 401 / authentication errors: the key sits in the wrong variable. Third-party endpoints want `ANTHROPIC_AUTH_TOKEN`; keep `ANTHROPIC_API_KEY=""`.
- The model narrates edits instead of making them: a tool-calling gap. Switch to a coder or agentic variant from the provider’s lineup; plain chat models can’t drive Claude Code’s file tools reliably.
- Truncated or amnesiac sessions on local models: the context window is too small. Set 64k+ in Ollama for repository work.
- Model-not-found errors: you passed a Claude name to an endpoint that doesn’t know it. Set `ANTHROPIC_MODEL` (and the mapping trio) to the provider’s own model IDs.
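Two of those failures, a key in the wrong variable and a Claude model name sent to a third-party endpoint, can be spotted from the shell before launching. A rough sketch: the function name is hypothetical, it inspects exported shell variables only (not settings.json), and treating any `localhost` base URL as Ollama or a proxy is a simplifying assumption.

```bash
#!/usr/bin/env bash
# Hypothetical lint for the failure modes above. Heuristic only; inspects
# the current shell's variables, not settings.json. Returns the problem count.
lint_claude_env() {
  local base="${ANTHROPIC_BASE_URL:-}" problems=0
  if [ -z "$base" ]; then
    echo "OK: no ANTHROPIC_BASE_URL, Claude Code will use Anthropic"
    return 0
  fi
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "401 risk: ANTHROPIC_API_KEY is set; use ANTHROPIC_API_KEY=\"\""
    problems=$((problems + 1))
  fi
  # Local proxies keep the key in their own config, so skip localhost.
  case "$base" in
    http://localhost* | http://127.0.0.1*) ;;
    *)
      if [ -z "${ANTHROPIC_AUTH_TOKEN:-}" ]; then
        echo "401 risk: put the provider key in ANTHROPIC_AUTH_TOKEN"
        problems=$((problems + 1))
      fi
      ;;
  esac
  # Third-party endpoints may not recognise Claude model IDs.
  case "${ANTHROPIC_MODEL:-}" in
    claude-*)
      echo "Model-not-found risk: ANTHROPIC_MODEL is a Claude ID"
      problems=$((problems + 1))
      ;;
  esac
  [ "$problems" -eq 0 ] && echo "OK: no obvious problems"
  return "$problems"
}
```

A clean result doesn’t guarantee a working session; it only rules out the two mistakes that are visible without contacting the server.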
Keep one tiny shell script per backend (glm.sh, ollama.sh, anthropic.sh) that exports the right variables and runs claude. Switching engines becomes a one-word decision instead of a config-editing session.
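The same idea also fits in one sourced file. A sketch under stated assumptions: `use_backend` is a hypothetical name, `ZAI_API_KEY`, `MOONSHOT_API_KEY` and `DEEPSEEK_API_KEY` are placeholders for wherever you keep keys, and the URLs and model IDs are the ones from the provider sections. It clears every backend variable first, so a DeepSeek model mapping can’t leak into a GLM session.

```bash
#!/usr/bin/env bash
# Hypothetical switcher: *source* this file (don't execute it), then run
# `use_backend <name>` and start `claude` in the same shell.

# Every variable the backends in this article set; cleared on each switch.
CLAUDE_BACKEND_VARS=(
  ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY ANTHROPIC_MODEL
  ANTHROPIC_DEFAULT_OPUS_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL
  ANTHROPIC_DEFAULT_HAIKU_MODEL CLAUDE_CODE_SUBAGENT_MODEL
)

use_backend() {
  unset "${CLAUDE_BACKEND_VARS[@]}"
  # An empty *_API_KEY below leaves an empty token, which means 401s.
  case "$1" in
    anthropic)
      # Nothing to set: Claude Code falls back to Anthropic and your login.
      ;;
    ollama)
      export ANTHROPIC_BASE_URL=http://localhost:11434
      export ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_API_KEY=""
      ;;
    glm)
      export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
      export ANTHROPIC_AUTH_TOKEN="${ZAI_API_KEY:-}" ANTHROPIC_API_KEY=""
      ;;
    kimi | kimi-cn)
      local host=api.moonshot.ai
      [ "$1" = kimi-cn ] && host=api.moonshot.cn
      export ANTHROPIC_BASE_URL="https://$host/anthropic"
      export ANTHROPIC_AUTH_TOKEN="${MOONSHOT_API_KEY:-}" ANTHROPIC_API_KEY=""
      export ANTHROPIC_MODEL=kimi-k2.7-code
      ;;
    deepseek)
      export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
      export ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY:-}" ANTHROPIC_API_KEY=""
      export ANTHROPIC_MODEL="deepseek-v4-pro[1m]"
      export ANTHROPIC_DEFAULT_HAIKU_MODEL=deepseek-v4-flash
      export CLAUDE_CODE_SUBAGENT_MODEL=deepseek-v4-flash
      ;;
    *)
      echo "usage: use_backend anthropic|ollama|glm|kimi|kimi-cn|deepseek" >&2
      return 1
      ;;
  esac
  echo "Backend: $1 (${ANTHROPIC_BASE_URL:-Anthropic default})"
}
```

Usage: `source ~/claude-backends.sh` (file name hypothetical), then `use_backend glm && claude`. Executing the file instead of sourcing it would export into a child shell that exits immediately. Note that `use_backend anthropic` only clears the shell; an `env` block in `~/.claude/settings.json` still applies until you remove it.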
## Every Config at a Glance
| Backend | ANTHROPIC_BASE_URL | Auth token | Model setting |
|---|---|---|---|
| Ollama | http://localhost:11434 | ollama | claude --model <name> |
| DeepSeek | https://api.deepseek.com/anthropic | DeepSeek key | deepseek-v4-pro[1m] + flash mapping |
| GLM (Z.ai) | https://api.z.ai/api/anthropic | Z.ai key | plan default |
| Kimi | https://api.moonshot.ai/anthropic | Moonshot key | kimi-k2.7-code |
| NIM / OpenRouter | local proxy address | key lives in proxy | mapped in proxy config |
| Back to Anthropic | unset everything | Claude login | /model in-session |
That last row matters. To connect Claude Code back to the real Claude models, remove the env block (or unset the variables) and open a new terminal. You’re home. Nothing is permanent, nothing is patched — which is exactly why this whole ecosystem works.
- Start simple: wire up Ollama first — it’s free, private, and failure-proof for learning the switch mechanics.
- Choosing between backends? Part 4 compares cost, privacy and quality so the config you paste is the right one.
- Local model underpowered? Part 1 matches models to your RAM, and Part 2 gives the same local stack a VS Code home.
- Next in the series: give the stack tools beyond the repo — Part 6 adds MCP servers without breaking the privacy story.

