# Free Local AI Coding Assistant in VS Code (2026 Setup)

> Set up a free local AI coding assistant in VS Code with Continue and Ollama: the right Qwen coder models for your RAM, plus chat, edits and autocomplete.

*Source: https://www.infowok.com/local-ai-coding-assistant-vscode/ · Sukhveer Kaur · Published July 5, 2026*

---

> **Local AI, Zero Cost — Part 2.** [Part 1](/best-local-llm-for-your-laptop/) matched your laptop's RAM to the right open-source model. This part turns that setup into a working **local AI coding assistant** in VS Code.

GitHub Copilot costs money every month. Cursor costs more. Meanwhile the models that power a very usable coding assistant have become free, small enough for your laptop, and one config file away from living inside VS Code. The stack is three free pieces: VS Code, the [Continue](https://docs.continue.dev/guides/ollama-guide) extension, and Ollama serving a Qwen coder model. No account, no trial, no token meter — and your code never leaves your machine.

This guide sets the whole thing up in about fifteen minutes: chat with your codebase, inline edits, and tab autocomplete, all running locally.

<Prerequisites>

- Ollama installed with at least one model pulled — [Part 1](/best-local-llm-for-your-laptop/) covers this in ten minutes
- VS Code installed, and comfort pasting a command into a terminal
- 8 GB+ RAM (16 GB gives you the noticeably smarter 7B chat model)

</Prerequisites>

<KeyTakeaways>

- **One assistant needs two models.** A small, fast model (Qwen2.5-Coder 1.5B) handles keystroke-speed autocomplete; a larger one (Qwen2.5-Coder 7B or Qwen3-Coder 30B) handles chat and edits. Continue wires each to a role.
- **The whole stack is free and private.** Continue talks to Ollama at `localhost:11434`, so chat, edits and autocomplete keep working offline and nothing is ever uploaded.
- **Setup is one YAML file.** Pull two models, install the Continue extension, paste a ten-line `config.yaml` — done.

</KeyTakeaways>

## Why go local instead of paying for Copilot

The case mirrors the one from Part 1, sharpened for code:

- **Your code stays yours.** Client work under NDA, proprietary repos, unreleased features — none of it leaves the laptop. There is no cloud inference endpoint to trust, audit or get breached.
- **Zero recurring cost.** Copilot-class autocomplete and chat without a subscription. For students and self-learners, that removes the last excuse.
- **It works on flights and bad Wi-Fi.** Autocomplete keeps firing with the network cable pulled.
- **Unlimited usage.** No monthly premium-request caps, no throttling during a long refactoring session.

And the honest other side: cloud assistants still win on frontier-model reasoning and big agentic multi-file changes. If that's your daily need, our [Cursor vs Claude Code comparison](/cursor-vs-claude-code-2026/) covers the paid landscape. Many developers sensibly run both — local for the 80% of daily completions and questions, cloud for the hardest 20%.

**Bottom line: a local AI coding assistant covers everyday completions, explanations and edits for exactly ₹0 — and it's the only option that's fully private and offline.**

## How a local AI coding assistant fits together

Three pieces, one local port:

![Architecture of a local AI coding assistant — the Continue extension inside VS Code sends chat, edit and autocomplete requests to Ollama on localhost 11434, which runs a 7B chat model and a 1.5B autocomplete model](./local-ai-coding-assistant-architecture.svg)

VS Code is the editor you already use. **Continue** is a free, open-source extension that adds the assistant UI — a chat sidebar, an inline-edit command, and tab autocomplete. **Ollama** (from Part 1) serves models at `http://localhost:11434`. Continue sends every request there, tagged with a role: `chat`, `edit`, `apply` or `autocomplete`.

The key design decision is **two models, split by job**:

- **Autocomplete fires on almost every keystroke,** so it must respond in milliseconds. A 1.5B model trained for FIM completion is ideal — big models here feel laggy, not smart.
- **Chat and edits are manual,** so a few seconds of thinking is fine. Use the largest coder model your RAM allows.

**Bottom line: Continue is the cockpit, Ollama is the engine room, and the two-model split is what makes local autocomplete feel instant.**

## Pick your two models by RAM

Qwen's coder family dominates this niche in 2026 — [qwen2.5-coder](https://ollama.com/library/qwen2.5-coder) remains the most-pulled dedicated code model on Ollama, and [qwen3-coder](https://ollama.com/library/qwen3-coder) brings a 30B MoE that activates only ~3B parameters per token:

| Your laptop | Chat + edit model | Autocomplete model |
| --- | --- | --- |
| 8 GB RAM | qwen2.5-coder:3b | qwen2.5-coder:0.5b |
| 16 GB RAM | qwen2.5-coder:7b | qwen2.5-coder:1.5b |
| 32 GB+ RAM | qwen3-coder:30b | qwen2.5-coder:1.5b |

Both models sit loaded side by side, so budget memory for the pair — on 16 GB that's roughly 5 GB + 1 GB, which leaves comfortable headroom.

<Callout type="tip">

Pull models with their exact size tag (`qwen2.5-coder:7b`, not `qwen2.5-coder`). A bare name pulls `:latest`, which may be a different size than your config expects — the classic cause of Continue's "404 model not found" error.

</Callout>

**Bottom line: at 16 GB, `qwen2.5-coder:7b` for chat plus `qwen2.5-coder:1.5b` for autocomplete is the proven pairing.**

## Set it up in fifteen minutes

**Step 1 — pull the two models** (Ollama installed in Part 1):

```bash
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:1.5b
```

**Step 2 — install Continue.** In VS Code, open Extensions, search "Continue", install the one by Continue Dev. A sidebar icon appears.

**Step 3 — point Continue at Ollama.** Open Continue's config (gear icon → *Open config file*) and replace it with:

```yaml
name: Local Assistant
version: 0.0.1
schema: v1
models:
  - name: Qwen2.5-Coder 7B
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - chat
      - edit
      - apply
  - name: Qwen2.5-Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
```

Swap the model tags for your RAM tier from the table above. That's the entire configuration.

**Step 4 — test all three features:**

- **Chat:** open the Continue sidebar, ask "explain this file" with a file open.
- **Inline edit:** select a function, press `Cmd+I` (Mac) or `Ctrl+I` (Windows/Linux), type "add error handling".
- **Autocomplete:** start typing a function body and watch gray ghost text appear; accept with `Tab`.

<Callout type="warning">

Continue's Agent mode needs a model with tool-calling support. Small coder models often advertise it but fail in practice — if you see "Agent mode is not supported", stick to Chat and Edit modes locally, or add `capabilities: [tool_use]` and test. Per [Continue's own docs](https://docs.continue.dev/guides/ollama-guide), advertised and actual tool support don't always match.

</Callout>

**Bottom line: two pulls, one extension, ten lines of YAML — then chat, edit and autocomplete are live.**

## A realistic daily workflow

Where the local setup shines day to day:

- **"What does this do?"** — select code, ask in chat. The 7B coder explains unfamiliar code well, and it's reading the actual file, not a paste.
- **Small refactors** — `Cmd+I` → "convert to async", "add type hints", "extract this into a helper". Review the diff, accept or reject.
- **Boilerplate autocomplete** — tests, argument parsing, dataclasses, API handlers. This is where the 1.5B FIM model quietly earns its keep, suggesting the next two or three lines faster than you'd type them.
- **Learning** — because it's free and unlimited, you can ask every "why" question you'd hesitate to burn paid tokens on. Pair it with the [agent-building series](/ai-agents-from-scratch-python-part-1/) and the assistant explains the code you're writing as you write it.

Where to stay realistic: long multi-file features, subtle architectural decisions, and gnarly debugging across a big codebase are still better served by frontier models. Know which tool you're holding.

## When something breaks

Four errors cover nearly every first-day problem:

- **"404 model not found, try pulling it first"** — your config names a tag you haven't pulled. Run `ollama list`, then pull the exact tag from the error. A bare `ollama pull qwen2.5-coder` grabs `:latest`, which is not the same model as `:7b`.
- **"Model requires more system memory"** — Continue defaults to a larger context window than some tools. Lower `contextLength` (try 2048) under `defaultCompletionOptions` in that model's config block, or step down one model size.
- **Autocomplete feels laggy** — the autocomplete role is running on a model that's too big. Keep it on the 1.5B; on every keystroke, speed beats brains.
- **Nothing responds at all** — check the engine: `curl http://localhost:11434` should answer "Ollama is running". If not, start Ollama and try again.

**Bottom line: almost every failure is a tag mismatch, a memory ceiling, or Ollama not running — each fixable in under a minute.**

## Quick recap

| Decision | Answer |
| --- | --- |
| The stack | VS Code + Continue extension + Ollama |
| Chat/edit model (16 GB) | qwen2.5-coder:7b |
| Autocomplete model | qwen2.5-coder:1.5b (FIM-trained, keystroke-fast) |
| 32 GB upgrade | qwen3-coder:30b for chat, same 1.5B for autocomplete |
| Configuration | ~10 lines of `config.yaml`, roles per model |
| Cost | ₹0 — no subscription, no tokens, no caps |
| Privacy | Everything stays on `localhost:11434` |

A year ago "free coding assistant" meant a crippled trial. Now it means a private, offline stack you control end to end — built from the same pieces you set up in Part 1, plus one extension and ten lines of YAML.

<NextSteps>

- **Start here:** pull the two Qwen coder models and paste the config — you'll have working autocomplete before lunch.
- **Level up:** point the [Pydantic AI tutorial](/pydantic-ai-tutorial-type-safe-agents-python/) at the same Ollama endpoint and let your new assistant help you write your first type-safe agent.
- **Coming in this series:** using your local model to chat with your own documents — fully private RAG on your laptop.

</NextSteps>