Local AI, Zero CostBeginner

Free Local AI Coding Assistant in VS Code (2026 Setup)

Set up a free local AI coding assistant in VS Code with Continue and Ollama: the right Qwen coder models for your RAM, plus chat, edits and autocomplete.

SK

Sukhveer Kaur

Published July 5, 2026

5 min read

Open in ChatGPT Open in Claude

On this page +

Why go local instead of paying for Copilot How a local AI coding assistant fits together Pick your two models by RAM Set it up in fifteen minutes A realistic daily workflow When something breaks Quick recap

🧰 New here? Set up your environment first · ~5 min

Install Python 3.11+ — confirm with python3 --version.
Create and activate a virtual environment: python3 -m venv .venv then source .venv/bin/activate (Windows: .venv\Scripts\activate). venv, pip & uv primer →
Install the packages this tutorial lists: pip install -U pip <packages>.
Put your LLM API key in a .env file and never commit it. API key + .env primer →

Full walkthrough → Environment Setup primer

Local AI, Zero Cost — Part 2. Part 1 matched your laptop’s RAM to the right open-source model. This part turns that setup into a working local AI coding assistant in VS Code.

GitHub Copilot costs money every month. Cursor costs more. Meanwhile the models that power a very usable coding assistant have become free, small enough for your laptop, and one config file away from living inside VS Code. The stack is three free pieces: VS Code, the Continue extension, and Ollama serving a Qwen coder model. No account, no trial, no token meter — and your code never leaves your machine.

This guide sets the whole thing up in about fifteen minutes: chat with your codebase, inline edits, and tab autocomplete, all running locally.

🟢 Beginner⏱️ 15 minStack: VS Code, Continue extension, Ollama, Qwen coder models — all free

✅ Before you start

Ollama installed with at least one model pulled — Part 1 covers this in ten minutes
VS Code installed, and comfort pasting a command into a terminal
8 GB+ RAM (16 GB gives you the noticeably smarter 7B chat model)

🎯 Key takeaways

One assistant needs two models. A small, fast model (Qwen2.5-Coder 1.5B) handles keystroke-speed autocomplete; a larger one (Qwen2.5-Coder 7B or Qwen3-Coder 30B) handles chat and edits. Continue wires each to a role.
The whole stack is free and private. Continue talks to Ollama at localhost:11434, so chat, edits and autocomplete keep working offline and nothing is ever uploaded.
Setup is one YAML file. Pull two models, install the Continue extension, paste a ten-line config.yaml — done.

Why go local instead of paying for Copilot#

The case mirrors the one from Part 1, sharpened for code:

Your code stays yours. Client work under NDA, proprietary repos, unreleased features — none of it leaves the laptop. There is no cloud inference endpoint to trust, audit or get breached.
Zero recurring cost. Copilot-class autocomplete and chat without a subscription. For students and self-learners, that removes the last excuse.
It works on flights and bad Wi-Fi. Autocomplete keeps firing with the network cable pulled.
Unlimited usage. No monthly premium-request caps, no throttling during a long refactoring session.

And the honest other side: cloud assistants still win on frontier-model reasoning and big agentic multi-file changes. If that’s your daily need, our Cursor vs Claude Code comparison covers the paid landscape. Many developers sensibly run both — local for the 80% of daily completions and questions, cloud for the hardest 20%.

Bottom line: a local AI coding assistant covers everyday completions, explanations and edits for exactly ₹0 — and it’s the only option that’s fully private and offline.

How a local AI coding assistant fits together#

Three pieces, one local port:

VS Code is the editor you already use. Continue is a free, open-source extension that adds the assistant UI — a chat sidebar, an inline-edit command, and tab autocomplete. Ollama (from Part 1) serves models at http://localhost:11434. Continue sends every request there, tagged with a role: chat, edit, apply or autocomplete.

The key design decision is two models, split by job:

Autocomplete fires on almost every keystroke, so it must respond in milliseconds. A 1.5B model trained for FIM completion is ideal — big models here feel laggy, not smart.
Chat and edits are manual, so a few seconds of thinking is fine. Use the largest coder model your RAM allows.

Bottom line: Continue is the cockpit, Ollama is the engine room, and the two-model split is what makes local autocomplete feel instant.

Pick your two models by RAM#

Qwen’s coder family dominates this niche in 2026 — qwen2.5-coder remains the most-pulled dedicated code model on Ollama, and qwen3-coder brings a 30B MoE that activates only ~3B parameters per token:

Your laptop	Chat + edit model	Autocomplete model
8 GB RAM	qwen2.5-coder:3b	qwen2.5-coder:0.5b
16 GB RAM	qwen2.5-coder:7b	qwen2.5-coder:1.5b
32 GB+ RAM	qwen3-coder:30b	qwen2.5-coder:1.5b

Both models sit loaded side by side, so budget memory for the pair — on 16 GB that’s roughly 5 GB + 1 GB, which leaves comfortable headroom.

💡 Tip

Pull models with their exact size tag (qwen2.5-coder:7b, not qwen2.5-coder). A bare name pulls :latest, which may be a different size than your config expects — the classic cause of Continue’s “404 model not found” error.

Bottom line: at 16 GB, qwen2.5-coder:7b for chat plus qwen2.5-coder:1.5b for autocomplete is the proven pairing.

Set it up in fifteen minutes#

Step 1 — pull the two models (Ollama installed in Part 1):

bash

ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:1.5b

Step 2 — install Continue. In VS Code, open Extensions, search “Continue”, install the one by Continue Dev. A sidebar icon appears.

Step 3 — point Continue at Ollama. Open Continue’s config (gear icon → Open config file) and replace it with:

yaml

name: Local Assistant
version: 0.0.1
schema: v1
models:
  - name: Qwen2.5-Coder 7B
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - chat
      - edit
      - apply
  - name: Qwen2.5-Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete

Swap the model tags for your RAM tier from the table above. That’s the entire configuration.

Step 4 — test all three features:

Chat: open the Continue sidebar, ask “explain this file” with a file open.
Inline edit: select a function, press Cmd+I (Mac) or Ctrl+I (Windows/Linux), type “add error handling”.
Autocomplete: start typing a function body and watch gray ghost text appear; accept with Tab.

⚠️ Warning

Continue’s Agent mode needs a model with tool-calling support. Small coder models often advertise it but fail in practice — if you see “Agent mode is not supported”, stick to Chat and Edit modes locally, or add capabilities: [tool_use] and test. Per Continue’s own docs, advertised and actual tool support don’t always match.

Bottom line: two pulls, one extension, ten lines of YAML — then chat, edit and autocomplete are live.

A realistic daily workflow#

Where the local setup shines day to day:

“What does this do?” — select code, ask in chat. The 7B coder explains unfamiliar code well, and it’s reading the actual file, not a paste.
Small refactors — Cmd+I → “convert to async”, “add type hints”, “extract this into a helper”. Review the diff, accept or reject.
Boilerplate autocomplete — tests, argument parsing, dataclasses, API handlers. This is where the 1.5B FIM model quietly earns its keep, suggesting the next two or three lines faster than you’d type them.
Learning — because it’s free and unlimited, you can ask every “why” question you’d hesitate to burn paid tokens on. Pair it with the agent-building series and the assistant explains the code you’re writing as you write it.

Where to stay realistic: long multi-file features, subtle architectural decisions, and gnarly debugging across a big codebase are still better served by frontier models. Know which tool you’re holding.

When something breaks#

Four errors cover nearly every first-day problem:

“404 model not found, try pulling it first” — your config names a tag you haven’t pulled. Run ollama list, then pull the exact tag from the error. A bare ollama pull qwen2.5-coder grabs :latest, which is not the same model as :7b.
“Model requires more system memory” — Continue defaults to a larger context window than some tools. Lower contextLength (try 2048) under defaultCompletionOptions in that model’s config block, or step down one model size.
Autocomplete feels laggy — the autocomplete role is running on a model that’s too big. Keep it on the 1.5B; on every keystroke, speed beats brains.
Nothing responds at all — check the engine: curl http://localhost:11434 should answer “Ollama is running”. If not, start Ollama and try again.

Bottom line: almost every failure is a tag mismatch, a memory ceiling, or Ollama not running — each fixable in under a minute.

Quick recap#

Decision	Answer
The stack	VS Code + Continue extension + Ollama
Chat/edit model (16 GB)	qwen2.5-coder:7b
Autocomplete model	qwen2.5-coder:1.5b (FIM-trained, keystroke-fast)
32 GB upgrade	qwen3-coder:30b for chat, same 1.5B for autocomplete
Configuration	~10 lines of `config.yaml`, roles per model
Cost	₹0 — no subscription, no tokens, no caps
Privacy	Everything stays on `localhost:11434`

A year ago “free coding assistant” meant a crippled trial. Now it means a private, offline stack you control end to end — built from the same pieces you set up in Part 1, plus one extension and ten lines of YAML.

🧭 Where to go from here

Start here: pull the two Qwen coder models and paste the config — you’ll have working autocomplete before lunch.
Level up: point the Pydantic AI tutorial at the same Ollama endpoint and let your new assistant help you write your first type-safe agent.
Coming in this series: using your local model to chat with your own documents — fully private RAG on your laptop.

Frequently asked questions

Is a local AI coding assistant really free? +

Yes. VS Code, the Continue extension, Ollama and the Qwen coder models are all free, with no trials, seats or token meters. The only costs are disk space for the models and the electricity your laptop already uses.

Why do I need two models, one for chat and one for autocomplete? +

Autocomplete fires on nearly every keystroke, so it needs a small, fast model like Qwen2.5-Coder 1.5B that responds in milliseconds. Chat and edits are triggered manually, so they can afford a larger, smarter model like Qwen2.5-Coder 7B. Continue assigns each model a role, and both run side by side under Ollama.

Does a local coding assistant work offline? +

Completely. Once the models are pulled, Continue talks only to Ollama at localhost:11434 — chat, inline edits and autocomplete all keep working with no internet connection at all.

Is it as good as GitHub Copilot? +

For autocomplete and everyday chat-about-code it is closer than most people expect, and it wins outright on privacy, offline use and cost. Copilot and Cursor still lead on large multi-file agentic edits and frontier-model reasoning. Many developers run both — local for daily work, cloud for the hardest problems.

Can I use Continue with JetBrains IDEs instead of VS Code? +

Yes. Continue ships a JetBrains extension with the same config.yaml format, so the identical Ollama setup works in IntelliJ, PyCharm and the rest of the JetBrains family.

References

#LocalLLM #ContinueDev #Ollama #VSCode #AICodingAssistant #CopilotAlternative

Share

Written by

Sukhveer KaurSoftware Developer & AI Engineer

Sukhveer is a software developer specialising in AI systems and backend engineering. She has hands-on experience designing agentic AI applications, working with large language model pipelines, autonomous agent frameworks, and cloud-native services in Java and Python. At InfoWok, she bridges the gap between cutting-edge AI research and practical implementation — helping developers understand and apply emerging technologies through clear, experience-backed writing.

Linkedin ↗

Related guides

Guide · 5 minBuild a Customer Support AI Agent in Python (2026)Sukhveer Kaur · Jul 4, 2026 Guide · 6 minOpenAI Agents SDK Tutorial: Build an Agent in Python (2026)Sukhveer Kaur · Jul 4, 2026 Guide · 7 minVector Database for RAG: When to Ditch the List (Part 4)Sukhveer Kaur · Jul 3, 2026

More by Sukhveer Kaur

Guide · 7 minBest Local LLM for Your Laptop in 2026: Free and PrivateSukhveer Kaur · Jul 5, 2026 Guide · 5 minSoftware Engineer Skills in 2026: What the Job Now ExpectsSukhveer Kaur · Jul 4, 2026 Review · 7 minOpenRouter Review (2026): One API, 300+ Models — Worth It?Sukhveer Kaur · Jul 3, 2026

Continue the series

← Part 00

Best Local LLM for Your Laptop in 2026: Free and Private

Get the next part the day it lands

One email per new part. No digest spam.