skip to content
Coding

Choosing a model

Coding

For code, a smaller model trained on code usually beats a larger general one, and it decodes faster while doing it.


“Coding” is two jobs wearing one word. Generation writes the function and refactors the file. Reasoning about code explains the bug and plans the change. Most code-tuned models handle both; a few lean hard one way. Everything on the short list below is an instruct model unless the row says otherwise.

The short list

Five models cover almost every local coding setup, from a 16 GB laptop running a 7B to a 32 GB machine running a sparse 30B at small-model speed.

ModelSizeKindGood for
Qwen 3 Coder 30B A3B30.5Bsparsethe strongest local coder, if you have the RAM for the experts
Qwen 2.5 Coder 7B7.6Bdensethe everyday coder; fast on a laptop
Qwen 2.5 Coder 1.5B1.5Bdenseautocomplete-class, runs on almost anything
Qwen 3 8B8.2Bdensea general model that codes well, when you want one model for everything
DeepSeek R1 Distill Qwen 14B14.8Breasoningplanning a change, untangling a subtle bug

Sizes and measured decode speeds live on the model ledger. “reasoning” marks a reasoning-tuned distill; see Base, instruct & reasoning for what that changes.

How to pick among them

Two questions decide it. How much memory you have, and whether the task is generation or reasoning.

By the RAM you have

16 GB or less
Qwen 2.5 Coder 7B is the sweet spot. Drop to the 1.5B for instant completions on a small machine, trading depth for latency.
32 GB or more
Qwen 3 Coder 30B A3B. It is a sparse model, so only a few experts fire per token and it decodes like something far smaller, but the full weights still sit in memory. Give it the headroom and it is the best local choice for code.

Generation versus reasoning

Writing and refactoring want a coder model. Reach for Qwen Coder and let it produce the diff. Planning a multi-file change, or chasing a bug that hides behind two layers of indirection, rewards a reasoning model. An R1 distill thinks out loud before it answers. It catches more, and it costs a burst of tokens up front for the privilege.

A note on autocomplete

Inline completion (fill-in-the-middle) is a base-model job, not an instruct one. The model finishes the code around your cursor instead of chatting about it, so chat-style tuning gets in the way. The small Qwen Coder builds are the right size and shape for it. The tuning distinction is laid out in Base, instruct & reasoning.

Wire it into your editor

Pick a model, serve it on localhost, and point your editor or agent at the local address.

terminal
conifer serve --model qwen2.5-coder-7b

The local API speaks the OpenAI-compatible protocol, so Claude Code, Cursor, and Continue connect by swapping the base URL and nothing else. When the generated code has to parse on the first try, pin it to a grammar with structured output. It constrains decoding at the token level, so the output is valid by construction.