Coding

A smaller model trained on code usually beats a larger general one, and decodes faster doing it.

Generation writes the function and refactors the file; reasoning about code explains the bug and plans the change. Everything below is instruct unless the row says otherwise.

The short list

Model	Size	Kind	Good for
Qwen 3 Coder 30B A3B	30.5B	sparse	the strongest local coder, given RAM for the experts
Qwen 2.5 Coder 7B	7.6B	dense	the everyday coder; fast on a laptop
Qwen 2.5 Coder 1.5B	1.5B	dense	autocomplete-class, runs on almost anything
Qwen 3 8B	8.2B	dense	a general model that codes well
DeepSeek R1 Distill Qwen 14B	14.8B	reasoning	planning a change, untangling a subtle bug

Sizes and decode speeds are on the model ledger.

How to pick among them

Qwen 2.5 Coder 7B is the sweet spot at 16 GB or less; the 1.5B gives instant completions below that. With 32 GB or more, Qwen 3 Coder 30B A3B is the best local choice: it is sparse, so few experts fire per token and it decodes like something far smaller, though the full weights still sit in memory. For planning a multi-file change, a reasoning model that thinks out loud first catches more, at the cost of extra tokens.

A note on autocomplete

Inline completion (fill-in-the-middle) is a base-model job: the model finishes the code around your cursor instead of chatting about it; the small Qwen Coder builds are the right shape.

Wire it into your editor

terminal

conifer serve --model qwen2.5-coder-7b

The local API speaks the OpenAI-compatible protocol, so Claude Code, Cursor, and Continue connect by swapping the base URL. When output has to parse on the first try, pin it to a grammar with structured output.

The short list#

How to pick among them#

A note on autocomplete#

Wire it into your editor#

The short list

How to pick among them

A note on autocomplete

Wire it into your editor