How models work
Base, instruct & reasoning
The same weights ship in three tunings, and the tuning shapes behavior more than size does. Almost always you want instruct. Here is what that word means, and the two times it isn’t the answer.
Training happens in stages. Pretraining reads a vast amount of text and produces a base model. A second round of fine-tuning turns that base into an instruct model or a reasoning model. All three share one architecture, the same parameter count, and the same decode cost. Only the behavior changes, and a model’s name usually tells you which tuning you have.
Base models
A base model predicts the next token and nothing more. Ask it a question and it doesn’t answer so much as keep going: it might write three more questions, because in its training data a question is often followed by other questions. There is no notion of a “user” or an “assistant,” and no instinct to stop when the answer is done.
That makes it wrong for chat and right for completion: filling the middle of a code file, continuing a draft, any job where you want raw continuation instead of a turn-taking reply. Base weights carry no instruct suffix, sometimes a -base tag. The catalog ships instruct builds, so you rarely meet one here.
Instruct models
An instruct model is a base model fine-tuned on instruction and response pairs, then aligned to human preferences. It follows directions, holds a conversation, respects formatting requests, stays on task, and declines what it shouldn’t do. It reads the system, user, and assistant turn structure that a chat is built from, the part a base model never learned. Use it for chat, coding help, summarizing, drafting, and tool use.
A model listed with no qualifier is instruct. That is what every lab ships as the default. The name often spells it out: -Instruct, -it on Gemma, or -Chat.
Reasoning models
A reasoning model thinks before it answers. It writes a visible run of intermediate steps, then commits to a conclusion. The behavior comes from reinforcement learning against problems with checkable answers, so the thinking earns its keep on math, logic, and hard multi-step code rather than padding the output.
You pay for that in tokens and latency. The model spends a paragraph, sometimes pages, working through the problem before the answer lands, so every reply runs slower and reads longer. Reach for it when the task rewards rigor over speed.
Reasoning, distilled
DeepSeek’s R1 distills bake this step-by-step behavior into an ordinary dense model. An R1-Distill-Qwen loads and runs like a Qwen of the same size and reasons like R1, so you get the thinking without a new architecture or a separate runtime path. The catalog marks these as reasoning specialists, not tool-callers: the thinking is the product, and structured tool calls are not what they were tuned for.
Which to pick
Default to instruct and step off it only for a reason. The name tells the tunings apart at a glance.
| Tuning | Reach for it when | The name shows |
|---|---|---|
| Instruct | almost everything: chat, code, tasks, tools | -Instruct · -it · -Chat |
| Reasoning | math, proofs, logic, hard multi-step problems | R1 · Distill · “thinking” |
| Base | autocomplete and raw completion only | no suffix · -base |
One model can be both. Qwen 3 ships a switchable thinking mode: an instruct model that reasons step by step when you turn it on and answers directly when you don’t. Flip it on for a proof, off for a quick reply, and pay the latency only when the task is worth it. From the local API the choice is just the model you name, instruct for everyday work and a distilled reasoner when a problem earns the extra tokens:
# instruct: the default for chat and tools
conifer serve --model qwen3-8b
# reasoning: step-by-step on a dense backbone
conifer serve --model deepseek-r1-distill-qwen-14b