skip to content
How to choose

Choosing a model

How to choose

Running a model is the easy part. Picking one out of a hundred is the real decision, and four questions settle it: the task, the tuning, the modality, and the hardware you have.


The runtime handles the mechanical part. It fits your pick to the memory on hand, picks a quantization that lands without swap, and sizes the context window so decode stays fast. The one thing a config file can’t answer is which model is good at the work you do. Four questions below cut a hundred candidates to a short list. Skip to the last section to let the defaults decide.

The task

The task moves the most. A 7B model trained on code out-writes a much larger general model at code, then loses to it on prose. Start from the work in front of you, not the leaderboard. Each page below carries its own short list and the reasons behind every pick.

If you're doingGo to
Writing or reasoning about codeCoding
Math, proofs, hard multi-step problemsReasoning & math
Drafting, editing, and conversationWriting & advice
Technical reading and research questionsScience & research
Contracts, citations, careful readingLaw
Long documents or retrieval-grounded workLong context & RAG
A bit of everythingGeneral assistant

The tuning

The same weights ship in different tunings, and the tuning shifts what a model is good for more than a few billion parameters would. Three of them matter.

Instruct
Trained to follow instructions and hold a conversation. The default; almost every pick across this section is one.
Reasoning
Thinks step by step in the open before it answers. Worth the extra tokens on math, proofs, and tangled multi-step problems; wasted on a one-line question.
Base
Untuned text completion. Reach for it only for fill-in-the-middle and similar completion jobs, never for chat.

For when each one wins, and why a reasoning model can beat a larger instruct model on a hard problem, see Base, instruct & reasoning.

The modality

Conifer runs the text path today. Several catalog models are vision-capable upstream, but image input rides a separate projector file the runtime doesn’t load yet, and audio isn’t exposed at all. Choose for text. Text, vision & audio tracks what the GGUF path does and doesn’t carry.

The hardware

Capability costs memory, and on a local machine memory is the ceiling. A dense model needs roughly its parameter count in bytes at the 4-bit default, so an 8B lands near 5GB of weights plus room for the KV cache. A sparse model decodes at a small model’s speed because only a few experts fire per token, but it still has to hold every expert in memory: Qwen 3 30B A3B activates about 3B parameters yet occupies roughly 18GB.

The runtime won’t let a model thrash, but it can’t conjure RAM. By your hardware works backward from your machine to the models that fit and stay fast. Per-model footprints and decode speeds live on the model ledger.

If you don’t want to think about it

Keep a well-rounded 8B instruct model resident and trade up only when a task asks for it. Qwen 3 8B and Llama 3.1 8B both decode comfortably on a laptop and cover most everyday work. Either one is a baseline to judge everything else against.

A safe starting point by memory
You haveStart withWhy
8 to 16 GBQwen 3 4B Instruct 2507Loads in ~2.5 GB and punches far above its size.
16 to 32 GBQwen 3 8BThe everyday default: fast, capable, and steady enough for tool use.
32 GB and upQwen 3 30B A3B 2507MoE depth at small-model speed, once the experts fit.

Footprints and measured decode speeds are on the model ledger.

Serve the model you settle on over the OpenAI-compatible API on localhost and point your existing tools at it.

terminal
conifer serve --model qwen3-8b