How to choose

The router picks a model per query so you don't have to. When you choose deliberately, four questions settle it: task, tuning, modality, and hardware.

The router settles this per query: hard gates first, then the cheapest capable model. Choosing by hand matters when you want to pin a model or keep a favorite resident; either way the runtime fits the pick to your memory so decode stays fast.

The task

The task narrows the field fastest; start from the work in front of you rather than the leaderboard.

If you're doing	Reach for
Writing or reasoning about code	a code-tuned model; Coding has the short list
Math, proofs, hard multi-step problems	a reasoning tuning that thinks before it answers
Drafting, editing, and conversation	a well-rounded instruct model
Technical reading and research questions	the largest instruct model your memory fits
Contracts, citations, careful reading	a large instruct model, with citations checked by hand
Long documents or retrieval-grounded work	a model whose cache stays small as context grows
A bit of everything	an 8B instruct kept resident

The tuning

The same weights ship in different tunings, and the tuning shifts what a model is good for.

Instruct: Follows instructions and holds a conversation; the default across this section.
Reasoning: Thinks step by step before it answers; worth the extra tokens on math, wasted on a one-line question.
Base: Untuned text completion, for fill-in-the-middle jobs. Never for chat.

See Base, instruct & reasoning for when each wins.

The modality

Conifer runs the text path today; image input rides a projector file the runtime doesn’t load yet, and audio isn’t exposed.

The hardware

Capability costs memory. A dense model needs roughly its parameter count in bytes at the 4-bit default, plus room for the KV cache; a sparse model decodes at a small model’s speed but still holds every expert in memory. By your hardware works backward from your machine; the model ledger has footprints and speeds.

If you don’t want to think about it

Keep a well-rounded 8B instruct model resident and trade up only when a task asks for it. Qwen 3 8B and Llama 3.1 8B both decode comfortably on a laptop and cover most everyday work. Serve your pick over the OpenAI-compatible API and point existing tools at it.

terminal

conifer serve --model qwen3-8b

The task#

The tuning#

The modality#

The hardware#

If you don’t want to think about it#

The task

The tuning

The modality

The hardware

If you don’t want to think about it