
How models work

The local catalog is tier 0 of the router: open weights you download once and run free on your own hardware.

Every model here runs on the free local tier, whether the router picks it or you pin it by hand. The model ledger records each model’s numbers; the architecture behind those numbers determines how much memory it needs and how fast it runs.

Read a model in four numbers

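parameters: Total weight count, in billions. Together with the quantization, it fixes the model’s size on disk and in memory: a 7B model at Q4_K_M is roughly 4 GB.
active parameters: How many parameters run per token. A dense model uses all of them; a Mixture-of-Experts (MoE) model uses far fewer. Active count drives speed; total count drives memory.
decode speed: Tokens per second while a reply streams. It tracks active parameters and quantization: each generated token reads every active weight, so fewer active parameters or fewer bits per weight mean faster decoding.
intelligence: A published benchmark score (MMLU on the ledger). Treat it as a proxy for quality, not a guarantee.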
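The first three numbers combine into back-of-the-envelope estimates. This is a minimal sketch with several assumptions. Q4_K_M is taken to average about 4.8 bits per weight, which is an approximate llama.cpp figure and not a ledger value. Runtime overhead and the KV cache are ignored. Decoding is treated as memory-bandwidth bound, so the speed is a ceiling, not a prediction. The function names and the 100 GB/s bandwidth are hypothetical.

```python
# Back-of-the-envelope estimates from a model's headline numbers.
# Assumptions (not from the Conifer ledger): approximate bits per weight
# per quantization, and decoding limited purely by memory bandwidth.

BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}


def weight_size_gb(total_params_b: float, quant: str) -> float:
    """Memory for the weights alone: every parameter must be resident."""
    return total_params_b * BITS_PER_WEIGHT[quant] / 8


def decode_ceiling_tps(active_params_b: float, quant: str, bandwidth_gbps: float) -> float:
    """Upper bound on tokens/s: each token reads every active weight once."""
    gb_read_per_token = active_params_b * BITS_PER_WEIGHT[quant] / 8
    return bandwidth_gbps / gb_read_per_token


# A dense 7B versus a 30B-A3B MoE on a hypothetical 100 GB/s machine.
for name, total_b, active_b in [("7B dense", 7, 7), ("30B-A3B MoE", 30, 3)]:
    size = weight_size_gb(total_b, "Q4_K_M")
    tps = decode_ceiling_tps(active_b, "Q4_K_M", bandwidth_gbps=100)
    print(f"{name}: ~{size:.1f} GB of weights, at most ~{tps:.0f} tok/s")
```

The 7B lands near the 4 GB rule of thumb. The MoE needs more than four times that memory, yet its speed ceiling is higher, because only 3B parameters are read per token.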

Three architecture classes

Every model belongs to one of three classes.

Class	Active per token	Memory	Best when
Dense	all parameters	all parameters	you want predictable quality and the broadest support
Mixture-of-Experts	a few experts	all experts (large)	you have spare RAM and want speed with breadth
Hybrid & sub-quadratic	all parameters, but cheaply	all parameters plus a tiny KV cache	long context must stay fast

The Gemma family is dense, with a few quirks of its own.
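The tiny KV cache of the hybrid class follows from how the cache grows. Every attention layer stores a key and a value vector per KV head for each token in context. Convolution or recurrent layers keep fixed-size state instead. This is a minimal sketch that assumes an FP16 cache and uses two hypothetical configurations. The layer counts and head sizes are illustrative and are not taken from any ledger entry.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    """Hypothetical shape parameters; only attention layers keep a KV cache."""
    attention_layers: int
    kv_heads: int
    head_dim: int


def kv_cache_gb(cfg: AttentionConfig, context_tokens: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2 for keys and values; FP16 (2 bytes per element) by default.
    total_bytes = (2 * cfg.attention_layers * cfg.kv_heads * cfg.head_dim
                   * context_tokens * bytes_per_elem)
    return total_bytes / 1e9


# Illustrative shapes: every layer attends vs. only a few attention layers.
dense = AttentionConfig(attention_layers=36, kv_heads=8, head_dim=128)
hybrid = AttentionConfig(attention_layers=6, kv_heads=8, head_dim=64)

for ctx in (8_192, 32_768):
    print(f"{ctx:>6} tokens: dense {kv_cache_gb(dense, ctx):.2f} GB, "
          f"hybrid {kv_cache_gb(hybrid, ctx):.2f} GB")
```

Both caches grow linearly with context. The hybrid's slope is many times shallower, so long prompts stay cheap in memory.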

Families and tuning

Within a class, models group by family: a lineage sharing a tokenizer, training recipe, and architecture.

Llama: Meta’s line; broadly compatible, the safe default.
Qwen: the widest range, 0.6B to 32B dense plus a 30B-A3B MoE.
Gemma: Google’s open lineage; capable for its size.
DeepSeek: reasoning-distilled Qwen and Llama variants, plus the V2-Lite MoE.
Phi: Microsoft’s small models trained on curated data, performing above their parameter count.
LFM2: Liquid AI’s hybrid line; the fastest route to long context.

The same base model also ships in different tunings; Base, instruct & reasoning covers which one to pull.

What “supported” means

Conifer runs a model when the engine has a kernel for its architecture; the model’s name doesn’t matter. A release that reuses a supported architecture runs the day its weights are published. A release that introduces a new architecture won’t load until a kernel for it ships.
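The rule amounts to a lookup keyed on architecture. The sketch below is hypothetical: the `SUPPORTED_ARCHITECTURES` registry, the `is_loadable` function and the architecture strings are illustrative, not Conifer’s internals. The metadata key is borrowed from GGUF, whose files record their architecture under `general.architecture`. This is an assumption about the kind of field such a check would read.

```python
# Hypothetical registry of architectures the engine has kernels for.
SUPPORTED_ARCHITECTURES = frozenset({"llama", "qwen3", "gemma3", "phi3", "lfm2"})


def is_loadable(metadata: dict[str, str]) -> bool:
    """Decide from the architecture field alone; the model's name is ignored."""
    return metadata.get("general.architecture") in SUPPORTED_ARCHITECTURES


# A new fine-tune that reuses a known architecture loads on day one...
new_finetune = {"general.name": "Some-Llama-Finetune", "general.architecture": "llama"}
# ...while a novel architecture stays gated until a kernel ships.
novel = {"general.name": "Brand-New-Model", "general.architecture": "novel-arch"}

print(is_loadable(new_finetune))  # True
print(is_loadable(novel))         # False
```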

Run one

Name a model, and the runtime fits the quantization and context window to your machine and serves it through the local API:


conifer serve --model qwen3-8b
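How the runtime fits the quant and context window isn’t documented here. This is one plausible greedy policy, offered only as a sketch. It picks the highest-precision quantization that still leaves room for a minimum context. It then spends the remaining memory on KV cache, up to the model’s maximum context. The bits-per-weight figures, the `fit_model` name, the 4,096-token floor and the example budget are all assumptions.

```python
from typing import Optional

# Approximate average bits per weight, highest precision first (assumed values).
QUANTS = [("F16", 16.0), ("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]


def fit_model(
    total_params_b: float,
    kv_bytes_per_token: int,
    budget_gb: float,
    max_context: int,
    min_context: int = 4096,
) -> Optional[tuple[str, int]]:
    """Hypothetical policy: most precise quant that still fits min_context tokens."""
    for quant, bits in QUANTS:
        weights_gb = total_params_b * bits / 8
        spare_bytes = (budget_gb - weights_gb) * 1e9
        if spare_bytes <= 0:
            continue  # the weights alone don't fit at this precision
        context = min(max_context, int(spare_bytes // kv_bytes_per_token))
        if context >= min_context:
            return quant, context
    return None  # nothing fits the budget


# Hypothetical 8B dense model: K and V x 36 layers x 8 KV heads x 128 dims x 2 bytes.
kv_per_token = 2 * 36 * 8 * 128 * 2
print(fit_model(8, kv_per_token, budget_gb=12, max_context=32_768))  # ('Q8_0', 23735)
```

Trading precision for context first is a design choice. A runtime could instead favor the full context window and drop to a smaller quant.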

Engine details live in Inside the engine; to pick by task, start at Choosing a model.
