By your hardware

Memory is the ceiling on a local machine. Start with the RAM you have, and read off the models that fit and stay fast.

The other pages start from the work; this one starts from the metal. Weights take one block of memory, the KV cache another, and your open apps hold the rest; overshoot and the machine swaps, dragging decode to a token a second.

What fills the memory

At Conifer’s 4-bit default, a dense model needs a little over half a byte per parameter: an 8B lands near 5 GB, a 32B near 20 GB. The sparse case catches people out: a mixture-of-experts model decodes at a small model’s speed but keeps every expert resident, so Qwen 3 30B A3B activates about 3B parameters per token yet occupies roughly 18 GB. The KV cache is reserved up front, sized to the context window, so a longer context costs memory before you type.
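The arithmetic is simple enough to sketch. The 0.6 bytes-per-parameter constant below is an assumption fitted to the figures on this page (4 bits of weight plus some quantization overhead), not a number Conifer publishes, and the function names are illustrative.

```python
# Rough 4-bit weight footprint.
# ASSUMPTION: 0.6 bytes per parameter, fitted to this page's figures
# (8B near 5 GB, 32B near 20 GB). It is not a Conifer constant.
BYTES_PER_PARAM_4BIT = 0.6


def weights_gb(total_params_billion: float) -> float:
    """Approximate resident weight size in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * BYTES_PER_PARAM_4BIT


def moe_profile(total_params_billion: float, active_params_billion: float) -> dict:
    """A mixture-of-experts model pays memory for every expert,
    but each token only touches the active ones."""
    return {
        "resident_gb": weights_gb(total_params_billion),
        "touched_per_token_gb": weights_gb(active_params_billion),
    }


if __name__ == "__main__":
    print(f"dense 8B:  ~{weights_gb(8):.1f} GB")
    print(f"dense 32B: ~{weights_gb(32):.1f} GB")
    print("30B A3B:  ", moe_profile(30, 3))
```

The MoE line is the one to watch: memory follows the total parameter count, speed follows the active count.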

Where each budget lands

Starting points, not ceilings. Leave a few gigabytes of headroom for the cache and the system.

A first pick by installed memory
You have	Start with	Weights	Why this one
8 to 16 GB	Qwen 3 4B Instruct 2507	~2.5 GB	Loads fast and answers above its size.
16 to 32 GB	Qwen 3 8B	~5 GB	The everyday default: quick, steady enough to drive tools.
32 to 64 GB	Qwen 3 30B A3B 2507	~18 GB	MoE depth at small-model latency, once the experts fit.
64 GB and up	Llama 3.3 70B / Qwen 2.5 72B	~43 to 47 GB	Frontier-class dense quality.

Approximate 4-bit weights; measured decode speeds are on the model ledger.

The boundaries are soft. When a model overshoots the budget, take the smaller one that fits: an 8B at full speed beats a 32B that crawls.
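The table and the step-down rule can be written as a lookup. This is a sketch, not Conifer's logic: the tier list copies the table, a boundary value such as 16 GB is assumed to fall in the upper row, and the 4 GB default headroom is a stand-in for "a few gigabytes".

```python
from bisect import bisect_right

# (lower bound in GB, model, approximate 4-bit weights in GB), from the table.
# The 64 GB row lists two models; only the first is kept here.
TIERS = [
    (8, "Qwen 3 4B Instruct 2507", 2.5),
    (16, "Qwen 3 8B", 5.0),
    (32, "Qwen 3 30B A3B 2507", 18.0),
    (64, "Llama 3.3 70B", 43.0),
]


def first_pick(installed_gb: float):
    """Table row for the installed memory, or None below the first tier.
    ASSUMPTION: a boundary value (16, 32, 64) belongs to the upper row."""
    idx = bisect_right([low for low, _, _ in TIERS], installed_gb) - 1
    return TIERS[idx][1] if idx >= 0 else None


def step_down(budget_gb: float, headroom_gb: float = 4.0):
    """Largest listed model whose weights plus headroom fit the budget.
    ASSUMPTION: headroom_gb=4.0 is illustrative, not a Conifer setting."""
    fitting = [name for _, name, w in TIERS if w + headroom_gb <= budget_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(first_pick(24))   # table row for a 24 GB machine
    print(step_down(20))    # only 20 GB actually free
```

`first_pick` answers "what does the table say"; `step_down` answers "what fits in the memory that is actually free right now".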

The app reads the machine for you

Each model card carries a fit verdict computed against the memory Conifer reads on your machine: fits, tight, or won’t fit, leaning conservative. The Your first model page walks through the catalog screen.
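Conifer's actual rule is not documented here, but a three-way verdict of this kind can be sketched as a comparison against usable memory. Everything below is hypothetical: the function name, the inputs, and the 15% margin that separates "fits" from "tight".

```python
def fit_verdict(weights_gb: float, cache_gb: float, usable_gb: float,
                tight_margin: float = 0.15) -> str:
    """Classify a model against usable memory.

    ASSUMPTION: this is an illustration, not Conifer's rule. A model is
    'tight' when it fits but lands within tight_margin of the limit.
    """
    need = weights_gb + cache_gb
    if need > usable_gb:
        return "won't fit"
    if need > usable_gb * (1.0 - tight_margin):
        return "tight"
    return "fits"


if __name__ == "__main__":
    print(fit_verdict(5.0, 1.0, 16.0))    # small model, plenty of room
    print(fit_verdict(18.0, 2.0, 22.0))   # close to the limit
    print(fit_verdict(43.0, 4.0, 32.0))   # larger than the machine
```

Leaning conservative means the margin errs toward "tight": a false "fits" costs you a swapping machine, a false "tight" costs nothing.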

When the model is bigger than the budget

Two levers buy room: lower the max-context ceiling to shrink the cache reservation, or pick a hybrid or sub-quadratic model whose cache barely grows with length.

terminal

conifer serve --model qwen3-30b-a3b --ctx 8192
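Why the context ceiling buys room: for a standard attention model the cache grows linearly with the window. The sketch below uses the usual key-plus-value formula. The layer count, KV head count, and head dimension are an illustrative shape rather than figures from this page, and 16-bit cache entries are assumed. Hybrid and sub-quadratic models do not follow this formula.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for a standard attention transformer.

    The leading 2 counts one key tensor and one value tensor per layer.
    ASSUMPTION: bytes_per_elem=2 (16-bit cache entries).
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem


if __name__ == "__main__":
    # Illustrative shape only: 36 layers, 8 KV heads, head dimension 128.
    for ctx in (8192, 32768):
        gib = kv_cache_bytes(36, 8, 128, ctx) / 2**30
        print(f"ctx {ctx:>6}: {gib:.3f} GiB reserved")
```

With this shape, cutting the window from 32768 to 8192 tokens cuts the reservation to a quarter.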

Where to go next

Follow your task from How to choose, or jump to Coding. Exact footprints and speeds are on the model ledger.
