Your first model

Open the catalog, read one verdict per row, and let the runtime size the model to the memory you actually have.


Finding weights is easy. The hard question is whether a given set of weights will run on the machine in front of you, and at what quality. Conifer answers that on the card, before you spend a gigabyte downloading anything. Pick a model and the runtime handles the fit. If you have not installed yet, start with Install and come back.

Open the catalog

The studio’s model browser is the catalog. It opens on a short list of vetted everyday picks: well-rounded models that hold up as a default for chat, drafting, and tool use. Below those sit the specialists, each one excellent at a single job and a poor choice for anything else. When you already know the shape of your work, the choosing guide routes you straight to a recommendation by task.

Every row carries three numbers and one verdict. The numbers are the parameter count, the quantization level, and the download size on disk. The verdict is the part you read first. Decode speed and a published intelligence score live on the models ledger, where you can compare them across the catalog at a glance.

Read the fit verdict

Conifer probes your hardware once. It reads two things: how large a single buffer the GPU will hand out, and how much memory is actually free right now, a figure that already discounts the pages the OS can reclaim. From those it derives a usable-memory ceiling and compares every model against it. Each row gets a small pill that states the verdict before you download, and never by color alone:

fits: Runs with comfortable memory headroom.
tight: Fits, but leaves little room to spare. It will run; long contexts may push it.
won’t fit: Larger than this machine’s usable memory. Not an error, just honest unavailability.
fit unknown: The memory ceiling couldn’t be read, so Conifer says so rather than fake a confident verdict.

Hover a verdict to see the gigabyte arithmetic behind it. The pill reads identically to a screen reader and a colorblind reader, because the meaning lives in the label, not the dot.
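The gigabyte arithmetic behind a pill can be sketched roughly as follows. This is a minimal illustration, not Conifer's actual logic: the function name `fit_verdict`, the `TIGHT_FRACTION` threshold of 0.85, and the comparison of a single model-size figure against the ceiling are all assumptions made for the example.

```python
from typing import Optional

# Assumption: a model using more than this share of the ceiling counts as "tight".
# The real threshold is not documented here.
TIGHT_FRACTION = 0.85


def fit_verdict(model_gb: float, ceiling_gb: Optional[float]) -> str:
    """Return one of the four verdict labels for a model of the given size."""
    if ceiling_gb is None:
        # The probe could not read a ceiling, so no confident answer is given.
        return "fit unknown"
    if model_gb > ceiling_gb:
        return "won't fit"
    if model_gb > TIGHT_FRACTION * ceiling_gb:
        return "tight"
    return "fits"


if __name__ == "__main__":
    for size in (4.0, 15.0, 20.0):
        print(f"{size:>5.1f} GB against a 16.0 GB ceiling: {fit_verdict(size, 16.0)}")
    print(f"unreadable ceiling: {fit_verdict(4.0, None)}")
```

The verdict is returned as a text label, never a color code, which mirrors the point above that the meaning lives in the label.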

Download it

Click download on a row that fits and the runtime, not the page, starts the job. The row swaps its verdict for a progress meter: a percentage and a rough time estimate. Switch to another screen and the download keeps going; come back and the meter is where you left it. Once the weights land, the model is ready to load.
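A percentage and a rough time estimate can be derived from three counters. The sketch below assumes a simple average-rate estimate; `download_progress` and its parameters are hypothetical names, and the real meter may smooth or weight the rate differently.

```python
from typing import Optional, Tuple


def download_progress(
    bytes_done: int, bytes_total: int, elapsed_s: float
) -> Tuple[float, Optional[float]]:
    """Return (percent complete, estimated seconds remaining or None)."""
    percent = 100.0 * bytes_done / bytes_total
    if bytes_done <= 0 or elapsed_s <= 0:
        # No transfer rate can be measured yet, so no estimate is offered.
        return percent, None
    rate = bytes_done / elapsed_s  # average bytes per second so far
    remaining_s = (bytes_total - bytes_done) / rate
    return percent, remaining_s


if __name__ == "__main__":
    percent, eta = download_progress(250, 1000, 10.0)
    print(f"{percent:.0f}% done, about {eta:.0f} s left")
```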

Some larger models ship as several files that download as one unit and load together as a single model. The row flags that up front, so a multi-part download is never a surprise.

What “fits it to your memory” means

By design, you never chose a quantization level or wrote a config file. Conifer picks the precision that lands without swapping weights to disk, and the engine sizes the KV cache to the memory free at load time rather than to the model’s full window. That is how a 24B model runs on a 36 GB machine instead of dropping to a token a second against a thrashing swap file. For the trade-offs behind the precision choice, see Quantization.
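Sizing the KV cache to free memory comes down to a short calculation. The sketch below uses the standard per-token cost of a transformer KV cache (one key and one value vector per layer and per KV head); the function names, the `reserve_bytes` safety margin, and the model shape in the example are assumptions for illustration, not Conifer's engine code.

```python
def kv_bytes_per_token(
    n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2
) -> int:
    """Bytes of KV cache one token occupies: keys plus values, across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(
    free_bytes: int,
    weight_bytes: int,
    per_token_bytes: int,
    model_window: int,
    reserve_bytes: int = 0,
) -> int:
    """Largest context that fits in what is left after the weights are loaded."""
    budget = free_bytes - weight_bytes - reserve_bytes
    if budget <= 0:
        return 0
    # Never exceed the model's own window, even when memory would allow it.
    return min(model_window, budget // per_token_bytes)


if __name__ == "__main__":
    GIB = 2**30
    # Hypothetical model shape: 40 layers, 8 KV heads, head dimension 128, 16-bit cache.
    per_token = kv_bytes_per_token(n_layers=40, n_kv_heads=8, head_dim=128)
    tokens = max_context_tokens(
        free_bytes=20 * GIB,
        weight_bytes=14 * GIB,
        per_token_bytes=per_token,
        model_window=131072,
        reserve_bytes=1 * GIB,
    )
    print(per_token, tokens)
```

In this example the model advertises a 131072-token window, but the memory left after loading the weights only covers a quarter of that, so the cache is sized to the smaller figure instead of spilling into swap.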

A few good starting points

These are reasonable defaults by machine size, starting points rather than a ranking. The per-model speed and intelligence figures live on the models ledger, and the catalog’s own fit pill is the authority on your specific hardware.

Everyday defaults by available memory
| Usable memory | Reach for | Why |
| --- | --- | --- |
| 8 GB | a 3B to 4B dense model | fast, fits with headroom, good for chat and short code |
| 16 GB | a 7B to 8B dense model | the everyday sweet spot for quality and speed |
| 32 GB and up | a 14B+ dense model or a sparse MoE | more capability; an MoE trades RAM for a small model's speed |

Numbers are guidance, not a guarantee. The catalog reads your real ceiling and tells you per model.
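The trade in the last row of the table can be made concrete with rough arithmetic. In a sparse MoE all experts must sit in memory, so the footprint follows the total parameter count, while each token only runs through the active subset, so per-token work follows the active count. The sketch below assumes a flat bits-per-weight figure and two hypothetical models; none of the numbers describe a real catalog entry.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params_b: float   # billions of parameters held in memory
    active_params_b: float  # billions of parameters used per token


def weight_gb(model: Model, bits_per_weight: float) -> float:
    """Approximate weight size in GB, ignoring quantization metadata overhead."""
    return model.total_params_b * bits_per_weight / 8


if __name__ == "__main__":
    # Hypothetical examples, both at an assumed 4 bits per weight.
    dense = Model("dense-14B", total_params_b=14, active_params_b=14)
    moe = Model("moe-30B-A3B", total_params_b=30, active_params_b=3)
    for m in (dense, moe):
        print(
            f"{m.name}: about {weight_gb(m, 4):.1f} GB of weights, "
            f"{m.active_params_b:g}B parameters active per token"
        )
```

Under these assumptions the MoE needs roughly twice the memory of the dense model but touches far fewer parameters per token, which is the sense in which it buys a small model's speed with RAM.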

Whether a model is dense or a sparse MoE changes how it spends that memory. That is why the catalog and these defaults weigh architecture, not parameter count alone. With the weights downloaded, the next step is your first chat.