Getting started
Your first model
Open the catalog, read one verdict per row, and let the runtime size the model to the memory you actually have.
Finding weights is easy. The hard question is whether a given set of weights will run on the machine in front of you, and at what quality. Conifer answers that on the card, before you spend a gigabyte downloading anything. Pick a model and the runtime handles the fit. If you have not installed yet, start with Install and come back.
Open the catalog
The studio’s model browser is the catalog. It opens on a short list of vetted everyday picks: well-rounded models that hold up as a default for chat, drafting, and tool use. Below those sit the specialists, each one excellent at a single job and a poor choice for anything else. When you already know the shape of your work, the choosing guide routes you straight to a recommendation by task.
Every row carries three numbers and one verdict. The numbers are the parameter count, the quantization level, and the download size on disk. The verdict is the part you read first. Decode speed and a published intelligence score live on the models ledger, where you can compare them across the catalog at a glance.
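The three numbers on a row are related by simple arithmetic, which the sketch below makes concrete. It is a rule of thumb rather than Conifer's own calculation: the function name and the 4.5 bits-per-weight figure are assumptions for illustration, and real files add metadata and often keep some layers at higher precision, so the download size shown on the row remains the authority.

```python
def estimated_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes).

    Each parameter costs bits_per_weight / 8 bytes, so a model with
    N billion parameters needs roughly N * bits / 8 gigabytes.
    """
    return params_billion * bits_per_weight / 8


# Hypothetical example: an 8B model at an assumed ~4.5 bits per weight.
print(f"{estimated_size_gb(8, 4.5):.1f} GB")   # 4.5 GB
# The same model at 16-bit precision is much larger.
print(f"{estimated_size_gb(8, 16):.1f} GB")    # 16.0 GB
```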
Read the fit verdict
Conifer probes your hardware once. It reads how large a single buffer the GPU will hand out and how much memory is actually free right now, the free figure already discounting the pages the OS can reclaim. From that it derives a usable-memory ceiling and compares every model against it. Each row then gets a small pill that states the verdict before you download, in a text label and never by color alone:
- fits: Runs with comfortable memory headroom.
- tight: Fits, but leaves little room to spare. It will run; long contexts may push it.
- won’t fit: Larger than this machine’s usable memory. Not an error, just honest unavailability.
- fit unknown: The memory ceiling couldn’t be read, so Conifer says so rather than fake a confident verdict.
Hover a verdict to see the gigabyte arithmetic behind it. The pill reads identically to a screen reader and a colorblind reader, because the meaning lives in the label, not the dot.
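The four verdicts can be read as a comparison between a model's memory footprint and the usable-memory ceiling. The sketch below shows one way such a comparison might look. The function name and the 85% "tight" threshold are hypothetical, since the source does not state where Conifer draws the line between fits and tight.

```python
from typing import Optional

# Assumed cutoff: above 85% of the ceiling counts as "tight".
# Conifer's real threshold is not documented here.
TIGHT_FRACTION = 0.85


def fit_verdict(model_gb: float, ceiling_gb: Optional[float]) -> str:
    """Return the pill label for a model given the usable-memory ceiling."""
    if ceiling_gb is None:
        # The ceiling could not be read, so no confident verdict is given.
        return "fit unknown"
    if model_gb > ceiling_gb:
        return "won't fit"
    if model_gb > TIGHT_FRACTION * ceiling_gb:
        return "tight"
    return "fits"


for size in (4.5, 14.0, 20.0):
    print(size, "GB ->", fit_verdict(size, ceiling_gb=16.0))
print(4.5, "GB ->", fit_verdict(4.5, ceiling_gb=None))
```

Returning a label instead of a color code mirrors the point above: the meaning lives in the text, so any rendering of it stays readable.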
Download it
Click download on a row that fits and the runtime, not the page, takes over the job. The row swaps its verdict for a progress meter: a percentage and a rough time estimate. Switch to another screen and the download keeps going; come back and the meter is where you left it. Once the weights land, the model is ready to load.
Some larger models ship as several files that download as one unit and load together as a single model. The row flags that up front, so a multi-part download is never a surprise.
What “fits it to your memory” means
By design, you never chose a quantization level or wrote a config file. Conifer picks the precision that lands without swapping weights to disk, and the engine sizes the KV cache to the memory free at load time rather than to the model’s full window. That is how a 24B model runs on a 36 GB machine instead of dropping to a token a second against a thrashing swap file. For the trade-offs behind the precision choice, see Quantization.
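Sizing the KV cache to free memory rather than to the full window comes down to a division. The sketch below uses the standard per-token cost of a transformer KV cache (one key and one value tensor per layer). The function names, the layer and head counts, and the memory figures are hypothetical, and the sketch ignores activation buffers and runtime overhead that a real engine also has to budget for.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache one token occupies: keys plus values, every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def context_that_fits(free_bytes: int, weight_bytes: int,
                      per_token_bytes: int, model_window: int) -> int:
    """Largest context (in tokens) whose KV cache fits beside the weights."""
    budget = free_bytes - weight_bytes
    if budget <= 0:
        return 0  # The weights alone exceed free memory.
    # Never exceed the model's own window, even when memory allows more.
    return min(model_window, budget // per_token_bytes)


GIB = 2**30
# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128, fp16 cache.
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
print(per_token)  # 131072 bytes, i.e. 128 KiB per token

# Assumed 8 GiB free, 5 GiB of weights, and a 131072-token window.
print(context_that_fits(8 * GIB, 5 * GIB, per_token, model_window=131072))  # 24576
```

In this example the full window would need 16 GiB of cache, so allocating for it up front would force swapping, while the 24576-token cache fits in the 3 GiB actually left over.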
A few good starting points
These are reasonable defaults by machine size, starting points rather than a ranking. The per-model speed and intelligence figures live on the models ledger, and the catalog’s own fit pill is the authority on your specific hardware.
| Usable memory | Reach for | Why |
|---|---|---|
| 8 GB | a 3B to 4B dense model | fast, fits with headroom, good for chat and short code |
| 16 GB | a 7B to 8B dense model | the everyday sweet spot for quality and speed |
| 32 GB and up | a 14B+ dense model or a sparse MoE | more capability; an MoE spends more RAM to decode at a small model’s speed |
Numbers are guidance, not a guarantee. The catalog reads your real ceiling and tells you per model.
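The MoE entry in the table rests on a difference between what a model holds in memory and what it touches per token: all experts must be resident, but only the active ones are read while decoding. The sketch below illustrates that split. The two models, their parameter counts, and the 4-bit precision are hypothetical, and the figures are rough estimates rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params_b: float    # parameters that must sit in memory, in billions
    active_params_b: float   # parameters used for each generated token
    bits_per_weight: float

    def resident_gb(self) -> float:
        """Approximate memory needed to hold all the weights."""
        return self.total_params_b * self.bits_per_weight / 8

    def read_per_token_gb(self) -> float:
        """Approximate weight data read to produce one token."""
        return self.active_params_b * self.bits_per_weight / 8


# Hypothetical models: a dense 14B and a sparse MoE with 30B total, 3B active.
dense = Model("dense-14B", 14, 14, 4)
moe = Model("moe-30B-A3B", 30, 3, 4)

for m in (dense, moe):
    print(f"{m.name}: {m.resident_gb():.1f} GB resident, "
          f"{m.read_per_token_gb():.1f} GB read per token")
```

Under these assumptions the MoE needs about twice the memory of the dense model but reads roughly a fifth as much weight data per token, which is why it belongs in the largest memory tier despite decoding quickly.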
Whether a model is dense or a sparse MoE changes how it spends that memory. That is why the catalog and these defaults weigh architecture, not parameter count alone. With the weights downloaded, the next step is your first chat.