Reference
Troubleshooting
A model that won’t load, decode that crawls, memory that runs out, a connector gone quiet. Four symptoms, four causes, four fixes.
Local runtime failures come down to four constraints. The weights have to fit. The cache has to fit. The GPU has to start. A connector has to be granted before it reads anything. Each section names the symptom, the cause under it, and the next thing to try. The studio writes a plain sentence on the failure card when it can read the problem; this page is for the rest, and for the why.
A model won’t load
Loading runs four steps in order: read the GGUF header, claim a GPU device, allocate the weights, size the cache. Whichever step fails stops the load, and that step points straight at the fix.
| Symptom | Likely cause | Fix |
|---|---|---|
| Load fails immediately, before any progress | The GGUF file is incomplete or damaged | Re-download it from the model ledger; an interrupted download leaves a truncated file |
| “can’t run this model yet” | The architecture or a tensor type isn’t implemented | Pick another model; see what the text path runs |
| “graphics couldn’t start the model” | The GPU backend (Metal) failed to initialize | Restart Conifer; if it persists, try a smaller model |
| Download stops partway | Not enough disk space, or a dropped connection | Free up room or check the network, then retry |
The studio rewrites these into a plain sentence and an action before you see them, never the raw engine string. The raw text waits in the console if you want it.
An unsupported architecture is not a bug to file. The engine learns model families one at a time, so a fresh release can land weeks ahead of the runtime that runs it. The overview sorts a dense transformer from a sparse MoE and a hybrid, and names which the engine handles today.
Generation is slow
A model that loaded clean and then decodes at a crawl is paying for memory it doesn’t have. The weights fit. The KV cache, sized for a long window, pushes the total past physical RAM, and unified memory parks the overflow in swap. Now every token reaches for keys and values that no longer sit in fast memory, and decode drops from tens of tokens a second to about one.
Reach for the context window first. A shorter window is a smaller cache and faster decode, paid for in how far back the model can see; trim it to what the task needs in the model’s advanced options. Why the cache, not the weights, is the variable cost lives on Context & memory. The other speed levers (batch size, offload, the numbers that move tokens per second) live on Performance & tuning.
Out of memory
Three things share unified memory while a model runs: the weights, the KV cache, and the scratch a forward pass needs. Out of memory means their sum, plus room for everything else you have open, runs past what is free. Pick the one you can hand back.
- Shrink the weights
- A lower quantization level packs each weight into fewer bits, so the same model loads smaller. Dropping two model sizes does the same thing with a heavier hand.
- Shrink the cache
- Lower the context window, or switch the KV cache to its 8-bit layout to roughly halve it. A model that rejects the 8-bit setting has a shape that can’t take it; put the cache back to full.
- Give memory back to the system
- Quit the apps holding the most memory. Near the machine’s limit, the browser and the model fight over the same few gigabytes.
Conifer sizes the default window to fit free memory, so a capped window is the runtime keeping you out of swap, not a ceiling to fight. To find weights that fit before you download, read the size line on the model ledger and start from your hardware on By your hardware.
A connector fails or reads nothing
A connector is one integration scoped to one grant, off until you turn it on per agent. Most connector trouble is a closed gate: the agent asked, but a door between it and the data was never opened.
Where the gate sits
| Symptom | Cause | Fix |
|---|---|---|
| Agent says it can’t reach a store | The per-agent grant is off | Turn the grant on for that agent; see the grant model |
| A native store (Calendar, Notes) returns nothing | macOS hasn’t granted Conifer the Privacy category | Grant Conifer once in System Settings, then retry |
| A network connector errors out | An expired or wrong token, or no connection | Re-paste the token; check the account and the network |
| The agent answers without reading the store | The request routed to a different tool | Rephrase to name the source; see the catalog |
A connector that succeeds and returns suspect text is working as designed. Tool output is wrapped as untrusted on purpose, so a calendar invite or a web page can’t slip instructions into the model. When an agent ignores an order that rode in on fetched content, that boundary is holding, not breaking.
When none of that fits
A failure with no clear class earns one clean retry, then a restart of Conifer. Anything that survives the restart, a file that won’t load, a download that keeps dropping, is pointing at the file or the disk, not the run. The raw engine text behind any rewritten sentence sits in the console, and its terms resolve against the glossary.
# Reproduce the run in the CLI to read the raw error.
conifer run "ping" # add --model <id> to pick the modelFor fixed versions and what each release changed, see the changelog. A failure that reproduces through a restart and a re-download is the kind worth reporting.