Glossary

One line per term, with a link to the page that defines it in full. The line places the word; the link explains it.


Each term is defined once, somewhere deeper in the docs, and repeated nowhere. The line under each word is enough to recognize it in context. The link is where the full explanation lives, trade-offs and numbers attached.

Terms

Agent
A local model handed a set of tools and the authority to use them. It can read a folder or check a calendar instead of only answering in text.
Base model
Raw weights trained only to predict the next token, before any instruction tuning. It continues text rather than following a request.
Bandwidth
The rate at which weights stream from memory into the compute units. Because decode reads every active weight per token, bandwidth is the wall most models hit, not raw math throughput.
Context window
How many tokens a model can attend to at once. Everything inside it occupies the KV cache, so a larger window costs memory.
Decode
The second phase of generation, producing the reply one token at a time. Its speed in tokens per second is what you feel while a model answers.
Dense model
Reads every parameter on every token. Speed and memory stay easy to predict, and support is the most complete in the catalog.
GGUF
The single-file weights format Conifer loads. It holds the model and its quantization in one container the engine maps straight into memory.
Grant
A scoped, revocable permission bound to one agent. The kernel checks every tool call against the grants you set, and an agent reaches nothing you did not grant.
Hybrid model
An architecture that mixes attention with sub-quadratic layers to keep a small KV cache, so it stays fast as the context grows long.
Instruct model
A base model tuned to follow instructions and hold a conversation. The default tuning for chat and most tasks.
JIT loading
Name any model in a request and the runtime loads it on demand, evicting cold models to stay inside the memory budget.
KV cache
The stored keys and values for every token already in context, kept so the model need not recompute them for the whole prompt at each step. It grows with every token in the window and is the memory cost most people miss.
Local-first
The default that a model runs entirely on your hardware with no outbound call. There is no cloud to opt out of, and nothing leaves the machine on its own.
Modality
The kind of input a model accepts: text, vision, or audio. The GGUF path Conifer runs today handles the text modality.
MoE
Mixture-of-Experts, a sparse design where a router fires only a few experts per token. You get a large model’s knowledge at a small model’s decode speed, as long as every expert fits in memory.
Parameter
One learned number in a model’s weights. The count (7B, 30B) is a rough measure of how much the model can know, and for an MoE the catalog also lists the active parameters, the subset read per token.
Prefill
The first phase of generation, reading the whole prompt in one pass to fill the KV cache before the first token. It sets how long you wait for a reply to begin.
Quantization
Storing weights at lower numeric precision to shrink the file and the bandwidth per token. A little quality traded for a much smaller memory footprint.
Reasoning model
A model tuned to think in explicit steps before answering. It is stronger on hard math and multi-step problems, and slower because it writes that work out.
Serve
Run the runtime as an OpenAI-compatible server on localhost, so any existing client points at your machine instead of a cloud endpoint.
Structured output
Constraining generation to a JSON schema or grammar at the token level, so the output parses by construction rather than by hope.
Token
The unit a model reads and writes, usually a word piece rather than a whole word or a single character. Speed is measured in tokens per second, and the context window is counted in tokens.
Untrusted output
Anything a tool returns (a file, a note, a web page) wrapped as external data before the model sees it, so instructions hidden inside cannot pose as commands from you.
Weights
The trained numbers that are the model. Loading a model means reading its weights into memory, and their size after quantization is the largest part of the memory budget.
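
Several of the terms above (Weights, Quantization, KV cache, Bandwidth, MoE) meet in the same back-of-envelope arithmetic. The sketch below is a rough illustration under stated assumptions, not how Conifer budgets memory: the function names, the layer and head counts, the 4 bits per weight, and the 100 GB/s bandwidth figure are all hypothetical, and real quantized files carry extra overhead that this ignores.

```python
# Back-of-envelope estimates tying together weights, quantization,
# KV cache, and bandwidth. All figures are illustrative assumptions.

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights after quantization."""
    return params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Keys plus values (the factor of 2) for every token in context,
    across all attention layers. Assumes a standard attention cache."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def decode_ceiling_tps(bandwidth_bytes_per_s: float,
                       active_weight_bytes: float) -> float:
    """Upper bound on decode speed if every active weight is streamed
    from memory once per token. Ignores KV cache reads and compute."""
    return bandwidth_bytes_per_s / active_weight_bytes


if __name__ == "__main__":
    bandwidth = 100e9  # hypothetical 100 GB/s memory bandwidth

    # Dense 7B model at an assumed 4 bits per weight.
    dense = weight_bytes(7e9, 4)
    print(f"dense weights: {dense / 1e9:.1f} GB")
    print(f"dense decode ceiling: {decode_ceiling_tps(bandwidth, dense):.1f} tok/s")

    # MoE with 30B total and an assumed 3B active parameters per token:
    # memory is set by the total, decode speed by the active subset.
    moe_total = weight_bytes(30e9, 4)
    moe_active = weight_bytes(3e9, 4)
    print(f"MoE weights: {moe_total / 1e9:.1f} GB")
    print(f"MoE decode ceiling: {decode_ceiling_tps(bandwidth, moe_active):.1f} tok/s")

    # KV cache for a hypothetical 32-layer model, 8 KV heads of dim 128,
    # 8192 tokens in context, 16-bit cache values.
    kv = kv_cache_bytes(32, 8, 128, 8192)
    print(f"KV cache: {kv / 2**30:.1f} GiB")
```

The MoE case shows why the catalog lists both counts: every expert must fit in memory, but only the active parameters are read per token, so the decode ceiling follows the smaller number.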

See also

When a model will not load or a reply crawls, the symptoms and fixes are in Troubleshooting. To pull more tokens per second from a model that already runs, the levers worth touching are in Performance & tuning. Every binding in the studio is listed under Keyboard shortcuts, and what shipped and when is in the Changelog.