Your first chat

From a cold app to a reply streaming across the screen, on hardware you already own.

This page assumes the app is installed and at least one model is on disk; if not, install the app and download a model first.

Open the studio

Launch Conifer and a quiet screen meets you: a greeting, a model picker, and two doors marked chat and code. The picker loads nothing; RAM stays untouched until you walk through a door.

Pick a small model first; a 4B or 8B answers in a breath on most laptops. The models ledger lists size and measured decode speed, and "By your hardware" works backward from your RAM. Click chat.
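
To see why a 4B or 8B model is a comfortable first pick, here is a rough back-of-the-envelope sketch of how much memory the weights alone need. The function names, the bits-per-weight values, and the 50% headroom rule are illustrative assumptions, not how Conifer sizes models; real files add overhead for the cache and the runtime.

```python
def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gib: float, headroom_fraction: float = 0.5) -> bool:
    """True if the weights fit in the share of RAM we allow them.

    headroom_fraction is an assumed rule of thumb: leave the rest of RAM
    for the OS, other apps, and the conversation cache.
    """
    return weights_gib(params_billions, bits_per_weight) <= ram_gib * headroom_fraction


if __name__ == "__main__":
    for size in (4, 8):
        # 4 bits per weight is an assumed quantization level.
        print(f"{size}B at 4-bit: about {weights_gib(size, 4):.1f} GiB")
    print("8B at 16-bit on a 16 GiB laptop fits:", fits_in_ram(8, 16, 16))
```

Under these assumptions a 4B model at 4 bits per weight needs a little under 2 GiB and an 8B model a little under 4 GiB, while the same 8B model at 16 bits would not leave enough headroom on a 16 GiB machine.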

Send the first message

Open chat and the runtime reads the weights off disk into memory. That cold start is the one wait you pay per model, not per message; after it the model is resident in RAM and answers with no reload.
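
The "once per model, not per message" behavior can be pictured as a single resident slot that reloads only when the requested model changes. This is a minimal sketch of that idea; `ResidentModel`, `fake_load`, and the model names are hypothetical and say nothing about how the Conifer runtime is actually built.

```python
class ResidentModel:
    """Keeps one model in memory and counts how often a cold start happens."""

    def __init__(self, load_weights):
        self._load_weights = load_weights  # callable that reads weights off disk
        self._name = None
        self._model = None
        self.cold_starts = 0

    def ensure(self, name):
        # Reload only when a different model is requested.
        if name != self._name:
            self._model = self._load_weights(name)
            self._name = name
            self.cold_starts += 1
        return self._model


def fake_load(name):
    # Stand-in for the real disk read; returns a placeholder object.
    return {"name": name}


if __name__ == "__main__":
    slot = ResidentModel(fake_load)
    for _ in range(3):          # three messages to the same model
        slot.ensure("small-4b")
    slot.ensure("small-8b")     # switching models costs one more cold start
    print(slot.cold_starts)     # prints 2
```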

Type a question and send it. The pause before the first word is the model reading your prompt; the stream after it is the model writing.
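
If you want to put numbers on the pause and the stream, the sketch below times any iterator of tokens. It is an illustration only: `measure_stream` and `fake_stream` are hypothetical helpers, and the fake stream stands in for a real model by sleeping, so its timings mean nothing beyond showing the method.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (seconds to first token, tokens per second after it, token count)."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in tokens:
        if first_at is None:
            first_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first_at is None:
        return None, 0.0, 0          # the stream produced nothing
    window = end - first_at          # time spent streaming after the first token
    rate = (count - 1) / window if count > 1 and window > 0 else 0.0
    return first_at - start, rate, count


def fake_stream(n: int, pause_s: float = 0.05, per_token_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a model: one pause, then tokens at a steady pace."""
    time.sleep(pause_s)              # the pause before the first word
    for i in range(n):
        if i:
            time.sleep(per_token_s)  # the steady stream after it
        yield f"tok{i}"


if __name__ == "__main__":
    pause, rate, count = measure_stream(fake_stream(20))
    print(f"first token after {pause:.2f}s, then {rate:.0f} tok/s over {count} tokens")
```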

What just happened

Three phases turn a prompt into a streaming reply, in order, every turn.

Phase	What it does	How it feels
Load	Weights move from disk into memory, once per model.	A one-time wait, then gone.
Prefill	The model reads your whole prompt to produce the first token.	The pause before the first word.
Decode	The model writes the rest, one token at a time.	Text streaming in.

Decode speed is the tok/s number on the models ledger. Prefill scales with how long your prompt is; decode scales with how long the answer is.
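
The two scaling rules reduce to simple arithmetic. The sketch below is a rough estimate with made-up speeds, not measurements from any model on the ledger; `estimate_seconds` is a hypothetical helper, and it ignores the one-time load as well as the small detail that prefill itself yields the first token.

```python
def estimate_seconds(prompt_tokens: int, answer_tokens: int,
                     prefill_tok_s: float, decode_tok_s: float):
    """Return (pause before the first word, total time for the reply)."""
    pause = prompt_tokens / prefill_tok_s      # grows with prompt length
    writing = answer_tokens / decode_tok_s     # grows with answer length
    return pause, pause + writing


if __name__ == "__main__":
    # Assumed figures for illustration: 250 tok/s prefill, 20 tok/s decode.
    pause, total = estimate_seconds(prompt_tokens=500, answer_tokens=200,
                                    prefill_tok_s=250, decode_tok_s=20)
    print(pause, total)  # prints 2.0 12.0
```

With these assumed speeds, doubling the prompt adds two seconds to the pause, while doubling the answer adds ten seconds of streaming.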

The streaming is decode happening live. For why decode is the part worth optimizing, see how the engine works.

Nothing left the machine

No request went to a server: the prompt, the weights, the cache, and the reply all stayed in your machine’s memory. That is the local-first guarantee at work.

Keep going

Send a follow-up: the model is resident, the load wait is gone, and the cached conversation keeps long chats responsive.

Switch models from the picker; expect one cold start on the swap.
Try a different task; How to choose narrows the field in four questions.
Give a model tools. Read Agents before you wire one up, and the grant model for why nothing reaches a folder you did not hand it.
