What is Conifer?

Conifer is the runtime layer between a model and the metal. It runs open-weight language models on hardware you own, and it does the memory and scheduling work that makes that practical.

Downloading a set of weights takes a minute. Running them well on a machine you actually own is the hard part. You quantize the model to fit the memory you have, size the context window to what is free, split the work across CPU, GPU, and unified memory, and keep a large model off the swap that drops decode to a crawl. That work is the product. The model is the easy half. Conifer is everything around it.

Who it’s for

Conifer is for people who run inference on their own hardware and want it fast, private, and under their control. A developer wiring local inference into a tool. A private assistant that never phones home. A team that needs on-device AI with real governance. If you want a capable model without renting a GPU or shipping prompts to a third party, this is the runtime.

The one idea

A local model lives inside a fixed memory budget, and the fit is the whole problem. Get it wrong and the model either refuses to load or thrashes against swap. Get it right and a 24B model runs smoothly on a laptop. Conifer treats that fit as the job.

The runtime probes your hardware on launch, picks a quantization that lands without spilling to disk, sizes the KV cache to the memory actually free, and schedules the rest. Name a model and the runtime makes it fit. No quant math, no config file, no guessing which build your RAM can hold.

The surfaces

One runtime sits underneath every way you reach it. The fitting and scheduling are identical across all four surfaces; only the entry point changes.

Surface	What it gives you
The studio	A desktop app to chat with a model, browse the catalog, and give an agent tools.
CLI & local API	Drive it from a terminal, or stand up an OpenAI-compatible server on localhost so existing clients work unchanged. See conifer serve.
Agents & grants	Let a local model touch files, calendar, or notes under an explicit, deny-by-default grant model.
Fleet & governance	Run it across a team with policy and audit, and no collection server. See deploy across a team.

Local by default

Nothing leaves the machine unless you wire it to. There is no cloud to opt out of and no telemetry running by default. The weights, the prompts, and the tokens the model generates stay on your hardware. That is the starting state, not a setting you have to find.

The local API in one line

The same runtime answers an OpenAI-compatible endpoint, so the tools you already use point at your own machine instead:

terminal

conifer serve --model qwen3-8b

That binds an inference server to localhost:8080. Point a client at it and the request never leaves the box. For the full threat boundary, read the local-first guarantee.

Who it’s for#

The one idea#

The surfaces#

Local by default#

The local API in one line#

Who it’s for

The one idea

The surfaces

Local by default

The local API in one line