What is Conifer?

One interface, one account, one bill, every model. A router sends each query to the cheapest model that can do the job, starting with the free ones on your own hardware.

AI work is scattered across separate apps, subscriptions, and keys, every query billed at frontier prices. Conifer replaces that with one interface and a router that decides per query where inference runs.

To try it, install the app, load a model, and run a chat, or stand up a local API with one command:

terminal

conifer serve --model qwen3-8b

The router

The rule is cheapest-capable, behind two gates. The privacy gate excludes cloud lanes for work that may not leave the machine; the capability gate excludes lanes that cannot do the job. Whatever passes both competes on cost.

Three tiers

The survivors enter a cascade: the router starts cheap and escalates only when the answer is not good enough.

Tier	Where it runs	When it is used
Tier 0 · local	Your own hardware, through the Conifer engine. Free.	The everyday majority of queries, roughly 80%.
Tier 1 · efficient cloud	Cloud lanes serving strong mid-tier models.	When the local answer is not good enough for the task.
Tier 2 · frontier	Frontier models in the cloud.	The hardest queries only.

Cloud lanes exist only if you allow the cloud at all. Turn it off and the cascade ends at tier 0.

The receipt

Every answer carries a receipt: which model ran, in which lane, and why. You can audit the routing instead of taking it on trust.

Privacy-first mode

One setting pins every query to the machine; the router still picks the best-suited local model per task. Teams deploy the same policy fleet-wide, without a collection server.

The free foundation

Tier 0 runs open-weight models through a from-scratch engine with hand-tuned Metal kernels, measured faster than llama.cpp and at per-byte parity with Apple’s MLX.

A local model lives inside a fixed memory budget. The runtime probes your hardware, picks a quantization that lands without spilling to disk, and sizes the context window to the memory actually free.

The sections

Each section is short on purpose: enough to navigate Conifer, not a course.

Section	What’s in it
Getting started	Install, your first model, your first chat.
Choosing a model	Which model for your task, and what fits your hardware.
How models work	How to read the catalog: instruct vs. reasoning and quantization.
Inside the engine	Why the free tier is fast, in one page.
CLI & local API	Drive the local tier from the terminal and serve an OpenAI-compatible endpoint on localhost.
Agents, tools, grants	Give a model files, calendar, and notes under a deny-by-default grant model.
Security & governance	Privacy-first mode, and fleet deployment without a collection server.

The router#

Three tiers#

The receipt#

Privacy-first mode#

The free foundation#

The sections#

The router

Three tiers

The receipt

Privacy-first mode

The free foundation

The sections