Announcement

Sage: one app for local AI.

Sage is the Conifer desktop app — the engine, the tools, and the framework in a single download. Open it, choose a model, and start a conversation. Everything runs on the machine in front of you.

Download Conifer or browse the model catalog

Running capable AI on your own hardware has meant assembling parts: a runtime, a model, the right quantization, extensions, and a separate tool for each workflow. Sage removes that assembly. It is a single application that brings the inference engine, the tools, and the framework integration together, with more than 30 open-weight models behind one picker.

The model layer stays transparent and stays out of the way. Sage handles download, storage, quantization, memory fit, and hardware-aware execution, so the work of getting a model to run well on your machine is already done. Open the app, pick a model, and start typing.

Why this matters

Local AI is no longer limited by model capability alone. Open-weight models are good enough for most everyday work. What remained was friction — accessibility, trust, and setup overhead. Sage is built to remove that friction, so a local model becomes a tool you reach for rather than a system you configure.

Generating locally on an Apple M3 Max — no network, no API key.

What’s included

Sage is one application, not a kit to assemble:

The inference engine, with kernels tuned per architecture and per chip.
The tools a model needs to be useful — files, search, and more — behind explicit, deny-by-default grants.
Framework integration, so the app, the CLI, and a local OpenAI-compatible API share one runtime.
A model picker spanning more than 30 open-weight models, from 0.5B to the largest open releases.
A chat surface you can open and use right away.

Benchmarks

Simple to use does not mean unmeasured. The engine is benchmarked head-to-head against the field on the same hardware, with the same models and workloads, and nothing leaving the device.

On an Apple M3 Max with the Metal backend, decode runs close to the memory-bandwidth wall — the limit that ultimately caps local token generation — reaching up to about 89% of the chip’s theoretical bandwidth on the 7–8B models. Against llama.cpp on identical Q4_K_M weights, Conifer leads on decode across every model tested and sits at parity on prefill. MLX, Apple’s own framework, still leads on decode for several models; Conifer is ahead on prefill for many.

Decode throughput (tokens/sec) — higher is better.
Model	Conifer	llama.cpp
LFM2-350M	764	506
Llama-3.2-1B	270	206
Qwen2.5-7B	57	53
Llama-3.1-8B	55	51
Gemma-3-12B	26	24

Apple M3 Max (36 GB), Metal backend, Q4_K_M weights, 512-token prompt and 128-token decode, best of three runs. Per-model figures live in the model ledger.

A few highlights from the same run:

Fastest decode: 764 tok/s on LFM2-350M.
Lowest energy: 0.06 joules per token on LFM2-350M.
Best efficiency: about 16 tokens per second per watt on LFM2-350M.
Local marginal cost: from $0.0031 per million tokens — electricity, not an API bill — with your data staying on-device.

On Windows with NVIDIA, the CUDA backend reaches parity with llama.cpp on decode; CUDA prefill and the Vulkan backend are still being optimized.

Why local

Cloud AI is increasingly priced and scaled for the frontier, which is more than most everyday work needs. Open-weight models now cover that work. You should not have to weigh cloud against local for every task. Sage defaults to on-device execution — fast, private, and predictable — and keeps your prompts, documents, and outputs on the machine in front of you.

Sage: one app for local AI.

Why this matters

What’s included

Benchmarks

Why local

What’s next

Quick start

Why this matters#

What’s included#

Benchmarks#

Why local#

What’s next#

Quick start#

Why this matters

What’s included

Benchmarks

Why local

What’s next

Quick start