Conifer · Models

Models

Size · speed · intelligence.

The field on one axis. Cyan marks what Conifer runs locally.

Intelligence

Artificial Analysis Intelligence Index · higher is better · Artificial Analysis · 60 models

Claude Fable 5: 59.9 index
Claude Opus 4.8: 55.7 index
GPT-5.5: 54.8 index
Claude Opus 4.7: 53.5 index
DeepSeek V4 Pro: 44.3 index
Kimi K2.6: 44.2 index
Claude Opus 4.5: 40.8 index
GPT-5.4 mini: 40 index
Gemini 3 Pro: 39.6 index
GPT-5.4 nano: 38.2 index
GPT-5.1 high: 36.9 index
Claude Sonnet 4.5: 36.4 index
GPT-5: 34.7 index
Grok 4: 33.3 index
Kimi K2 Thinking: 32.7 index
DeepSeek V3.2: 32 index
Qwen3 Max: 31.7 index
Grok 4.1 Fast: 30.6 index
GLM-4.6: 28.7 index
Grok 4 Fast: 27.4 index
Gemini 2.5 Pro: 25.8 index
gpt-oss 120B high: 23.8 index
Nova 2.0 Pro: 21.8 index
Nova 2.0 Omni: 20.9 index
DeepSeek R1 0528: 20.1 index
Gemini 2.5 Flash: 20.1 index
Qwen3.5 4B: 20.1 index, runs locally on Conifer
Qwen3 235B 2507: 19.6 index
Devstral 2: 19.2 index
Nova 2.0 Lite: 18.2 index
Qwen3 VL 32B: 17.9 index
MiniMax M1: 17.7 index
o1-preview: 17 index
GLM-4.5 Air: 16.5 index
Mistral Large 3: 15.9 index
DeepSeek V3 0324: 15.4 index
gpt-oss 20B: 14.9 index
GPT-4.1 mini: 14.8 index
Qwen3 30B A3B 2507: 14.4 index, runs locally on Conifer
Llama 4 Maverick: 14.3 index
Nemotron 3 Nano: 14.2 index
Mistral Medium 3: 12.5 index
Qwen3 32B: 11.5 index, runs locally on Conifer
R1 Distill 32B: 11 index, runs locally on Conifer
Qwen3 14B: 10.4 index, runs locally on Conifer
R1 Distill 14B: 9.8 index, runs locally on Conifer
Llama 3.3 70B: 9.4 index
Qwen3 4B: 8.4 index, runs locally on Conifer
Qwen3 8B: 8.3 index, runs locally on Conifer
Llama 3.1 8B: 7.6 index, runs locally on Conifer
Qwen2.5 32B: 7.5 index, runs locally on Conifer
Gemma 3 27B: 7.4 index
Qwen2.5 Coder 32B: 7.1 index, runs locally on Conifer
R1 Distill Llama 8B: 6.4 index, runs locally on Conifer
Gemma 3 12B: 5.5 index, runs locally on Conifer
Phi-4: 4.9 index, runs locally on Conifer
Qwen2.5 Coder 7B: 4.5 index, runs locally on Conifer
Llama 3.2 3B: 4.2 index, runs locally on Conifer
Qwen3 1.7B: 2.6 index, runs locally on Conifer
Gemma 3 4B: 1.1 index, runs locally on Conifer

frontier & open · runs locally on Conifer · published figures · snapshot Jul 2026

how to read this

The whole Artificial Analysis field on one axis, with the open models Conifer runs locally lit in cyan.

On the Intelligence composite the cloud giants tower; flip to GPQA, AIME or coding to watch the laptop-sized models punch far above their weight.

methodology & sources

Published figures, cross-checked against Artificial Analysis and each model's own technical report (snapshot Jul 2026, refreshed automatically from the live AA leaderboard).

Intelligence is the live Artificial Analysis Intelligence Index (a multi-eval composite on AA's current scale). Capability scores are taken in reasoning / thinking mode with no external tools; benchmark versions vary by lab.

“Runs locally” marks the open weights in Conifer's catalogue that fit a personal machine; larger open models (Kimi, DeepSeek V3/V4, gpt-oss 120B, Qwen3 235B) sit with the rest of the field.

How far behind is one consumer GPU? Six to twelve months.

benchmark	lag
AA Intelligence Index	6.3 mo
MMLU-Pro	7.3 mo
GPQA-Diamond	7.4 mo
LM Arena Elo	12.4 mo

Average lag between the frontier and the best open model that fits one consumer GPU · Epoch AI, Aug 2025 (CC-BY)

source & method

Epoch AI, “Frontier AI performance becomes accessible on consumer hardware within a year” (Somala & Emberson, Aug 2025, CC-BY): the best open model that fits a single consumer GPU matches frontier scores from 6–12 months earlier — and it is gaining (+125 vs +80 Elo per year on LM Arena).

Their bar for “fits”: full weights, 4-bit quantized, in one card's VRAM, which means models up to ~28B parameters on an RTX 4090 and ~40B on an RTX 5090. Epoch's caveat: small open models are likelier to be benchmark-tuned, so real-world lag may run somewhat longer.
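To make the “fits” bar concrete, here is a rough sketch of the weight-only arithmetic behind it. The helper name is hypothetical, and the calculation is an illustration rather than Epoch AI's published method: it counts only the quantized weights and assumes the cards' standard VRAM capacities (24 GB on an RTX 4090, 32 GB on an RTX 5090).

```python
# Illustrative arithmetic only. The helper name and the "share of VRAM"
# readout are assumptions for this sketch, not Epoch AI's method.

def weight_footprint_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# VRAM capacity of the two cards named in the text, in GB.
CARD_VRAM_GB = {"RTX 4090": 24, "RTX 5090": 32}
# Approximate parameter ceilings quoted in the text, in billions.
CEILING_PARAMS_B = {"RTX 4090": 28, "RTX 5090": 40}

for card, vram_gb in CARD_VRAM_GB.items():
    weights_gb = weight_footprint_gb(CEILING_PARAMS_B[card])
    share = weights_gb / vram_gb
    print(f"{card}: {weights_gb:.1f} GB of 4-bit weights in {vram_gb} GB VRAM ({share:.0%})")
```

Under these assumptions the quoted ceilings leave a sizeable part of each card free. In practice that headroom is needed for the KV cache, activations and runtime overhead, which is presumably why the ceilings sit well below what the raw weight size alone would allow.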

Supported · 31 models

model	size	tok/s	intelligence
Llama
Llama 3.2 1B	1.2B	299	49.3
Llama 3.2 3B	3.2B	118	63.4
Llama 3.1 8B	8.0B	49	69.4
Hermes 3 · Llama 3.1 8B	8.0B	~49	64.8
Qwen 3
Qwen 3 0.6B	0.6B	~290	52.8
Qwen 3 1.7B	1.7B	~165	62.6
Qwen 3 4B	4.0B	90	73.0
Qwen 3 4B Instruct 2507	4.0B	~90	·
Qwen 3 8B	8.2B	46	76.9
Qwen 3 14B	14.8B	~25	81.1
Qwen 3 32B	32.8B	~12	83.6
Qwen 3 30B A3B 2507	30.5B · 3B active	·	87.1
Qwen 3 Coder 30B A3B	30.5B · 3B active	·	·
Qwen 2.5
Qwen 2.5 0.5B	0.5B	288	47.5
Qwen 2.5 1.5B	1.5B	166	60.9
Qwen 2.5 3B	3.1B	89	65.6
Qwen 2.5 7B	7.6B	53	74.2
Qwen 2.5 14B	14.7B	~27	79.7
Qwen 2.5 Coder 1.5B	1.5B	~166	53.6
Qwen 2.5 Coder 7B	7.6B	~53	68.0
Gemma
Gemma 2 2B	2.6B	138	51.3
Gemma 2 9B	9.2B	~41	71.3
Gemma 3 4B	4.3B	66	59.6
Gemma 3 12B	11.9B	15	74.5
DeepSeek
R1 Distill Qwen 1.5B	1.8B	~166	·
R1 Distill Qwen 7B	7.6B	~53	·
R1 Distill Llama 8B	8.0B	~49	·
R1 Distill Qwen 14B	14.8B	~27	·
R1 Distill Qwen 32B	32.8B	~12	·
DeepSeek V2 Lite	15.7B	·	55.7
Phi
Phi 3.5 Mini	3.8B	~78	69.0
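
For anyone reading the table programmatically, a minimal sketch of a row parser follows. It assumes the rows are tab-separated, as they are here, and that `~` and `·` carry the meanings given in the measurement notes (projected value, and not measured or not published). The names `Cell`, `parse_cell` and `parse_row` are hypothetical and not part of any Conifer API.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[float]  # None when the cell is "·" (not measured / not published)
    projected: bool         # True when the cell carries a leading "~"


def parse_cell(text: str) -> Cell:
    """Turn a tok/s or intelligence cell into a number plus a 'projected' flag."""
    text = text.strip()
    if text == "·":
        return Cell(None, False)
    if text.startswith("~"):
        return Cell(float(text[1:]), True)
    return Cell(float(text), False)


def parse_row(line: str) -> dict:
    """Split one tab-separated table row into its four columns."""
    model, size, tok_s, intelligence = line.split("\t")
    return {
        "model": model,
        "size": size,  # kept as text: some rows read "30.5B · 3B active"
        "tok_s": parse_cell(tok_s),
        "intelligence": parse_cell(intelligence),
    }


# Three rows copied from the table above.
rows = [parse_row(line) for line in [
    "Qwen 3 8B\t8.2B\t46\t76.9",
    "Qwen 3 14B\t14.8B\t~25\t81.1",
    "Qwen 3 30B A3B 2507\t30.5B · 3B active\t·\t87.1",
]]

for row in rows:
    print(row["model"], row["tok_s"], row["intelligence"])
```

Keeping the projected flag separate from the number lets measured and projected throughput be filtered or plotted differently instead of being silently mixed.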

measurement notes

tok/s: decode throughput on an Apple M3 Max at Q4_K_M with a 512-token prompt; ~ marks a value projected from a measured sibling. intelligence: MMLU 5-shot, as published. · marks not measured / not published.
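
The notes do not say how a value is projected from a sibling. One plausible reading, sketched here purely as an assumption, is that decode speed is treated as memory-bandwidth bound, so tokens per second scale inversely with parameter count within one architecture family at the same quantization. The function name `project_tok_s` is hypothetical.

```python
def project_tok_s(measured_tok_s: float, measured_params_b: float,
                  target_params_b: float) -> float:
    """Estimate decode speed for a sibling model from one measured model.

    Assumes decode is memory-bandwidth bound, so throughput scales with
    1 / parameter count (same family, same quantization, same hardware).
    This is an illustrative guess, not the page's documented method.
    """
    if measured_params_b <= 0 or target_params_b <= 0:
        raise ValueError("parameter counts must be positive")
    return measured_tok_s * measured_params_b / target_params_b


# Measured sibling from the table: Qwen 3 8B, 8.2B parameters, 46 tok/s.
qwen3_14b = project_tok_s(46, 8.2, 14.8)   # roughly 25.5 tok/s
qwen3_32b = project_tok_s(46, 8.2, 32.8)   # roughly 11.5 tok/s

print(f"Qwen 3 14B projected: {qwen3_14b:.1f} tok/s")
print(f"Qwen 3 32B projected: {qwen3_32b:.1f} tok/s")
```

Both estimates land close to the table's ~25 and ~12 entries, which is consistent with this reading but does not confirm it. Fine-tunes that share a base model, such as Hermes 3 or the R1 distills, simply reuse the base model's figure in the table.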