Law

Legal work is long inputs read carefully, claims tied back to a source, and material that should never leave your machine. The model choice covers the first two. The runtime already handles the third.

Three things make legal text its own problem. The documents are long, so the window has to hold a whole contract or a stack of filings at once. The reading has to be exact: a missed clause or a flipped qualifier changes the answer. And the privacy stakes are high enough that “send it to an API” is off the table before the first prompt. A local model on Conifer covers all three. Only the first two turn on which model you load.

Fit the window to the document

The context window is the hard constraint. A model reasons over what fits in its window and nothing more, so the moment a brief spills past that limit you are summarizing or chunking instead of reading. For legal work the model’s native context is the number that matters, ahead of a point or two of benchmark intelligence.

Local models with the headroom for long documents
Model	Native context	Kind	Good for
Qwen 3 30B A3B 2507	256K	sparse	the deepest reading that still runs at small-model speed, on 32 GB+
Gemma 3 27B	128K	dense	careful prose and analysis when memory allows
Gemma 3 12B	128K	dense	an everyday long-document model with headroom to spare
Mistral Small 3.1 24B	128K	dense	a capable generalist with dependable tool use
Llama 3.1 8B	128K	dense	the light, long-context workhorse on a laptop

Sizes, measured decode speeds, and intelligence scores are on the model ledger. A Gemma-family model carries its own quirks; see The Gemma family before relying on it.

The sparse Qwen 3 30B-A3B-2507 is the standout for this work. It has a 256K native window, and because only a few experts fire per token it reads a long document at roughly the speed of a small model. The catch is memory. A sparse model still loads every expert, so plan for the full weights in RAM. The memory budget, and why a long window costs memory through the KV cache, is in Context & memory.

Careful reading over raw intelligence

A bigger model is not a better reader. Legal work rewards a model that follows instructions precisely, holds the whole document in view, and does not paraphrase a clause into something subtly different. For questions that turn on a chain of conditions (an indemnity that depends on a notice that depends on a deadline) a reasoning model that works step by step catches what a fast first-pass answer misses. You pay for it in extra tokens before the reply lands.

Match the tuning to the question. Most drafting, summarizing, and clause-by-clause review is an instruct job. Save the reasoning models for the questions where the logic itself is the work.

Citation discipline

The failure mode that matters here is a confident citation to authority that does not exist. No local model fixes that on its own, and a small one is more prone to it. The fix is procedural, not a model setting. Keep the source in front of the model and make it point back to that text instead of to its training memory.

Ground every claim in the document. Ask for the quoted passage and its location alongside each answer, so a wrong reading is one you can catch by checking the quote.
Retrieve, don’t recall. For anything beyond the open document (a corpus of contracts, a deposition set) pull the relevant text into the window through retrieval instead of trusting the model’s memory. The patterns are in Long context & RAG.
Verify, always. Treat any case name, statute, or date the model produces as a lead to check, never as a finding.

Why local is the point here

Privileged material, sealed filings, and client data carry duties a cloud API turns into a liability: the moment a document leaves your hardware, you inherit someone else’s retention, logging, and breach posture. Conifer runs the model entirely on your machine. No request reaches a server, no copy lands in a vendor’s logs, and there is no cloud to opt out of because none was ever in the path. The detail of that boundary is in The local-first guarantee.

The same boundary holds when a model reads your files directly. Conifer’s agents are deny-by-default: a model reaches a folder or an account only through a grant you set, and nothing it reads leaves the machine. That is the right shape for a document set you cannot put online. See the grant model. And a contract can carry instructions aimed at the model, so tool output is held as untrusted.

Tip

To wire a long-context model into an existing tool, serve it on localhost and point the client at it. The local API speaks the OpenAI format, so the swap is one base URL.

terminal

conifer serve --model qwen3-30b-a3b-2507

Fit the window to the document#

Careful reading over raw intelligence#

Citation discipline#

Why local is the point here#

Fit the window to the document

Careful reading over raw intelligence

Citation discipline

Why local is the point here