conifer serve

Run an OpenAI-compatible endpoint over your local models, then point any client you already have at it.

Most tools speak the OpenAI HTTP API; conifer serve answers in that dialect from the models in your local registry, so a request bound for a hosted API lands on weights in your RAM.

Start the server

terminal

conifer serve --model qwen3-8b

The server binds 127.0.0.1:8080 and holds the foreground until Ctrl-C. Name a model with --model and it loads at startup; leave the flag off and the model named in the first request is loaded instead. Model names are the ones recorded in your local registry.
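Because the server holds the foreground, a script that launches it in the background may want to wait until the port accepts connections before sending requests. The following is a minimal sketch using only the Python standard library; the helper name wait_for_port is hypothetical and not part of conifer. An open port only shows that the listener is up, not that a model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.25) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only probes the socket and
    knows nothing about conifer itself.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_port("127.0.0.1", 8080) after starting the server would block until the documented bind address answers or the timeout passes.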

The base URL

Setting	Value
Base URL	http://localhost:8080/v1
API key	any non-empty string (ignored)
Model	a registered name, e.g. qwen3-8b

The server runs no auth; the key field some clients insist on filling is read and discarded.

Chat goes to /v1/chat/completions, streaming or not; /v1/models lists what is loadable; /v1/completions covers the older text-completion shape.
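As an illustration of the /v1/models route, the sketch below fetches the list and pulls out the model names. It assumes the response follows the usual OpenAI list shape (a "data" array of objects with an "id" field), which this page does not spell out; the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # base URL from the settings table


def model_ids(body: bytes) -> list[str]:
    """Extract model ids from an OpenAI-style list response.

    Assumes a body like {"object": "list", "data": [{"id": "..."}]}.
    """
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL) -> list[str]:
    """GET <base_url>/models and return the ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(resp.read())
```

With the server running, list_models() should return the names that are valid in the "model" field of a request.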

Serving more than one model

The server exposes your whole registry. Name a different model in a request and the runtime swaps it into the single resident slot on demand; the engine holds one model at a time. --keep-alive sets how long an idle model stays resident, five minutes by default.
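From the client side, switching models is only a change to the "model" field; the swap itself happens inside the server. A minimal sketch of two requests that differ in nothing else follows. The name other-model is a placeholder for a second registered model, and chat_request is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # base URL from the settings table


def chat_request(model: str, prompt: str,
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the given model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same endpoint, same payload shape; only the model name differs.
first = chat_request("qwen3-8b", "ping")
second = chat_request("other-model", "ping")  # placeholder registry name
```

Sending second after first with urllib.request.urlopen would ask the server to replace the resident model, so the second call may take longer while the new weights load.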

Wiring up a client

Anything built for the OpenAI API works once you swap the base URL: Continue, Aider, Cline, Zed, and any editor extension with a base URL field. The walkthrough for Claude Code and Cursor is on the Use with Claude Code & Cursor page. To confirm the server is answering:

terminal

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-8b","messages":[{"role":"user","content":"ping"}]}'
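The same check can be sketched in Python for clients that are not shell scripts. This version also sends a placeholder API key, as clients that insist on one would, and reads the reply text. It assumes the response uses the standard OpenAI chat completion shape (choices[0].message.content); the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # base URL from the settings table
API_KEY = "unused"  # any non-empty string; the server discards it


def build_check(model: str = "qwen3-8b") -> urllib.request.Request:
    """Build the same request as the curl check, plus a placeholder key."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def reply_text(body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def ping(model: str = "qwen3-8b") -> str:
    """Send the check and return the reply; needs a running server."""
    with urllib.request.urlopen(build_check(model), timeout=120) as resp:
        return reply_text(resp.read())
```

The long timeout in ping leaves room for a model that has to load on the first request.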

The full wire surface — served planes, headers, receipts, and the rule they evolve under — is specified in the endpoint contract.
