skip to content
Use with Claude Code & Cursor

CLI & local API

Use with Claude Code & Cursor

Point an agent tool at localhost and it talks to a model on your own machine instead of a cloud API. The swap is one line of config.


Coding agents reach a model the same way: an HTTP endpoint that accepts OpenAI-style chat requests. Conifer stands up that endpoint locally. Change the base URL from a vendor’s domain to your loopback address and the tokens come off your own GPU. The prompt never leaves the box.

The server underneath is conifer serve, and the request format these clients speak is Chat completions. Run one server and point as many tools at it as you want.

Start the server

One command brings up an OpenAI-compatible HTTP server bound to 127.0.0.1. It listens on port 8080 and serves under /v1.

terminal
conifer serve --model qwen3-8b

A client points at http://127.0.0.1:8080/v1. The server holds to loopback unless you pass a non-loopback --host, which it reads as deliberate exposure. There is no auth in front of it, so a public bind is something you opt into, not a default. For the flags, the keep-alive idle policy, and the load-on-demand path when a request names a model that isn’t resident, see JIT multi-model loading.

Cursor

Cursor overrides the OpenAI base URL directly, and that override is the entire integration. In its model settings, set the OpenAI base URL to your running server and put any non-empty string in the API key field. Conifer ignores the key, but most clients refuse to send a request without one.

Cursor → OpenAI provider settings
FieldValue
Base URLhttp://127.0.0.1:8080/v1
API keylocal
Modelthe id you served, or any string

Cursor's cloud features (tab autocomplete, Composer indexing) still route through Cursor's own backend; only the chat model is redirected here.

The model id is advisory. A client that hardcodes gpt-4o still works: the resident model answers, and the response reports the server’s own served-model id rather than the one the client sent, so the exchange stays honest. If you ran conifer serve with several models in your registry, naming one in the request loads it on demand.

Claude Code

Claude Code speaks the Anthropic Messages API, a different wire shape from the OpenAI endpoint Conifer serves. The two sit close but don’t interchange: message roles, the streaming event format, and the tool-call envelope all differ. Claude Code can’t point at the server the way Cursor does.

A small translation proxy bridges the gap. It accepts Anthropic-shaped requests and forwards them to /v1/chat/completions. Run the proxy locally, set Claude Code’s base URL to the proxy, and the path runs Claude Code to proxy to Conifer to your GPU, model resident on your own hardware.

What carries over, what doesn’t

The server covers the parts of the chat API a coding agent leans on: streamed completions, stop sequences, tool calls parsed back into OpenAI tool_calls, and schema-constrained output enforced at decode time (see Structured output). What an agent does with those is up to the agent.

Chat and streaming
Work as-is. Tokens arrive over server-sent events, the same shape a cloud endpoint sends.
Tool use
Works when the model is tool-capable. Quality rides on the model, not the wire, so pick from Coding.
Provider-specific features
Anything wired to a vendor’s own backend (a hosted indexer, a proprietary autocomplete model) keeps hitting that backend. Redirecting the chat model leaves those where they were.

Where the data boundary sits

The redirected chat model is the part that stays local. Your prompts, your code context, and the generated tokens move between the editor and a process on your own machine. If an agent still calls a cloud service for some other feature, that traffic leaves the way it always did. For the full accounting of what crosses the line, see Data boundaries.

For models worth pointing an agent at, the speed-versus-size trade-off sits on the models ledger, and the terminal commands behind all of this are in the conifer CLI.