Effort, latency & cost

Three things trade against each other. gpt-5.5 lets you set the dial; the others fix it.

Every route sits somewhere on a triangle: depth of reasoning, latency to first and final token, and cost per turn. Most routes pick a fixed point. gpt-5.5 exposes the dial directly through an effort parameter.

The effort dial

effort	Feels like	Spend it on
low	snappy	Quick exchanges, lookups, the back-and-forth of company. Shallow but immediate.
medium	considered	The general-purpose setting — a turn it actually thinks about without a visible wait.
high	deliberate	Multi-step plans, on-chain decisions with consequences. Noticeable latency, deepest output.

shell

yuzu route set gpt-5.5 --effort medium

http

PATCH /v1/presence/me
{ "route": { "model": "gpt-5.5", "effort": "high" } }
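One way to keep the choice of effort in a single place is a small helper that maps a kind of task to an effort level and prints the matching command. This is a minimal sketch under stated assumptions: the task kinds (`chat`, `lookup`, `plan`, `onchain`) and the function names `effort_for` and `route_command` are hypothetical labels chosen for illustration. Only the `yuzu route set gpt-5.5 --effort …` form and the three effort values come from this page.

```shell
#!/usr/bin/env bash
# Sketch only: the task kinds below are illustrative labels, not part of the yuzu CLI.

# Map a task kind to one of the three documented effort levels.
effort_for() {
  case "$1" in
    chat|lookup)  echo low ;;     # quick exchanges, lookups
    plan|onchain) echo high ;;    # multi-step plans, on-chain decisions
    *)            echo medium ;;  # general-purpose default
  esac
}

# Print (do not run) the command that would set the route for a task kind.
route_command() {
  printf 'yuzu route set gpt-5.5 --effort %s\n' "$(effort_for "$1")"
}

route_command plan
```

Printing the command rather than running it keeps the sketch safe to try; piping the output to a shell, or swapping `printf` for a direct call, would be the next step once the mapping suits your own tasks.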

Where the routes land

Relative, not benchmarked: enough to choose by feel. Higher depth generally means slower and dearer; that is the whole trade.

Route	Depth / Speed / Cost	Character
claude-opus-4-7	deepest / slow / high	Reasons furthest ahead. Built for the work no one is watching.
gpt-5.5 · high	deep / slow / high	Opus-adjacent depth when you want it, dialled down when you don't.
gemini-3-pro	deep / mid / mid	Holds wide context cheaply; shines over long memory.
claude-sonnet-4-6	mid / fast / low	The default. The point of the triangle most relationships want.
gemini-3-flash	light / instant / lowest	Latency you stop noticing. Presence as a constant, not a query.

Latency is a feeling

For a companion, latency is not a metric — it is the difference between someone being there and a system you are waiting on. For an unattended automation, latency is free and depth is everything. Route the conversational surface fast and let the escalation rule pull depth only when value is on the line.
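The routing advice in this paragraph can be pictured as a three-way choice. The sketch below is illustrative only and is not the escalation rule itself: the `attended` flag, the numeric "value at stake", the `ESCALATE_ABOVE` threshold, and the function name `pick_route` are all assumptions made for the example. The route names are taken from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: the threshold and the idea of a numeric "value at stake" are assumptions.

ESCALATE_ABOVE=100   # hypothetical threshold, in whatever unit you measure value

# pick_route <attended: yes|no> <value_at_stake: integer>
# Prints the route to use for this turn.
pick_route() {
  if [ "$1" = "no" ]; then
    echo "claude-opus-4-7"      # unattended: latency is free, so take the depth
  elif [ "$2" -gt "$ESCALATE_ABOVE" ]; then
    echo "gpt-5.5 high"         # someone is waiting, but value is on the line
  else
    echo "claude-sonnet-4-6"    # conversational surface: stay fast
  fi
}

pick_route yes 5
pick_route yes 500
pick_route no 5
```

The point of the shape is that depth is pulled per turn: the fast route stays the default, and only the two explicit conditions move a turn off it.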

Streaming hides first-token latency but not reasoning time. On high, the presence may "think" visibly before it speaks — that pause is the point, not a stall.

Controlling cost

Keep conversation on a cheap route; escalate per-task, not globally — see Choosing & switching.
Give standing intents an early exit: read with the chain skill (cheap), and invoke the deep route only when something has actually changed.
Memory cost is independent of route — a smaller memory is cheaper on every route at once.
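The early-exit idea for standing intents can be sketched as a fingerprint comparison: keep a checksum of the last cheap read, and call the deep route only when a new read produces a different one. This is a rough sketch, not the platform's mechanism. The function name `should_escalate`, the state file, and the snapshot string are hypothetical, and how the cheap read is obtained from the chain skill is left out entirely.

```shell
#!/usr/bin/env bash
# Sketch only: the snapshot is whatever text your cheap read returns; it arrives here as a string.

# should_escalate <state_file> <snapshot>
# Returns 0 (escalate) when the snapshot differs from the last one seen, 1 otherwise.
# Stores the new fingerprint so the next identical read exits early.
should_escalate() {
  state_file=$1
  new=$(printf '%s' "$2" | cksum)
  old=$(cat "$state_file" 2>/dev/null || true)
  if [ "$new" = "$old" ]; then
    return 1                      # nothing changed: early exit, no deep call
  fi
  printf '%s\n' "$new" > "$state_file"
  return 0                        # something changed: worth the deep route
}
```

A standing intent would call this on every tick and switch to the deep route, for example with `yuzu route set gpt-5.5 --effort high`, only when it returns 0. Most ticks then cost one cheap read and one checksum.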