Model routes

Effort, latency & cost

Three things trade against each other. gpt-5.5 lets you set the dial; the others fix it.

Every route sits somewhere on a triangle: depth of reasoning, latency to first and final token, and cost per turn. Most routes pick a fixed point. gpt-5.5 exposes the dial directly through an effort parameter.

The effort dial

| effort | Feels like | Spend it on |
| --- | --- | --- |
| low | snappy | Quick exchanges, lookups, the back-and-forth of company. Shallow but immediate. |
| medium | considered | The general-purpose setting: a turn it actually thinks about without a visible wait. |
| high | deliberate | Multi-step plans, on-chain decisions with consequences. Noticeable latency, deepest output. |
```shell
yuzu route set gpt-5.5 --effort medium
```

```http
PATCH /v1/presence/me

{ "route": { "model": "gpt-5.5", "effort": "high" } }
```
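For scripted use, the PATCH body above can be assembled and checked before it is sent. The following is a minimal sketch, not a client library: the host `api.example.invalid` is a placeholder, the helper `build_effort_request` is hypothetical, and authentication is left out because it is not described here. Only the path, the model name, and the three effort values come from the text above. The request is built but never sent.

```python
import json
import urllib.request

# Effort levels named in the table above.
EFFORTS = ("low", "medium", "high")

# Placeholder host: an assumption for illustration, not a documented endpoint.
BASE_URL = "https://api.example.invalid"


def build_effort_request(effort: str, model: str = "gpt-5.5") -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets the route's effort."""
    if effort not in EFFORTS:
        raise ValueError(f"effort must be one of {EFFORTS}, got {effort!r}")
    body = {"route": {"model": model, "effort": effort}}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/presence/me",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


req = build_effort_request("high")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Rejecting unknown effort values locally keeps a typo from turning into a failed round trip.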

Where the routes land

Relative, not benchmarked — enough to choose by feel. Higher depth means slower and dearer; that is the whole trade.

| Route | Depth / Speed / Cost | Character |
| --- | --- | --- |
| claude-opus-4-7 | deepest / slow / high | Reasons furthest ahead. Built for the work no one is watching. |
| gpt-5.5 · high | deep / slow / high | Opus-adjacent depth when you want it, dialled down when you don't. |
| gemini-3-pro | deep / mid / mid | Holds wide context cheaply; shines over long memory. |
| claude-sonnet-4-6 | mid / fast / low | The default. The point of the triangle most relationships want. |
| gemini-3-flash | light / instant / lowest | Latency you stop noticing. Presence as a constant, not a query. |
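Choosing "by feel" can also be written down as a small lookup. The sketch below encodes the table's labels as ordinal ranks and returns the deepest route that fits a speed floor and a cost ceiling. The numeric ranks and the helper `deepest_within` are assumptions made for illustration; only the route names and labels come from the table, and nothing here is a benchmark.

```python
from typing import NamedTuple, Optional

# Ordinal ranks for the table's labels. The numbers are an assumption made
# for sorting; only the labels come from the table.
DEPTH = {"light": 0, "mid": 1, "deep": 2, "deepest": 3}
SPEED = {"slow": 0, "mid": 1, "fast": 2, "instant": 3}
COST = {"lowest": 0, "low": 1, "mid": 2, "high": 3}


class Route(NamedTuple):
    name: str
    depth: str
    speed: str
    cost: str


ROUTES = [
    Route("claude-opus-4-7", "deepest", "slow", "high"),
    Route("gpt-5.5 · high", "deep", "slow", "high"),
    Route("gemini-3-pro", "deep", "mid", "mid"),
    Route("claude-sonnet-4-6", "mid", "fast", "low"),
    Route("gemini-3-flash", "light", "instant", "lowest"),
]


def deepest_within(min_speed: str, max_cost: str) -> Optional[Route]:
    """Return the deepest route that is at least `min_speed` and at most `max_cost`."""
    candidates = [
        r for r in ROUTES
        if SPEED[r.speed] >= SPEED[min_speed] and COST[r.cost] <= COST[max_cost]
    ]
    # max() keeps the first maximum, so table order breaks ties.
    return max(candidates, key=lambda r: DEPTH[r.depth], default=None)


print(deepest_within("fast", "low").name)   # claude-sonnet-4-6
print(deepest_within("mid", "mid").name)    # gemini-3-pro
print(deepest_within("slow", "high").name)  # claude-opus-4-7
```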

Latency is a feeling

For a companion, latency is not a metric — it is the difference between someone being there and a system you are waiting on. For an unattended automation, latency is free and depth is everything. Route the conversational surface fast and let the escalation rule pull depth only when value is on the line.
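One way to picture that routing advice is a per-turn decision function. This is a sketch under stated assumptions: the `Turn` fields and the `route_for` helper are hypothetical, and pairing claude-sonnet-4-6 with the fast role and gpt-5.5 on high with the deep role is one illustrative choice among the routes listed above, not a prescribed escalation rule.

```python
from dataclasses import dataclass

# Route names come from the table above; which route serves which role is an
# assumption for this sketch.
FAST_ROUTE = {"model": "claude-sonnet-4-6"}
DEEP_ROUTE = {"model": "gpt-5.5", "effort": "high"}


@dataclass
class Turn:
    """Hypothetical description of one turn; both fields are illustrative."""
    attended: bool        # is a person waiting on the reply?
    value_at_stake: bool  # e.g. an on-chain decision with consequences


def route_for(turn: Turn) -> dict:
    """Stay fast while someone waits; pull depth when latency is free or value is on the line."""
    if not turn.attended or turn.value_at_stake:
        return DEEP_ROUTE
    return FAST_ROUTE


print(route_for(Turn(attended=True, value_at_stake=False)))   # fast route
print(route_for(Turn(attended=True, value_at_stake=True)))    # deep route
print(route_for(Turn(attended=False, value_at_stake=False)))  # deep route
```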

Streaming hides first-token latency but not reasoning time. On high, the presence may "think" visibly before it speaks — that pause is the point, not a stall.

Controlling cost

  • Keep conversation on a cheap route; escalate per-task, not globally — see Choosing & switching.
  • Give standing intents an early-exit: read with the chain skill (cheap), only invoke the deep route when something actually changed.
  • Memory cost is independent of route — a smaller memory is cheaper on every route at once.
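The early-exit pattern for standing intents can be sketched as a change check in front of the expensive call. Everything named here is hypothetical: `cheap_read` stands in for a read through the chain skill, `deep_call` stands in for a request on the deep route, and the snapshot data is made up. The idea shown is only that the deep call happens when the observed state differs from the last one seen.

```python
import hashlib
import json
from typing import Callable, Optional


def fingerprint(state: dict) -> str:
    """Stable hash of a state snapshot (keys sorted so ordering does not matter)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StandingIntent:
    """Runs a cheap read every tick and the deep route only when the state changed."""

    def __init__(self, cheap_read: Callable[[], dict], deep_call: Callable[[dict], str]):
        self.cheap_read = cheap_read  # stand-in for the chain skill
        self.deep_call = deep_call    # stand-in for a call on the deep route
        self.last_seen: Optional[str] = None

    def tick(self) -> Optional[str]:
        state = self.cheap_read()
        seen = fingerprint(state)
        if seen == self.last_seen:
            return None  # early exit: nothing changed, no deep call
        self.last_seen = seen
        return self.deep_call(state)


# Usage with made-up data: three ticks, two distinct states.
snapshots = iter([{"balance": 10}, {"balance": 10}, {"balance": 7}])
deep_calls = []


def fake_deep(state: dict) -> str:
    deep_calls.append(state)
    return f"decided on {state}"


intent = StandingIntent(lambda: next(snapshots), fake_deep)
results = [intent.tick() for _ in range(3)]
print(len(deep_calls))  # 2
```

In this sketch the first tick always reaches the deep route, because there is no earlier snapshot to compare against; seeding `last_seen` from stored state would avoid that first call.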