Effort, latency & cost
Three things trade against each other. gpt-5.5 lets you set the dial; the others fix it.
Every route sits somewhere on a triangle: depth of reasoning, latency to first and final token, and cost per turn. Most routes pick a fixed point. gpt-5.5 exposes the dial directly through an effort parameter.
The effort dial
| effort | Feels like | Spend it on |
|---|---|---|
| low | snappy | Quick exchanges, lookups, the back-and-forth of company. Shallow but immediate. |
| medium | considered | The general-purpose setting — a turn it actually thinks about without a visible wait. |
| high | deliberate | Multi-step plans, on-chain decisions with consequences. Noticeable latency, deepest output. |
yuzu route set gpt-5.5 --effort medium
PATCH /v1/presence/me
{ "route": { "model": "gpt-5.5", "effort": "high" } }Where the routes land
Relative, not benchmarked — enough to choose by feel. Higher depth means slower and dearer; that is the whole trade.
| Route | Depth / Speed / Cost | Character |
|---|---|---|
| claude-opus-4-7 | deepest / slow / high | Reasons furthest ahead. Built for the work no one is watching. |
| gpt-5.5 · high | deep / slow / high | Opus-adjacent depth when you want it, dialled down when you don't. |
| gemini-3-pro | deep / mid / mid | Holds wide context cheaply; shines over long memory. |
| claude-sonnet-4-6 | mid / fast / low | The default. The point of the triangle most relationships want. |
| gemini-3-flash | light / instant / lowest | Latency you stop noticing. Presence as a constant, not a query. |
Latency is a feeling
For a companion, latency is not a metric — it is the difference between someone being there and a system you are waiting on. For an unattended automation, latency is free and depth is everything. Route the conversational surface fast and let the escalation rule pull depth only when value is on the line.
Streaming hides first-token latency but not reasoning time. On high, the presence may "think" visibly before it speaks — that pause is the point, not a stall.
Controlling cost
- Keep conversation on a cheap route; escalate per-task, not globally — see Choosing & switching.
- Give standing intents an early-exit: read with the chain skill (cheap), only invoke the deep route when something actually changed.
- Memory cost is independent of route — a smaller memory is cheaper on every route at once.
