Reasoning
Control how hard a model thinks before answering
Some models support a thinking (reasoning) mode: before producing the final answer, the model spends extra effort working through the problem. This improves quality on hard tasks — multi-step logic, math, planning, code — at the cost of higher latency and more tokens.
Reasoning is available only on reasoning-capable models; on other models the setting has no effect.
Controlling effort
You control reasoning with a single model-agnostic setting: enable it and give it a token budget. A larger budget allows deeper thinking:
{
  "thinking": {
    "enabled": true,
    "budgetTokens": 8000
  }
}
The budget is translated to each model family's native control automatically:
- For models with a direct thinking budget, it is used as-is.
- For models that expose discrete effort levels, the budget maps to a level (a small budget → light effort, a large budget → maximum effort).
One setting works across all models — no per-model configuration required.
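To make the budget-to-level translation concrete, here is a minimal sketch of how such a mapping might work. Only the `thinking` object with `enabled` and `budgetTokens` comes from this page. The function name `budgetToEffort`, the level names, and the 2,000 and 8,000 token thresholds are illustrative assumptions, not the platform's actual values.

```typescript
// Shape of the setting shown on this page.
interface ThinkingConfig {
  enabled: boolean;
  budgetTokens: number;
}

// Hypothetical effort levels; real model families may name theirs differently.
type EffortLevel = "off" | "low" | "medium" | "high";

// Illustrative thresholds only (assumed, not documented values).
const LOW_MAX = 2000;
const MEDIUM_MAX = 8000;

// Map a token budget to a discrete effort level:
// small budget -> light effort, large budget -> maximum effort.
function budgetToEffort(thinking: ThinkingConfig): EffortLevel {
  // Assumption: a disabled setting or a non-positive budget means no reasoning.
  if (!thinking.enabled || thinking.budgetTokens <= 0) return "off";
  if (thinking.budgetTokens <= LOW_MAX) return "low";
  if (thinking.budgetTokens <= MEDIUM_MAX) return "medium";
  return "high";
}
```

Under these assumed thresholds, the 8,000-token example above would land on the middle level, and anything larger would request maximum effort. The real cut-offs are chosen by the platform and may differ per model family.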
When to use it
- Use a larger budget for complex reasoning: math, multi-constraint planning, debugging, careful classification.
- Keep it off (or small) for short, factual, or latency-sensitive replies — reasoning adds delay and cost.
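One way to apply this guidance is to pick the budget per request based on the kind of task. The sketch below is hypothetical: the task names, the budget values, the `thinkingFor` helper, and the assumption that `budgetTokens` may be omitted when reasoning is disabled are all illustrative, and only the `thinking` field names come from this page.

```typescript
// Setting shape from this page; budgetTokens is assumed optional when disabled.
interface ThinkingSetting {
  enabled: boolean;
  budgetTokens?: number;
}

// Hypothetical task categories, loosely following the guidance above.
type TaskKind = "factual" | "classification" | "planning" | "math" | "debugging";

// Illustrative budgets (assumed values, tune for your own workload).
const BUDGETS: Record<TaskKind, number> = {
  factual: 0, // short factual replies: keep reasoning off
  classification: 4000,
  planning: 8000,
  math: 8000,
  debugging: 8000,
};

// Build the request fragment for a task; latency-sensitive calls skip reasoning.
function thinkingFor(
  task: TaskKind,
  latencySensitive: boolean = false
): { thinking: ThinkingSetting } {
  const budget = latencySensitive ? 0 : BUDGETS[task];
  return budget > 0
    ? { thinking: { enabled: true, budgetTokens: budget } }
    : { thinking: { enabled: false } };
}
```

For example, `thinkingFor("math")` yields an enabled setting with the assumed 8,000-token budget, while `thinkingFor("math", true)` turns reasoning off for a latency-sensitive call. The returned object would be merged into whatever request options your SDK expects.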
For the exact call in your language, see the JavaScript, Unity, or Unreal text generation guide.