Thinking Steering
LeanCTL classifies every prompt by complexity and adjusts the AI's reasoning budget accordingly. Simple tasks skip thinking entirely. Complex tasks get deep analysis.
How It Works
Every time you send a message, LeanCTL's intent classifier determines the task type (for example explore, generate, debug, refactor, or casual) and maps it to a thinking budget. This happens transparently: the AI receives thinking parameters sized for each turn.
The result: you never pay for unnecessary reasoning tokens on simple tasks, but always get thorough analysis on complex ones.
Task Classification
The intent classifier analyzes your prompt and assigns one of these task types:
| Task Type | Thinking Budget | Example |
|---|---|---|
| Casual | Minimal (1K tokens) | "what does this function do?" |
| Generate | Minimal (1K tokens) | "add a login form" |
| Explore | Medium (4K tokens) | "how does auth work in this project?" |
| Review | Medium (4K tokens) | "review this PR for issues" |
| Test | Trace (8K tokens) | "write comprehensive tests for auth.ts" |
| Debug / FixBug | Trace (8K tokens) | "fix the failing tests" |
| Refactor | Deep (16K tokens) | "refactor the database layer" |
| Config / Deploy | Varies | "set up the CI pipeline" |
On Ollama, where thinking is an on/off switch, Minimal-budget tasks run with think: false, saving the most tokens on simple questions.
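The mapping in the table above can be pictured as a plain lookup. The following is a minimal sketch, not LeanCTL's actual implementation: the names `Budget`, `TASK_BUDGETS`, and `budget_for` are hypothetical, and the Medium fallback for task types the table lists as "Varies" is an assumption made only for illustration.

```python
from enum import Enum


class Budget(Enum):
    """Thinking budgets in tokens, using the values from the tables in this page."""
    MINIMAL = 1_024
    MEDIUM = 4_096
    TRACE = 8_192
    DEEP = 16_384


# Hypothetical lookup mirroring the task classification table.
TASK_BUDGETS = {
    "casual": Budget.MINIMAL,
    "generate": Budget.MINIMAL,
    "explore": Budget.MEDIUM,
    "review": Budget.MEDIUM,
    "test": Budget.TRACE,
    "debug": Budget.TRACE,
    "fixbug": Budget.TRACE,
    "refactor": Budget.DEEP,
}


def budget_for(task_type: str, default: Budget = Budget.MEDIUM) -> Budget:
    """Return the budget for a task type.

    Types without a fixed budget (Config / Deploy, "Varies" in the table)
    fall back to `default`, which is an assumption of this sketch.
    """
    return TASK_BUDGETS.get(task_type.lower(), default)
```

For example, `budget_for("Refactor")` yields `Budget.DEEP`, while `budget_for("config")` falls back to the assumed default.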
Provider-Specific Behavior
Thinking Steering adapts to each provider's API:
Anthropic (Claude)
Uses the budget_tokens parameter for extended thinking. Budget values are mapped directly from the task classification.
| Budget | Tokens |
|---|---|
| Minimal | 1,024 |
| Medium | 4,096 |
| Trace | 8,192 |
| Deep | 16,384 |
OpenAI (GPT)
Uses the reasoning_effort parameter for o-series models:
| Budget | Effort |
|---|---|
| Minimal | low |
| Medium | medium |
| Trace / Deep | high |
Ollama (local models)
Uses the think parameter (boolean). Simple tasks set think: false, which can reduce output tokens by up to 95% on supported models (Qwen, DeepSeek-R1).
| Budget | Think |
|---|---|
| Minimal | false |
| Medium+ | true |
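Taken together, the three provider tables amount to one translation step from a budget name to a provider-specific request field. The sketch below illustrates that step under stated assumptions: the function name `thinking_params` and the lowercase provider keys are hypothetical, and only the thinking-related fields of each request body are shown.

```python
# Token counts and effort levels copied from the provider tables above.
BUDGET_TOKENS = {"minimal": 1_024, "medium": 4_096, "trace": 8_192, "deep": 16_384}
OPENAI_EFFORT = {"minimal": "low", "medium": "medium", "trace": "high", "deep": "high"}


def thinking_params(provider: str, budget: str) -> dict:
    """Translate a budget name into the thinking fields for one provider."""
    budget = budget.lower()
    if budget not in BUDGET_TOKENS:
        raise ValueError(f"unknown budget: {budget!r}")

    if provider == "anthropic":
        # Extended thinking takes an explicit token budget.
        return {"thinking": {"type": "enabled", "budget_tokens": BUDGET_TOKENS[budget]}}
    if provider == "openai":
        # Reasoning models take a coarse effort level instead of a token count.
        return {"reasoning_effort": OPENAI_EFFORT[budget]}
    if provider == "ollama":
        # Thinking is a boolean: off for Minimal, on for everything else.
        return {"think": budget != "minimal"}
    raise ValueError(f"unknown provider: {provider!r}")
```

When merging the Anthropic fields into a full request, note that the request's `max_tokens` has to be larger than `budget_tokens`, so a caller would need to size it accordingly.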
Thinking Modes
Control thinking behavior globally via config or per-session with the /thinking command:
Auto (default)
The recommended mode. LeanCTL classifies each prompt and adjusts thinking automatically. You get optimal performance without any manual tuning.
/thinking auto
Always
Forces maximum thinking on every turn. Use when working on complex architectural tasks where you want the AI to reason deeply about every response.
/thinking always
Never
Disables thinking entirely. Maximum speed, minimum token usage. Best for simple code generation tasks where reasoning overhead isn't needed.
/thinking never
Manual
Disables automatic steering. You control thinking via the /thinking command between turns.
/thinking manual
Configuration
Set the default thinking mode in ~/.leanctl/config.toml:
[thinking]
mode = "auto" # auto | always | never | manual
Override per session in the TUI:
/thinking auto # switch to auto
/thinking never # disable thinking for this session
Token Savings
Thinking Steering can save 30–80% on reasoning tokens depending on your workload. The savings come from:
- Suppressing thinking on simple tasks: requests like "add a comment" or "rename this variable" don't need 16K tokens of reasoning
- Right-sizing budgets: debugging gets 8K, not 16K; code generation gets 1K, not 8K
- Provider-specific optimization: Ollama's think: false eliminates thinking tokens entirely on simple tasks
View your thinking savings with /cost or /stats in the TUI, or leanctl stats from the command line.
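To see how right-sizing adds up, it can help to compare the budget ceiling of a steered session against a session where every turn gets the Deep budget. The sketch below uses an invented 16-turn workload purely for illustration; it is not measured data, and because budgets are upper limits rather than actual usage, real savings will differ. The names `BUDGETS`, `FLAT_BUDGET`, and `budget_savings` are hypothetical.

```python
# Per-task budgets in tokens, from the task classification table.
BUDGETS = {
    "casual": 1_024,
    "generate": 1_024,
    "explore": 4_096,
    "debug": 8_192,
    "refactor": 16_384,
}
FLAT_BUDGET = 16_384  # baseline: every turn is given the Deep budget


def budget_savings(turns: dict[str, int]) -> float:
    """Fraction of the flat budget ceiling avoided by per-task budgets."""
    steered = sum(BUDGETS[task] * count for task, count in turns.items())
    flat = FLAT_BUDGET * sum(turns.values())
    return 1 - steered / flat


# Illustrative workload (made up for this example): turns per task type.
workload = {"casual": 2, "generate": 2, "explore": 3, "debug": 5, "refactor": 4}
print(f"{budget_savings(workload):.1%}")  # prints 53.1%
```

Here the steered ceiling is 122,880 tokens against a flat ceiling of 262,144, so roughly half of the budget is avoided. A workload dominated by casual questions would land higher, and one dominated by refactoring would land lower.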
Status Bar Indicator
The TUI status bar shows the current thinking state after each turn:
◇ casual | thinking: auto-steered (think: false)
◇ debug | thinking: auto-steered (budget: 8192 tokens)
◇ generate | thinking: always-on (budget: 16384 tokens)
This helps you understand how thinking was applied to each request and lets you verify that the auto-classifier is making appropriate decisions.
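The budgets in these status lines follow from combining the session's thinking mode with the classifier's result. A minimal sketch of that resolution step is shown here, assuming a hypothetical function `effective_budget`; how Manual mode stores the user's choice is not described in this page, so the `manual` argument is an assumption.

```python
from typing import Optional

MAX_BUDGET = 16_384  # the Deep budget, the largest value in the tables above


def effective_budget(mode: str, classified: int, manual: Optional[int] = None) -> Optional[int]:
    """Return the thinking budget in tokens, or None when thinking is disabled.

    `classified` is the budget the intent classifier picked for this turn.
    `manual` stands in for whatever the user last set in Manual mode
    (an assumption of this sketch).
    """
    if mode == "auto":
        return classified   # auto-steered: trust the classifier
    if mode == "always":
        return MAX_BUDGET   # always-on: maximum thinking on every turn
    if mode == "never":
        return None         # thinking disabled entirely
    if mode == "manual":
        return manual       # steering off: use the user's own setting
    raise ValueError(f"unknown thinking mode: {mode!r}")
```

With this sketch, a debug turn in Auto mode resolves to 8192 tokens, and any turn in Always mode resolves to 16384, matching the second and third status lines above.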