Multi-provider, capability-aware, cost-optimized model routing for AI agents.
Routes tasks to the optimal LLM based on complexity classification, required capabilities, cost constraints, and latency requirements.
- Multi-provider: Anthropic, Google, OpenAI, Open-source (via OpenRouter)
- 12 models in registry with full cost/capability metadata
- Keyword-based complexity classifier (0-20 score)
- Capability filtering: require specific skills (reasoning, code_review, etc.)
- Budget-aware ranking: low/medium/high cost sensitivity
- Overkill prevention: penalizes expensive models for simple tasks
- Fallback chain: cross-provider failover
- Decision logging: JSONL output for learning from outcomes
- CLI + Library: use standalone or import into agents
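
The decision-logging feature can be pictured with a minimal sketch like the one below. It is not the router's actual logger: the function names `log_decision` and `read_decisions`, the record fields, and the file path passed in are all assumptions made for illustration. The only thing taken from the list above is the idea of writing one JSON object per line (JSONL).

```python
import json
import time
from pathlib import Path


def log_decision(path, task, model_name, tier, score, outcome=None):
    """Append one routing decision as a single JSON line (hypothetical schema)."""
    record = {
        "ts": time.time(),      # when the decision was made
        "task": task,           # task description that was routed
        "model": model_name,    # model the router picked
        "tier": tier,           # light / medium / heavy
        "score": score,         # complexity score
        "outcome": outcome,     # filled in later, e.g. "ok" or "failed"
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_decisions(path):
    """Load every logged decision back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per decision keeps writes cheap and lets the log be inspected with ordinary line-oriented tools.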
# Simple routing
python3 scripts/router.py --task "Check disk space" --budget low
# → gemini-2.5-flash ($0.15/MTok)
# Complex task
python3 scripts/router.py --task "Design microservice architecture" --budget high
# → claude-opus-4-6 ($5.00/MTok)
# Require capabilities
python3 scripts/router.py --task "Review PR" --require code_review --provider anthropic
# → claude-sonnet-4-6
# JSON output for scripting
python3 scripts/router.py --task "Debug race condition" --json
# Verbose with alternatives
python3 scripts/router.py --task "Analyze logs" -v

| Model | Provider | $/MTok In | Power | Tier |
|---|---|---|---|---|
| gemini-2.5-flash | Google | $0.15 | 40 | light |
| llama-4-maverick | Meta/Open | $0.20 | 45 | light |
| gpt-4.1-mini | OpenAI | $0.40 | 35 | light |
| deepseek-r1 | Open | $0.55 | 80 | medium |
| claude-haiku-4-5 | Anthropic | $0.80 | 30 | light |
| qwen-3-235b | Open | $0.80 | 72 | medium |
| o4-mini | OpenAI | $1.10 | 70 | medium |
| gemini-2.5-pro | Google | $1.25 | 82 | medium |
| gpt-4.1 | OpenAI | $2.00 | 72 | medium |
| claude-sonnet-4-6 | Anthropic | $3.00 | 75 | medium |
| claude-opus-4-6 | Anthropic | $5.00 | 98 | heavy |
| o3 | OpenAI | $10.00 | 96 | heavy |
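
To make the `$/MTok In` column concrete, here is a small sketch that turns a prompt size into an input cost. The prices are copied from the table; the helper name `input_cost_usd` is an assumption, and output-token pricing is not covered because the table does not list it.

```python
# Input prices in USD per million tokens, copied from the table above.
INPUT_PRICE_PER_MTOK = {
    "gemini-2.5-flash": 0.15,
    "claude-sonnet-4-6": 3.00,
    "claude-opus-4-6": 5.00,
    "o3": 10.00,
}


def input_cost_usd(model, input_tokens):
    """Input-side cost only: tokens / 1,000,000 * price per MTok."""
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK[model]
```

Under these prices a 20,000-token prompt costs roughly $0.003 of input on gemini-2.5-flash and $0.10 on claude-opus-4-6.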
Task → Complexity Classifier (keywords + length) → Score 0-20
→ Capability Filter (hard requirement)
→ Budget-Weighted Ranking (quality × cost × speed)
→ Overkill Penalty (cheap tasks don't get expensive models)
→ Pick best + 2 alternatives
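
The filter, ranking, and overkill steps above could look something like the sketch below. This is not the code in scripts/router.py: the utility formula, the `COST_WEIGHT` values, the 20-point headroom, and the capability tags are assumptions, and the speed term is left out because no latency figures are given. Prices and power scores come from the registry table.

```python
# Three registry rows; "caps" tags are hypothetical.
MODELS = [
    {"name": "gemini-2.5-flash", "cost": 0.15, "power": 40,
     "caps": {"simple_qa"}},
    {"name": "claude-sonnet-4-6", "cost": 3.00, "power": 75,
     "caps": {"simple_qa", "code_review"}},
    {"name": "claude-opus-4-6", "cost": 5.00, "power": 98,
     "caps": {"simple_qa", "code_review", "reasoning"}},
]

# Assumed weights: how strongly price counts against a model per budget level.
COST_WEIGHT = {"low": 10.0, "medium": 4.0, "high": 1.0}


def rank(models, complexity, budget="medium", require=()):
    """Return (best, alternatives) for a task with the given 0-20 complexity."""
    # Capability filter is a hard requirement: drop models missing any tag.
    eligible = [m for m in models if set(require) <= m["caps"]]
    if not eligible:
        raise ValueError(f"no model provides: {sorted(require)}")

    # Map the 0-20 complexity score onto the 0-100 power scale.
    needed = complexity * 5

    def utility(m):
        cost_penalty = COST_WEIGHT[budget] * m["cost"]
        # Overkill penalty: power beyond what the task needs (plus 20 points
        # of headroom) counts against the model.
        overkill = max(0, m["power"] - needed - 20)
        return m["power"] - cost_penalty - overkill

    ranked = sorted(eligible, key=utility, reverse=True)
    return ranked[0], ranked[1:3]  # best pick plus up to 2 alternatives
```

With these assumed weights, `rank(MODELS, complexity=1, budget="low")` prefers gemini-2.5-flash, while `rank(MODELS, complexity=15, budget="high")` prefers claude-opus-4-6.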
- Light (0-2): status checks, formatting, simple Q&A
- Medium (3-8): code review, debugging, analysis, implementation
- Heavy (9+): architecture, security audit, novel problems
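
As a rough illustration of how a keyword-plus-length classifier can feed these tiers, consider the sketch below. The tier boundaries (0-2, 3-8, 9+) and the 0-20 range come from this document; the keyword list, the weights, and the length bonus are invented for the example, so its scores will not match those of the real `classify_complexity`.

```python
# Assumed keyword weights; the real classifier's vocabulary is not shown here.
KEYWORD_WEIGHTS = {
    "architecture": 9,
    "security audit": 9,
    "design": 9,
    "debug": 5,
    "review": 4,
    "analyze": 4,
    "implement": 4,
}


def classify(task):
    """Score a task description from 0 to 20 using keywords and length."""
    text = task.lower()
    score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)
    # Longer descriptions earn a small bonus: one point per 40 words.
    score += len(text.split()) // 40
    return max(0, min(20, score))


def tier_for(score):
    """Map a complexity score to the tiers listed above."""
    if score <= 2:
        return "light"
    if score <= 8:
        return "medium"
    return "heavy"
```

In this sketch "Check disk space" matches no keyword and lands in the light tier, while "Design microservice architecture" matches two heavy keywords and lands in the heavy tier.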
python3 scripts/benchmark.py

Current: 15/15 (100%) across light/medium/heavy task categories.
import sys
sys.path.insert(0, "scripts")
from router import pick_model, load_models, classify_complexity
models = load_models("config/models.json")
result = pick_model(models, "Design auth system", budget="high")
print(result["model"]["name"])       # claude-opus-4-6
print(result["model"]["hermes_id"])  # anthropic/claude-opus-4-6
print(result["tier"])                # heavy
print(result["score"])               # 15

Edit config/models.json to:
- Add new models
- Update pricing
- Adjust power scores
- Add/remove capabilities
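
Registry edits can also be scripted. The sketch below assumes that config/models.json holds a top-level JSON list of model objects, which this document does not confirm; check the real file before relying on it. The `name` and `hermes_id` keys appear in the library example; every other key in `EXAMPLE_ENTRY`, the entry itself, and the helper `add_model` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical entry; only "name" and "hermes_id" are keys known from the
# library example, the rest are placeholders for whatever the schema uses.
EXAMPLE_ENTRY = {
    "name": "example-model",
    "hermes_id": "example-provider/example-model",
    "cost_in": 1.00,
    "power": 50,
    "capabilities": ["code_review"],
}


def add_model(config_path, entry):
    """Append a model entry to a JSON list, refusing duplicates by name."""
    path = Path(config_path)
    models = json.loads(path.read_text(encoding="utf-8"))
    if any(m["name"] == entry["name"] for m in models):
        raise ValueError(f"model already registered: {entry['name']}")
    models.append(entry)
    path.write_text(json.dumps(models, indent=2) + "\n", encoding="utf-8")
    return models
```

Rejecting duplicate names keeps a scripted update from silently registering the same model twice.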
Works with delegate_task — route before spawning sub-agents:
result = pick_model(models, task_description, budget="medium")
# Use result["model"]["hermes_id"] as the model parameter

License: MIT