Model Routing Guide¶

What it is¶

The Model Routing Guide is a technical framework for selecting the optimal Large Language Model (LLM) for a given task. It provides source-backed decision logic based on capability tiers, reasoning effort levels, and task-specific performance.

What problem it solves¶

Frontier models vary wildly in cost, latency, and reasoning depth. This guide prevents "over-engineering" (using a high-cost reasoning model for simple summaries) and "under-engineering" (using a low-latency model for complex logic), ensuring token-efficiency and cost-effectiveness across agentic workflows.

Where it fits in the stack¶

It is the Decision Layer of the AI stack, informing the Agentic Workflows and Home Admin Agent Architecture on which model to invoke for specific nodes in a computational graph.

Typical use cases¶

Multi-Model Orchestration: Routing a user query to a cheap classifier first, then to a high-reasoning model only if complex logic is detected.
Cost Optimization: Switching from GPT-4o to Claude 3.5 Haiku for high-volume extraction tasks to save 90% on API costs.
Latency-Critical Applications: Selecting GPT-5.3 Codex for real-time code completion where sub-second response is mandatory.

Strengths¶

Granular Control: Leverages new frontier features like OpenAI's "Reasoning Effort" levels.
Cost-Efficient: Explicitly identifies "low-latency, high-volume" tiers for bulk processing.
Source-Backed: Decision logic is derived from official provider documentation and real-world benchmarks.

Limitations¶

Dynamic Pricing: While it identifies tiers, specific token costs change frequently and should be verified via the Pricing Matrix.
Vibe-Dependent: Some routing decisions (like "Creative Writing") remain subjective and depend on the specific tone required.

When to use it¶

When building autonomous agents that need to manage their own compute budget.
When refactoring a monolithic LLM application into a more efficient multi-model pipeline.

When not to use it¶

For trivial, single-turn chat interfaces where cost and latency are not primary concerns.
If your application is locked into a single model (e.g., due to strict enterprise compliance).

Anthropic (Claude)¶

Anthropic models are categorized into three "tiers" of capability. Choosing the right tier depends on the complexity of the reasoning required.

Tier	Model	Best For	Decision Logic
Haiku	Claude 3.5 Haiku	Low-latency, high-volume tasks	Use for basic extraction, classification, and simple summarization where speed is critical.
Sonnet	Claude 3.5 Sonnet	General knowledge work, coding, and tool use	The default choice for most agentic workflows. High intelligence with manageable latency.
Opus	Claude 3 Opus	Complex reasoning, large-scale strategy, and high-fidelity creative work	Use when Sonnet fails on extreme logic puzzles or when maximum creative "nuance" is required.

OpenAI¶

OpenAI's latest frontier models (starting with GPT-5.4) introduce explicit Reasoning Effort levels, allowing developers to trade time/compute for accuracy.

GPT-5.4 Effort Levels¶

Level	Latency	Reasoning Depth	Recommended Use Case
None	Ultra-low	Surface-level	Rapid document parsing, simple chat responses.
Medium	Moderate	Balanced	Standard coding tasks, multi-step tool orchestration.
High	High	Deep	Complex bug fixes, architecture reviews.
X-High	Very High	Maximum	Frontier scientific reasoning, high-stakes logic verification.

GPT-5.3 Codex Transition¶

GPT-5.4 incorporates the specialized coding strengths of GPT-5.3-Codex. - Legacy Decision: If a workflow specifically requires the legacy Codex behavior (deterministic code completion), continue using GPT-5.3-Codex. - Modern Decision: For agentic coding where the model must use tools and iterate, GPT-5.4 (at Medium/High effort) is the recommended successor.

Task-Based Routing Summary¶

Task Type	Recommended Model	Effort/Tier
Simple Extraction	Claude 3.5 Haiku	N/A
Standard Coding	Claude 3.5 Sonnet	N/A
Complex Debugging	GPT-5.4	High / X-High
Creative Writing	Claude 3 Opus	N/A
Agentic Tool Use	Claude 3.5 Sonnet	N/A
Document Summarization	Claude 3.5 Haiku	N/A
Rapid Code Completion	GPT-5.3 Codex	N/A

Sources / References¶

Contribution Metadata¶

Last reviewed: 2026-06-06
Confidence: high