OpenHuman Automatic Model Routing — The Smart Multi-Provider System

Different parts of an agent want different models. Long reasoning wants a frontier model. Quick "fix this typo" calls want a fast cheap one. Vision wants a vision model. OpenHuman handles this with a built-in router provider so you never have to think about it.

⚡ Why this isn't just a "model switcher"

Routing isn't a UI dropdown. The agent loop itself emits hints based on what it's about to do. You don't pick the model; the task does. That's the difference between "multi-model" and "smart routing."

🎯 How a request gets routed

The model parameter on any chat call can take one of two shapes:

Concrete model name: e.g. anthropic/claude-sonnet-4 — routes to the default provider with that exact model.
Hint prefix: e.g. hint:reasoning — looks the hint up in the route table and resolves to a (provider, model) pair.

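// src/openhuman/providers/router.rs
// Returns (provider index, model name) for the given `model` parameter.
fn resolve(&self, model: &str) -> (usize, String) {
    // "hint:<name>" is looked up in the route table.
    if let Some(hint) = model.strip_prefix("hint:") {
        if let Some((idx, resolved_model)) = self.routes.get(hint) {
            return (*idx, resolved_model.clone());
        }
    }
    // Concrete names, and hints with no route entry, go to the default
    // provider with the string passed through unchanged.
    (self.default_index, model.to_string())
}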

The router wraps several pre-created providers (Anthropic, OpenAI, Google, Groq, etc.) and picks the right one per request. Hints can be remapped at runtime without restarting the core.
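To make the lookup and runtime remapping concrete, here is a minimal, self-contained sketch. It assumes providers can be represented by plain names rather than real API clients, and the `remap` and `demo_router` helpers are hypothetical names introduced for illustration; only the `resolve` logic mirrors the excerpt above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the router. Providers are plain names here,
/// not real API clients.
struct Router {
    providers: Vec<String>,
    routes: HashMap<String, (usize, String)>,
    default_index: usize,
}

impl Router {
    /// Same lookup logic as the `resolve` excerpt above.
    fn resolve(&self, model: &str) -> (usize, String) {
        if let Some(hint) = model.strip_prefix("hint:") {
            if let Some((idx, resolved)) = self.routes.get(hint) {
                return (*idx, resolved.clone());
            }
        }
        (self.default_index, model.to_string())
    }

    /// Hypothetical helper: point a hint at a different (provider, model)
    /// pair while the process keeps running.
    fn remap(&mut self, hint: &str, provider_index: usize, model: &str) {
        self.routes
            .insert(hint.to_string(), (provider_index, model.to_string()));
    }
}

/// Builds a two-provider router with two routes for the demo.
fn demo_router() -> Router {
    let mut router = Router {
        providers: vec!["anthropic".to_string(), "groq".to_string()],
        routes: HashMap::new(),
        default_index: 0,
    };
    router.remap("reasoning", 0, "anthropic/claude-sonnet-4");
    router.remap("fast", 1, "groq/llama-3.1-8b-instant");
    router
}

fn main() {
    let mut router = demo_router();
    for model in ["hint:fast", "hint:unknown", "anthropic/claude-sonnet-4"] {
        let (idx, resolved) = router.resolve(model);
        println!("{model} -> {} / {resolved}", router.providers[idx]);
    }

    // Remap the hint without rebuilding the router.
    router.remap("fast", 0, "anthropic/claude-sonnet-4");
    let (idx, resolved) = router.resolve("hint:fast");
    println!("hint:fast -> {} / {resolved}", router.providers[idx]);
}
```

Note that an unknown hint is not an error in this logic: the full `hint:unknown` string is handed to the default provider as the model name.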

🔖 Common hints

Hint	Typical target	When it's used
hint:reasoning	Strong frontier model	Multi-step planning, math, code-heavy turns
hint:fast	Cheap/fast model	UI helpers, autocompletes, small classification calls
hint:vision	Vision-capable model	Screenshots, image attachments, OCR
hint:summarize	Compression-focused model	Memory Tree summary builders
hint:code	Code-tuned model	Native coder turns
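As a rough sketch of how the agent loop could turn "what it's about to do" into one of the hints in the table, the example below maps a task kind to a hint string. The `TaskKind` enum and `hint_for` function are hypothetical names for illustration and are not taken from the OpenHuman source; only the hint strings come from the table.

```rust
/// Hypothetical classification of what the agent is about to do.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskKind {
    Planning,
    UiHelper,
    Screenshot,
    MemorySummary,
    CoderTurn,
}

/// Maps a task kind to the `model` parameter sent on the chat call.
fn hint_for(task: TaskKind) -> &'static str {
    match task {
        TaskKind::Planning => "hint:reasoning",
        TaskKind::UiHelper => "hint:fast",
        TaskKind::Screenshot => "hint:vision",
        TaskKind::MemorySummary => "hint:summarize",
        TaskKind::CoderTurn => "hint:code",
    }
}

fn main() {
    let tasks = [
        TaskKind::Planning,
        TaskKind::UiHelper,
        TaskKind::Screenshot,
        TaskKind::MemorySummary,
        TaskKind::CoderTurn,
    ];
    for task in tasks {
        println!("{:?} -> {}", task, hint_for(task));
    }
}
```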

💳 One subscription, many providers

Routing happens behind a single OpenHuman subscription. You don't hold separate API keys for Anthropic, OpenAI, Google, and the rest. The backend brokers access to 30+ providers, and the router picks the right one per task. This is the "one subscription, many providers" guarantee from the README, made concrete.

💰 Cost example

In a typical usage mix, 70% of queries go to a fast/cheap model, 25% to a reasoning model, and 5% to vision. The weighted average cost can be up to 94% lower than routing everything through a frontier model like GPT-4o or Claude Sonnet. The actual saving depends on the price gap between the models in your route table.
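The weighted-average arithmetic can be sketched as follows. The prices are hypothetical relative units chosen only to show the calculation, not real provider pricing, and the sketch assumes every query costs the same number of tokens. The `Slice` type and function names are illustrative.

```rust
/// One slice of traffic: its share of queries and a price per query.
/// Prices are hypothetical relative units, not real provider pricing.
struct Slice {
    share: f64,
    price: f64,
}

/// Weighted average price across the traffic mix.
fn blended_price(mix: &[Slice]) -> f64 {
    mix.iter().map(|s| s.share * s.price).sum()
}

/// Fractional saving compared with sending every query to a model
/// that costs `baseline_price`.
fn saving_vs_baseline(mix: &[Slice], baseline_price: f64) -> f64 {
    1.0 - blended_price(mix) / baseline_price
}

/// The 70/25/5 mix from the text, with placeholder prices.
fn example_mix() -> [Slice; 3] {
    [
        Slice { share: 0.70, price: 1.0 },  // fast/cheap model
        Slice { share: 0.25, price: 20.0 }, // reasoning model
        Slice { share: 0.05, price: 20.0 }, // vision model
    ]
}

fn main() {
    let mix = example_mix();
    let baseline = 20.0; // everything routed to the frontier model
    println!("blended price: {:.2}", blended_price(&mix));
    println!("saving: {:.1}%", saving_vs_baseline(&mix, baseline) * 100.0);
}
```

With these placeholder prices the blended price is 6.7 units against a baseline of 20, a saving of roughly two thirds. Larger savings require a wider price gap between the baseline and the routed models.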

🛠 Overriding routes

You can override routing behavior at different levels:

Globally (config.toml)

Supply a custom route table at startup through the [model_routing] section, which is read into the Config struct:

[model_routing]
fast_model = "groq/llama-3.1-8b-instant"
reasoning_model = "anthropic/claude-sonnet-4"
vision_model = "openai/gpt-5.1"
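One plausible way such a section could feed the route table is sketched below. It assumes each `*_model` key overrides the hint of the same name (`fast_model` overrides `hint:fast`, and so on) and that unset keys leave the built-in route alone. The `ModelRouting` struct and `route_overrides` function are hypothetical, and parsing the TOML file itself is left out.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of the [model_routing] section.
/// Every key is optional; unset keys keep the built-in route.
struct ModelRouting {
    fast_model: Option<String>,
    reasoning_model: Option<String>,
    vision_model: Option<String>,
}

/// Collects the configured keys into hint -> model overrides.
/// Assumption: `<name>_model` overrides `hint:<name>`.
fn route_overrides(cfg: &ModelRouting) -> HashMap<&'static str, String> {
    let pairs = [
        ("fast", &cfg.fast_model),
        ("reasoning", &cfg.reasoning_model),
        ("vision", &cfg.vision_model),
    ];
    let mut overrides = HashMap::new();
    for (hint, model) in pairs {
        if let Some(m) = model {
            overrides.insert(hint, m.clone());
        }
    }
    overrides
}

fn main() {
    let cfg = ModelRouting {
        fast_model: Some("groq/llama-3.1-8b-instant".to_string()),
        reasoning_model: Some("anthropic/claude-sonnet-4".to_string()),
        vision_model: None, // not set: the built-in vision route stays
    };
    for (hint, model) in route_overrides(&cfg) {
        println!("hint:{hint} -> {model}");
    }
}
```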

Per call

Pass a concrete model name (no hint: prefix) and the router falls through to the default provider with that exact model.

Per skill

Skills can pin a hint or a model in their manifest so they always use the right model for their domain.

🤖 Per-agent model pins

Sub-agents can pin an exact model without disabling automatic routing for the rest of the app. Use this when an orchestrator or team lead needs a stronger model but high-volume leaf agents should stay on a cheaper one.

Inline call (one delegation)

{
  "agent_id": "researcher",
  "model": "anthropic/claude-sonnet-4",
  "prompt": "Collect source notes for the launch memo."
}

Persistent defaults (config.toml)

[orchestrator]
model = "anthropic/claude-sonnet-4"

[teams.research]
lead_model = "openai/gpt-5.1"
agent_model = "groq/llama-3.1-8b-instant"

[teams.code]
agent_model = "qwen/qwen3-coder"

Resolution order

1. Inline model on the spawn_subagent or delegation call
2. [orchestrator].model, or the matching [teams.<team>] entry / built-in aliases
3. The archetype's own model hint and the normal route table

For [teams.*], lead_model applies to agents that can delegate and agent_model applies to leaf workers. If only one is set, the harness falls back to it for both roles.
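The resolution order and the lead/agent fallback can be sketched as below. `TeamConfig`, `model_for`, and `resolve_agent_model` are hypothetical names, and treating "can delegate" as a boolean flag is an assumption; the sketch only illustrates the precedence described above, not the harness's actual code.

```rust
/// Hypothetical in-memory form of one [teams.<team>] section.
struct TeamConfig {
    lead_model: Option<String>,
    agent_model: Option<String>,
}

impl TeamConfig {
    /// Picks lead_model for agents that can delegate and agent_model for
    /// leaf workers, falling back to the other key if only one is set.
    fn model_for(&self, can_delegate: bool) -> Option<&str> {
        let (preferred, fallback) = if can_delegate {
            (&self.lead_model, &self.agent_model)
        } else {
            (&self.agent_model, &self.lead_model)
        };
        preferred.as_deref().or(fallback.as_deref())
    }
}

/// Applies the three-step order: inline pin, then persistent config,
/// then the archetype's own hint (which goes through the route table).
fn resolve_agent_model(
    inline: Option<&str>,
    configured: Option<&str>,
    archetype_hint: &str,
) -> String {
    inline.or(configured).unwrap_or(archetype_hint).to_string()
}

fn main() {
    let code_team = TeamConfig {
        lead_model: None,
        agent_model: Some("qwen/qwen3-coder".to_string()),
    };

    // Inline pin wins over the team config.
    let pinned = resolve_agent_model(
        Some("anthropic/claude-sonnet-4"),
        code_team.model_for(false),
        "hint:code",
    );
    // No inline pin: the team config applies, here via the lead fallback.
    let lead = resolve_agent_model(None, code_team.model_for(true), "hint:code");
    // Nothing configured: the archetype's hint is used.
    let default = resolve_agent_model(None, None, "hint:code");

    println!("{pinned}\n{lead}\n{default}");
}
```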

🗺 Real-world routing examples

🖼 Screenshot analysis → `hint:vision`

When the Mascot screenshots a page or a user sends an image, OpenHuman routes to a vision-capable model (e.g., GPT-4o or Gemini 2.5 Pro).

🏗 Code generation → `hint:reasoning` or `hint:code`

The native coder tool routes to a model optimized for code (e.g., Qwen3 Coder, Claude Sonnet).

💬 Chat response → `hint:fast`

Quick conversational turns use a cheap fast model like Groq Llama 3.1 or GPT-4o-mini.

🧠 Memory summary → `hint:summarize`

Memory Tree builders use a model optimized for compression and summarization.

🔗 See also

Smart Token Compression (TokenJuice) — what makes large reasoning calls affordable
Subconscious Loop — uses hint:fast for per-tick evaluation
Native Tools — different tool calls hint at different routes