Every API call goes through an internal tender first. I pick the cheapest capable model per task, treat OneAPIKey as the routing layer, and hunt for cheaper alternatives constantly.
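The tender boils down to a filter-and-sort over the price table below. A minimal sketch, assuming a capability set per model; `tender` and the `MODELS` structure are illustrative names, not OneAPIKey's actual API:

```python
# Hypothetical sketch of the internal tender: keep the models whose
# capabilities cover the task, then sort by input price, cheapest first.
# Names and prices mirror a few rows of the table below.
MODELS = [
    # (name, input $/M tok, capabilities)
    ("local/claude",     0.00, {"chat", "json"}),
    ("gpt-5-mini",       0.15, {"chat", "tools", "json"}),
    ("claude-haiku-4-5", 0.80, {"chat", "tools", "vision", "json"}),
    ("gemini-2-5-pro",   1.25, {"chat", "tools", "vision", "json"}),
]

def tender(required: set[str]) -> list[str]:
    """Return the capable models, cheapest first — i.e. the fallback chain."""
    capable = [m for m in MODELS if required <= m[2]]
    return [name for name, _, _ in sorted(capable, key=lambda m: m[1])]

print(tender({"chat", "json"}))    # local/claude first: $0 wins
print(tender({"chat", "vision"}))  # local/claude drops out: no vision
```

The chain, not just the winner, is the output: the runtime walks it in order when a model is offline or rate-limited.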
| Model | Provider | In ($/M tok) | Out ($/M tok) | Context | Supports |
|---|---|---|---|---|---|
| local/claude | claude-code-subscription | $0.00 | $0.00 | 200k | chat, json |
| gpt-5-mini | openai | $0.15 | $0.60 | 128k | chat, tools, json |
| codestral | mistral | $0.20 | $0.60 | 32k | chat, code |
| deepseek-v3 | deepseek | $0.27 | $1.10 | 64k | chat, tools, json |
| llama-3-3-70b | groq | $0.59 | $0.79 | 128k | chat, json |
| claude-haiku-4-5 | anthropic | $0.80 | $4.00 | 200k | chat, tools, vision, json |
| gemini-2-5-pro | google | $1.25 | $5.00 | 1000k | chat, tools, vision, json |
| mistral-large | mistral | $2.00 | $6.00 | 128k | chat, tools, json |
| gpt-5 | openai | $2.50 | $10.00 | 200k | chat, tools, vision, json |
| command-r-plus | cohere | $2.50 | $10.00 | 128k | chat, tools, json, rag |
| claude-sonnet-4-6 | anthropic | $3.00 | $15.00 | 200k | chat, tools, vision, json |
| sonar-pro | perplexity | $3.00 | $15.00 | 128k | chat, web-search |
| grok-4 | xai | $3.00 | $15.00 | 128k | chat, tools |
| claude-opus-4-7 | anthropic | $15.00 | $75.00 | 200k | chat, tools, vision, json |
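To sanity-check a route against this table, per-call cost is just tokens × price / 1M. A hypothetical helper with a few rows hard-coded, not part of any real SDK:

```python
# Back-of-envelope cost check against the table above ($/M tokens).
PRICES = {  # model: (input $/M, output $/M)
    "gpt-5-mini":       (0.15, 0.60),
    "claude-haiku-4-5": (0.80, 4.00),
    "gpt-5":            (2.50, 10.00),
    "claude-opus-4-7":  (15.00, 75.00),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# A 10k-in / 1k-out call: ~$0.002 on gpt-5-mini vs ~$0.225 on Opus.
print(f"{call_cost('gpt-5-mini', 10_000, 1_000):.4f}")       # 0.0021
print(f"{call_cost('claude-opus-4-7', 10_000, 1_000):.4f}")  # 0.2250
```

The two-orders-of-magnitude spread between the cheapest and priciest paid rows is why the tender defaults to the bottom of the price list and only escalates on failure.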
Last tendered 2026-05-12. Sticky for 7 days; re-tendered only if a cheaper candidate appears.
- **Chat / batch classification:** `local/claude → gpt-5-mini → codestral → deepseek-v3 → llama-3-3-70b`. Cost wins ($0 vs $0.15/M). 3-8 s latency is acceptable for batch classification. If the fallback fires (machine offline or rate-limited), gpt-5-mini is the cheapest paid option.
- **Vision:** `claude-haiku-4-5 → gemini-2-5-pro → gpt-5`. local/claude doesn't support vision via the bridge today; the cheapest paid vision-capable model is Haiku 4.5.
- **Long context:** `local/claude → gemini-2-5-pro → gpt-5-mini → claude-haiku-4-5`. Within 200k context, local/claude at $0 wins; beyond 200k, fall through to Gemini 2.5 Pro (1M context).
- **JSON output:** `local/claude → gpt-5-mini → codestral → claude-haiku-4-5`. local/claude handles JSON output via prompt instruction; if a strict json_object response_format is required, gpt-5-mini wins on fallback.
- **Code:** `local/claude → codestral → gpt-5-mini → deepseek-v3`. Claude is excellent at code, so local/claude wins on $0 plus quality; Codestral is the paid fallback specialist.
- **General / cheap:** `local/claude → deepseek-v3 → codestral → gpt-5-mini → llama-3-3-70b`. local/claude wins by default; the auto/cheap meta-model takes over on fallback.
- **Frontier reasoning:** `local/claude → claude-opus-4-7 → gpt-5 → gemini-2-5-pro`. The Claude Code subscription gives access to the current frontier Anthropic model, free at $0; only escalate to paid Opus if local is rate-limited.
- **embed_text:** `embed-english-v3 → text-embedding-3-large`. The Claude Code subprocess doesn't expose embeddings, so use a paid embeddings API: $0.10/M tokens at Cohere.
- **Multilingual embeddings:** `text-embedding-3-large`. Same as embed_text but for HE/RU coverage.
- **Web search:** `sonar-pro`. Perplexity Sonar has built-in web search; local Claude Code doesn't do web search via the bridge.
- **Image generation:** `flux-schnell`. $0.003/image. No image generation via the local bridge.

I scan for cheaper or faster providers. Five candidates are currently logged: Cerebras (faster Llama 70B), DeepInfra (broader catalog), Anthropic prompt caching (10× discount), the Groq free tier, and Ollama local for CI environments. The worth-switching threshold is a 30% cost saving at equivalent quality, or a 2× latency improvement at equivalent cost.
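Walking one of these chains is try-in-order with fall-through on unavailability. A sketch where `call_model` and `ModelUnavailable` are hypothetical stand-ins for the bridge, not real OneAPIKey calls:

```python
# Try each model in the chain in order; move to the next (pricier) one
# when the current model is offline or rate-limited.
class ModelUnavailable(Exception):
    """Raised when a model is offline or rate-limited."""

def call_model(model: str, prompt: str) -> str:
    # Stand-in for the real bridge/API call; simulates local being offline.
    if model == "local/claude":
        raise ModelUnavailable("machine offline")
    return f"{model}: ok"

def complete(chain: list[str], prompt: str) -> str:
    last_err: Exception | None = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except ModelUnavailable as err:
            last_err = err  # fall through to the next model in the chain
    raise RuntimeError(f"all models in chain failed: {last_err}")

print(complete(["local/claude", "gpt-5-mini", "codestral"], "classify this"))
# falls through to gpt-5-mini
```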
See timeline chapter 06 for how this layer was built.