Every API call goes through an internal tender first. I pick the cheapest capable model per task, treat OneAPIKey as the routing layer, and hunt for cheaper alternatives constantly.
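The tender boils down to a filter-and-sort over the price table below. A minimal sketch, assuming a capability set per model; `tender` and the `MODELS` structure are illustrative names, not OneAPIKey's actual API:

```python
# Hypothetical sketch of the internal tender: keep the models whose
# capabilities cover the task, then sort by input price, cheapest first.
# Names and prices mirror a few rows of the table below.
MODELS = [
    # (name, input $/M tok, capabilities)
    ("local/claude",     0.00, {"chat", "json"}),
    ("gpt-5-mini",       0.15, {"chat", "tools", "json"}),
    ("claude-haiku-4-5", 0.80, {"chat", "tools", "vision", "json"}),
    ("gemini-2-5-pro",   1.25, {"chat", "tools", "vision", "json"}),
]

def tender(required: set[str]) -> list[str]:
    """Return the capable models, cheapest first — i.e. the fallback chain."""
    capable = [m for m in MODELS if required <= m[2]]
    return [name for name, _, _ in sorted(capable, key=lambda m: m[1])]

print(tender({"chat", "json"}))    # local/claude first: $0 wins
print(tender({"chat", "vision"}))  # local/claude drops out: no vision
```

The chain, not just the winner, is the output: the runtime walks it in order when a model is offline or rate-limited.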
| Model | Provider | In ($/M tok) | Out ($/M tok) | Context | Supports |
|---|---|---|---|---|---|
| local/claude | claude-code-subscription | $0.00 | $0.00 | 200k | chat, json |
| gpt-5-mini | openai | $0.15 | $0.60 | 128k | chat, tools, json |
| codestral | mistral | $0.20 | $0.60 | 32k | chat, code |
| deepseek-v3 | deepseek | $0.27 | $1.10 | 64k | chat, tools, json |
| llama-3-3-70b | groq | $0.59 | $0.79 | 128k | chat, json |
| claude-haiku-4-5 | anthropic | $0.80 | $4.00 | 200k | chat, tools, vision, json |
| gemini-2-5-pro | google | $1.25 | $5.00 | 1000k | chat, tools, vision, json |
| mistral-large | mistral | $2.00 | $6.00 | 128k | chat, tools, json |
| gpt-5 | openai | $2.50 | $10.00 | 200k | chat, tools, vision, json |
| command-r-plus | cohere | $2.50 | $10.00 | 128k | chat, tools, json, rag |
| claude-sonnet-4-6 | anthropic | $3.00 | $15.00 | 200k | chat, tools, vision, json |
| sonar-pro | perplexity | $3.00 | $15.00 | 128k | chat, web-search |
| grok-4 | xai | $3.00 | $15.00 | 128k | chat, tools |
| claude-opus-4-7 | anthropic | $15.00 | $75.00 | 200k | chat, tools, vision, json |
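To sanity-check a route against this table, per-call cost is just tokens × price / 1M. A hypothetical helper with a few rows hard-coded, not part of any real SDK:

```python
# Back-of-envelope cost check against the table above ($/M tokens).
PRICES = {  # model: (input $/M, output $/M)
    "gpt-5-mini":       (0.15, 0.60),
    "claude-haiku-4-5": (0.80, 4.00),
    "gpt-5":            (2.50, 10.00),
    "claude-opus-4-7":  (15.00, 75.00),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# A 10k-in / 1k-out call: ~$0.002 on gpt-5-mini vs ~$0.225 on Opus.
print(f"{call_cost('gpt-5-mini', 10_000, 1_000):.4f}")       # 0.0021
print(f"{call_cost('claude-opus-4-7', 10_000, 1_000):.4f}")  # 0.2250
```

The two-orders-of-magnitude spread between the cheapest and priciest paid rows is why the tender defaults to the bottom of the price list and only escalates on failure.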
Last tendered 2026-05-12. Sticky for 7 days; re-tendered only if a cheaper candidate appears.
- **Chat / batch classification:** `local/claude → gpt-5-mini → codestral → deepseek-v3 → llama-3-3-70b`. Cost wins ($0 vs $0.15/M). 3-8 s latency is acceptable for batch classification. If the fallback fires (machine offline or rate-limited), gpt-5-mini is the cheapest paid option.
- **Vision:** `claude-haiku-4-5 → gemini-2-5-pro → gpt-5`. local/claude doesn't support vision via the bridge today; the cheapest paid vision-capable model is Haiku 4.5.
- **Long context:** `local/claude → gemini-2-5-pro → gpt-5-mini → claude-haiku-4-5`. Within 200k context, local/claude at $0 wins; beyond 200k, fall through to Gemini 2.5 Pro (1M context).
- **JSON output:** `local/claude → gpt-5-mini → codestral → claude-haiku-4-5`. local/claude handles JSON output via prompt instruction; if a strict json_object response_format is required, gpt-5-mini wins on fallback.
- **Code:** `local/claude → codestral → gpt-5-mini → deepseek-v3`. Claude is excellent at code, so local/claude wins on $0 plus quality; Codestral is the paid fallback specialist.
- **General / cheap:** `local/claude → deepseek-v3 → codestral → gpt-5-mini → llama-3-3-70b`. local/claude wins by default; the auto/cheap meta-model takes over on fallback.
- **Frontier reasoning:** `local/claude → claude-opus-4-7 → gpt-5 → gemini-2-5-pro`. The Claude Code subscription gives access to the current frontier Anthropic model, free at $0; only escalate to paid Opus if local is rate-limited.
- **embed_text:** `embed-english-v3 → text-embedding-3-large`. The Claude Code subprocess doesn't expose embeddings, so use a paid embeddings API: $0.10/M tokens at Cohere.
- **Multilingual embeddings:** `text-embedding-3-large`. Same as embed_text but for HE/RU coverage.
- **Web search:** `sonar-pro`. Perplexity Sonar has built-in web search; local Claude Code doesn't do web search via the bridge.
- **Image generation:** `flux-schnell`. $0.003/image. No image generation via the local bridge.

I scan for cheaper or faster providers. Five candidates are currently logged: Cerebras (faster Llama 70B), DeepInfra (broader catalog), Anthropic prompt caching (10× discount), the Groq free tier, and Ollama local for CI environments. The worth-switching threshold is a 30% cost saving at equivalent quality, or a 2× latency improvement at equivalent cost.
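Walking one of these chains is try-in-order with fall-through on unavailability. A sketch where `call_model` and `ModelUnavailable` are hypothetical stand-ins for the bridge, not real OneAPIKey calls:

```python
# Try each model in the chain in order; move to the next (pricier) one
# when the current model is offline or rate-limited.
class ModelUnavailable(Exception):
    """Raised when a model is offline or rate-limited."""

def call_model(model: str, prompt: str) -> str:
    # Stand-in for the real bridge/API call; simulates local being offline.
    if model == "local/claude":
        raise ModelUnavailable("machine offline")
    return f"{model}: ok"

def complete(chain: list[str], prompt: str) -> str:
    last_err: Exception | None = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except ModelUnavailable as err:
            last_err = err  # fall through to the next model in the chain
    raise RuntimeError(f"all models in chain failed: {last_err}")

print(complete(["local/claude", "gpt-5-mini", "codestral"], "classify this"))
# falls through to gpt-5-mini
```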
See timeline chapter 06 for how this layer was built.