
Multi-provider routing

Every /ai/chat request passes through detectProvider(model), which selects the backend from the model ID:

| Model pattern | Provider | Auth required |
| --- | --- | --- |
| meta/*, mistral/*, qwen/*, deepseek/*, moonshot/*, etc. | Workers AI | None (edge-bound) |
| meta-llm | Workers AI (Llama 3.1 8B alias) | None |
| gpt-*, o3, o4-mini | OpenAI | OPENAI_API_KEY secret or BYOK |
| claude-* | Anthropic | ANTHROPIC_API_KEY secret or BYOK |
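
The pattern matching in the table can be sketched as a small function. This is an illustrative guess at the logic, not the actual detectProvider source: the Provider type, the prefix list, and the error on unknown IDs are assumptions, and "auto" is assumed to be resolved to a concrete model before this step.

```typescript
// Hypothetical sketch; the real detectProvider may be implemented differently.
type Provider = "workers-ai" | "openai" | "anthropic";

// Prefixes taken from the routing table and the model list. The table's "etc."
// implies more exist; "google/" is included because google/gemma-4-27b is
// listed under Workers AI.
const WORKERS_AI_PREFIXES = ["meta/", "mistral/", "qwen/", "deepseek/", "moonshot/", "google/"];

function detectProvider(model: string): Provider {
  // meta-llm is the alias for Llama 3.1 8B on Workers AI.
  if (model === "meta-llm") return "workers-ai";
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gpt-") || model === "o3" || model === "o4-mini") return "openai";
  if (WORKERS_AI_PREFIXES.some((prefix) => model.startsWith(prefix))) return "workers-ai";
  // Assumption: unknown IDs are rejected rather than silently routed somewhere.
  throw new Error(`Unknown model: ${model}`);
}
```

With this sketch, detectProvider("qwen/coder-32b") yields "workers-ai" and detectProvider("o3") yields "openai".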

Auto-routing (auto)

When model is set to "auto", Neureus classifies the task type and selects the best available model for it:

| Task type | Preferred models (in order) |
| --- | --- |
| coding | qwen/coder-32b, then gpt-4o |
| reasoning | deepseek/r1-32b, then claude-opus-4-8 |
| summarization | claude-haiku-4-5, then meta/llama-3.1-8b |
| extraction | gpt-4o-mini, then meta/llama-3.1-8b |
| chat | meta/llama-3.3-70b, then gpt-4o-mini |

Selection falls through to the next provider if the preferred one has no key configured. Workers AI models always work — no key required.
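
The fall-through rule might look roughly like the sketch below. It covers only the selection step after a task type is known; the classifier itself is not shown. The names pickModelForTask, Keys, and providerKeyFor are hypothetical, and the final default is an assumption, since every row above already contains a keyless Workers AI model.

```typescript
// Hypothetical sketch of the fall-through selection; not the actual resolveAutoModel.
type TaskType = "coding" | "reasoning" | "summarization" | "extraction" | "chat";
type Keys = { openai?: string; anthropic?: string };

// Preference order per task type, copied from the table above.
const PREFERENCES: Record<TaskType, string[]> = {
  coding: ["qwen/coder-32b", "gpt-4o"],
  reasoning: ["deepseek/r1-32b", "claude-opus-4-8"],
  summarization: ["claude-haiku-4-5", "meta/llama-3.1-8b"],
  extraction: ["gpt-4o-mini", "meta/llama-3.1-8b"],
  chat: ["meta/llama-3.3-70b", "gpt-4o-mini"],
};

// Returns which key a model needs, or null for Workers AI models (no key).
function providerKeyFor(model: string): keyof Keys | null {
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gpt-") || model === "o3" || model === "o4-mini") return "openai";
  return null;
}

function pickModelForTask(task: TaskType, keys: Keys): string {
  for (const model of PREFERENCES[task]) {
    const needed = providerKeyFor(model);
    // Take the first model whose provider is usable with the configured keys.
    if (needed === null || keys[needed]) return model;
  }
  // Assumption: a keyless Workers AI model as the last resort.
  return "meta/llama-3.1-8b";
}
```

Under these assumptions, a summarization request with no Anthropic key configured resolves to meta/llama-3.1-8b, while the same request with a key resolves to claude-haiku-4-5.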

Available model IDs

Workers AI (no key required):

  • meta/llama-3.3-70b — general chat, fast
  • meta/llama-3.1-8b — smallest, cheapest
  • meta/llama-3.2-3b — ultra-fast edge inference
  • meta/llama-4-scout-17b — Llama 4, long context
  • qwen/coder-32b — code generation
  • qwen/qwq-32b — reasoning
  • deepseek/r1-32b — reasoning tasks
  • mistral/7b — multilingual
  • mistral/small-24b — balanced multilingual
  • moonshot/kimi-k2 — long-context general
  • moonshot/kimi-k2-code — long-context coding
  • google/gemma-4-27b — general purpose

OpenAI (BYOK or global key):

  • gpt-4o, gpt-4o-mini, o3, o4-mini

Anthropic (BYOK or global key):

  • claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5

meta-llm vs auto

| Alias | Behaviour |
| --- | --- |
| meta-llm | Hardcoded to meta/llama-3.1-8b: no classification, always the same model |
| auto | Runs resolveAutoModel(): classifies the prompt, selects across all providers |

Use auto for production smart-routing. meta-llm exists for backwards compatibility and quick testing.
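
From the caller's side, the two aliases differ only in the model field. The sketch below builds both request bodies. Only the model field and the /ai/chat path come from this page; the OpenAI-style messages array and the buildChatBody helper are assumptions made for illustration.

```typescript
// Assumption: the body is OpenAI-style ({ model, messages }). Only `model`
// is documented on this page.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
}

// Hypothetical helper: wraps a single user prompt into a request body.
function buildChatBody(model: string, prompt: string): ChatRequestBody {
  return { model, messages: [{ role: "user", content: prompt }] };
}

// Smart routing: the prompt is classified and a model is chosen per request.
const smartBody = buildChatBody("auto", "Summarize this changelog.");

// Pinned alias: always meta/llama-3.1-8b, regardless of the prompt.
const pinnedBody = buildChatBody("meta-llm", "Summarize this changelog.");

// Either body would then be sent as JSON in a POST to /ai/chat on your deployment.
```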

Streaming

Responses from every provider are streamed as normalized, OpenAI-compatible SSE. The streamLlmComplete function translates provider-specific stream formats transparently, so clients parse a single format regardless of the backend.
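
To illustrate what such a translation involves, here is a minimal sketch that maps Anthropic streaming events to OpenAI-style SSE lines. It is not the streamLlmComplete implementation: the toOpenAiSse name and its parameters are hypothetical, and it is simplified in that it handles only text deltas and the end of the message, and omits the final chunk carrying a finish_reason.

```typescript
// Minimal subset of the Anthropic streaming event shape used by this sketch.
interface AnthropicEvent {
  type: string;
  delta?: { type?: string; text?: string };
}

// Hypothetical translator: returns one SSE frame, or null when the event
// carries nothing to forward (e.g. ping or message_start).
function toOpenAiSse(event: AnthropicEvent, id: string, model: string): string | null {
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    const chunk = {
      id,
      object: "chat.completion.chunk",
      model,
      choices: [
        { index: 0, delta: { content: event.delta?.text ?? "" }, finish_reason: null },
      ],
    };
    return `data: ${JSON.stringify(chunk)}\n\n`;
  }
  if (event.type === "message_stop") {
    // OpenAI-style streams end with a literal [DONE] sentinel.
    return "data: [DONE]\n\n";
  }
  return null;
}
```

A client written against the OpenAI streaming format can then read choices[0].delta.content from each frame without knowing which provider produced it.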