Get your first AI response
The AI Gateway routes to Claude, GPT-4o, Llama, Mistral, DeepSeek, and more via a single endpoint with per-tenant isolation, streaming SSE, and BYOK.
Basic request
import { Neureus } from '@neureus/sdk';
const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });
const response = await client.ai.chat({
  messages: [{ role: 'user', content: 'Summarize the GDPR in one paragraph.' }],
  model: 'claude-sonnet-4-6',
});
console.log(response.content);
// response also contains: .toolCalls, .reasoning, .inputTokens, .outputTokens, .costUsd
Streaming
Use client.ai.stream() to get tokens as they arrive:
import { Neureus } from '@neureus/sdk';
const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });
const stream = await client.ai.stream({
  messages: [{ role: 'user', content: 'Write a haiku about edge computing.' }],
  model: 'gpt-4o-mini',
});
for await (const token of stream) {
  process.stdout.write(token);
}
The API returns text/event-stream SSE. Each data: line is a JSON chunk:
{ "delta": { "content": "token text" }, "done": false }
Final event:
{ "done": true, "logId": "...", "inputTokens": 100, "outputTokens": 42 }
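If you read the SSE response yourself instead of using client.ai.stream(), the chunk format above can be reassembled roughly as follows. This is a minimal sketch, not the SDK's implementation: parseSseBody is a hypothetical helper, the sample logId is a placeholder, and it assumes each event fits on a single data: line, as in the examples above.

```typescript
// Event shapes taken from the examples in this section.
interface DeltaEvent { delta: { content: string }; done: false }
interface FinalEvent { done: true; logId: string; inputTokens: number; outputTokens: number }
type StreamEvent = DeltaEvent | FinalEvent;

interface StreamResult { text: string; final: FinalEvent | undefined }

// Hypothetical helper: turn a complete SSE body into the full text plus the final event.
function parseSseBody(body: string): StreamResult {
  let text = '';
  let final: FinalEvent | undefined;
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank separators and other SSE fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '') continue;
    const event = JSON.parse(payload) as StreamEvent;
    if (event.done) {
      final = event; // the final event carries the token counts
      break;
    }
    text += event.delta.content;
  }
  return { text, final };
}

// Sample body built from the chunk format shown above (values are placeholders).
const sample = [
  'data: { "delta": { "content": "Hello" }, "done": false }',
  '',
  'data: { "delta": { "content": ", edge" }, "done": false }',
  '',
  'data: { "done": true, "logId": "log_placeholder", "inputTokens": 100, "outputTokens": 42 }',
  '',
].join('\n');

const result = parseSseBody(sample);
console.log(result.text);
console.log(result.final?.outputTokens);
```

For a live connection you would feed lines to the same logic as they arrive rather than waiting for the whole body.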
Choosing a model
| Model ID | Provider | Best for |
|---|---|---|
| auto | Task-classified | Neureus picks the best model based on your prompt |
| claude-opus-4-8 | Anthropic | Complex reasoning, long documents |
| claude-sonnet-4-6 | Anthropic | Balanced quality and speed |
| claude-haiku-4-5 | Anthropic | Fast, cheap, simple tasks |
| gpt-4o | OpenAI | Multimodal, function calling |
| gpt-4o-mini | OpenAI | Cost-efficient chat |
| meta/llama-3.3-70b | Workers AI | General chat, no key required |
| meta/llama-3.1-8b | Workers AI | Fast, free, edge-deployed |
| deepseek/r1-32b | Workers AI | Reasoning tasks, no key required |
| qwen/coder-32b | Workers AI | Code generation, no key required |
| moonshot/kimi-k2 | Workers AI | Long-context, no key required |
Workers AI models run on Cloudflare’s edge and require no provider API key.
BYOK (Bring Your Own Key)
Register your OpenAI or Anthropic key once; from then on it takes precedence over the global key for your tenant:
curl -X PUT https://app.neureus.ai/ai/providers/openai/key \
  -H "Authorization: Bearer nru_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'
Neureus encrypts it with AES-GCM and associates it with your tenant. All subsequent requests to OpenAI models use your key automatically.
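The same request can be built from TypeScript without the SDK. The sketch below mirrors the curl command field by field; buildProviderKeyRequest is a hypothetical helper, and using the same URL pattern for anthropic is an assumption, since only the openai path is shown above.

```typescript
// Assumption: other providers follow the /ai/providers/<provider>/key pattern.
type Provider = 'openai' | 'anthropic';

interface ProviderKeyRequest {
  url: string;
  init: { method: 'PUT'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the PUT request that the curl command sends.
function buildProviderKeyRequest(
  provider: Provider,
  neureusKey: string,
  providerKey: string,
): ProviderKeyRequest {
  return {
    url: `https://app.neureus.ai/ai/providers/${provider}/key`,
    init: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${neureusKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ key: providerKey }),
    },
  };
}

const request = buildProviderKeyRequest('openai', 'nru_your_key', 'sk-your-openai-key');
console.log(request.url);

// To send it from a runtime with a global fetch (for example Node 18+):
// const res = await fetch(request.url, request.init);
// if (!res.ok) throw new Error(`Key registration failed: ${res.status}`);
```

In real code, read both keys from environment variables rather than hard-coding them.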
Via SDK:
await client.ai.setProviderKey('openai', 'sk-your-openai-key');
// later, to rotate:
await client.ai.rotateProviderKey('openai', 'sk-your-new-key');
// to list configured providers:
const { providers } = await client.ai.listProviders();
Prompt pre-processing
Every chat request passes through a 4-pass pipeline automatically:
- Normalize — trim whitespace, collapse blank lines
- Structure — move system messages to index 0
- Trim — when >6K tokens: keep system + last 4 turns
- Compress (opt-in) — LLM compression via Llama 3.1 8B
To enable compression, add the header x-neureus-options: compress=true.
To see pre-processing stats, add the header x-neureus-debug: true.
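To make the first three passes concrete, here is a rough local approximation. It is a sketch under stated assumptions, not the gateway's code: the function names are hypothetical, tokens are estimated at about 4 characters each, "6K" is read as 6000, a "turn" is treated as one non-system message, and "collapse blank lines" is read as reducing runs of blank lines to a single one. The opt-in Compress pass is left out.

```typescript
type Role = 'system' | 'user' | 'assistant';
interface Message { role: Role; content: string }

// Assumption: ~4 characters per token; the real tokenizer is not documented here.
const estimateTokens = (messages: Message[]): number =>
  Math.ceil(messages.reduce((sum, m) => sum + m.content.length, 0) / 4);

// Pass 1 (Normalize): trim whitespace and collapse runs of blank lines into one.
function normalize(messages: Message[]): Message[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.trim().replace(/\n(?:[ \t]*\n)+/g, '\n\n'),
  }));
}

// Pass 2 (Structure): move system messages to the front, preserving relative order.
function structure(messages: Message[]): Message[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest];
}

// Pass 3 (Trim): over the limit, keep system messages plus the last few turns.
function trim(messages: Message[], limit = 6000, keepTurns = 4): Message[] {
  if (estimateTokens(messages) <= limit) return messages;
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keepTurns)];
}

const preprocess = (messages: Message[]): Message[] =>
  trim(structure(normalize(messages)));

// Example: a long history with a misplaced system message.
const filler = 'x'.repeat(5000);
const history: Message[] = [
  { role: 'user', content: '  first question\n\n\n\nwith gaps  ' },
  { role: 'system', content: 'You are concise.' },
  ...Array.from({ length: 6 }, (_, i): Message => ({
    role: i % 2 === 0 ? 'assistant' : 'user',
    content: `${i}:${filler}`,
  })),
];

const processed = preprocess(history);
console.log(processed.map((m) => m.role).join(','));
```

Running something like this locally can help predict which messages survive trimming; the stats returned with x-neureus-debug: true remain the authoritative view of what the gateway actually did.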