Get your first AI response

The AI Gateway routes requests to Claude, GPT-4o, Llama, Mistral, DeepSeek, and other models through a single endpoint, with per-tenant isolation, SSE streaming, and BYOK.

Basic request

import { Neureus } from '@neureus/sdk';
const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });
const response = await client.ai.chat({
  messages: [{ role: 'user', content: 'Summarize the GDPR in one paragraph.' }],
  model: 'claude-sonnet-4-6',
});
console.log(response.content);
// response also contains: .toolCalls, .reasoning, .inputTokens, .outputTokens, .costUsd

Streaming

Use client.ai.stream() to get tokens as they arrive:

import { Neureus } from '@neureus/sdk';
const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });
const stream = await client.ai.stream({
  messages: [{ role: 'user', content: 'Write a haiku about edge computing.' }],
  model: 'gpt-4o-mini',
});
for await (const token of stream) {
  process.stdout.write(token);
}

The underlying API responds with a text/event-stream body (server-sent events). The payload of each data: line is a JSON chunk:

{ "delta": { "content": "token text" }, "done": false }

Final event: { "done": true, "logId": "...", "inputTokens": 100, "outputTokens": 42 }
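If you call the HTTP API directly instead of using client.ai.stream(), you have to split the event stream into chunks yourself. The following is a minimal sketch, assuming only the two event shapes shown above; parseSseChunks and collect are hypothetical helper names, not part of the SDK, and the sketch ignores multi-line data: fields and other SSE field types.

```typescript
// Shape of one SSE payload, inferred from the two example events above.
interface StreamChunk {
  delta?: { content?: string };
  done: boolean;
  logId?: string;
  inputTokens?: number;
  outputTokens?: number;
}

// Hypothetical helper: parse a raw text/event-stream body into JSON chunks.
function parseSseChunks(raw: string): StreamChunk[] {
  const chunks: StreamChunk[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    chunks.push(JSON.parse(payload) as StreamChunk);
  }
  return chunks;
}

// Concatenate the streamed text and keep the final usage event, if any.
function collect(raw: string): { text: string; final: StreamChunk | undefined } {
  let text = "";
  let final: StreamChunk | undefined;
  for (const chunk of parseSseChunks(raw)) {
    if (chunk.done) {
      final = chunk;
    } else {
      text += chunk.delta?.content ?? "";
    }
  }
  return { text, final };
}
```

The token counts on the final event are the natural place to read usage from once the stream has ended.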

Choosing a model

| Model ID | Provider | Best for |
| --- | --- | --- |
| auto | Task-classified | Neureus picks the best model based on your prompt |
| claude-opus-4-8 | Anthropic | Complex reasoning, long documents |
| claude-sonnet-4-6 | Anthropic | Balanced quality and speed |
| claude-haiku-4-5 | Anthropic | Fast, cheap, simple tasks |
| gpt-4o | OpenAI | Multimodal, function calling |
| gpt-4o-mini | OpenAI | Cost-efficient chat |
| meta/llama-3.3-70b | Workers AI | General chat, no key required |
| meta/llama-3.1-8b | Workers AI | Fast, free, edge-deployed |
| deepseek/r1-32b | Workers AI | Reasoning tasks, no key required |
| qwen/coder-32b | Workers AI | Code generation, no key required |
| moonshot/kimi-k2 | Workers AI | Long-context, no key required |

Workers AI models run on Cloudflare’s edge and require no provider API key.
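If you select models programmatically, it can help to encode the table above as a typed lookup so that a mistyped model ID fails at compile time. This is only a sketch: MODEL_PROVIDERS, ModelId, and runsOnWorkersAi are hypothetical names, not SDK exports, and the map simply mirrors the table.

```typescript
// Model IDs and providers copied from the table above.
const MODEL_PROVIDERS = {
  "auto": "Task-classified",
  "claude-opus-4-8": "Anthropic",
  "claude-sonnet-4-6": "Anthropic",
  "claude-haiku-4-5": "Anthropic",
  "gpt-4o": "OpenAI",
  "gpt-4o-mini": "OpenAI",
  "meta/llama-3.3-70b": "Workers AI",
  "meta/llama-3.1-8b": "Workers AI",
  "deepseek/r1-32b": "Workers AI",
  "qwen/coder-32b": "Workers AI",
  "moonshot/kimi-k2": "Workers AI",
} as const;

// Union of every model ID in the table.
type ModelId = keyof typeof MODEL_PROVIDERS;

// True for models that run on Workers AI and therefore need no provider key.
function runsOnWorkersAi(model: ModelId): boolean {
  return MODEL_PROVIDERS[model] === "Workers AI";
}
```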

BYOK (Bring Your Own Key)

Register your OpenAI or Anthropic key once. From then on it takes precedence over the global key for your tenant:

curl -X PUT https://app.neureus.ai/ai/providers/openai/key \
  -H "Authorization: Bearer nru_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'

Neureus encrypts the key with AES-GCM and associates it with your tenant. All subsequent requests to OpenAI models use your key automatically.

Via SDK:

await client.ai.setProviderKey('openai', 'sk-your-openai-key');
// later, to rotate:
await client.ai.rotateProviderKey('openai', 'sk-your-new-key');
// to list configured providers:
const { providers } = await client.ai.listProviders();

Prompt pre-processing

Every chat request passes through a 4-pass pipeline automatically:

  1. Normalize — trim whitespace, collapse blank lines
  2. Structure — move system messages to index 0
  3. Trim — when >6K tokens: keep system + last 4 turns
  4. Compress (opt-in) — LLM compression via Llama 3.1 8B
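To make the first three passes concrete, here is a rough sketch of what they could look like. It is an illustration under stated assumptions, not the gateway's actual implementation: the token estimate (about four characters per token), the treatment of one non-system message as one turn, and all function names are assumptions. Pass 4 is omitted because it requires a model call.

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string }

// Pass 1: trim surrounding whitespace and collapse runs of blank lines into one.
function normalize(messages: Message[]): Message[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.trim().replace(/\n(\s*\n)+/g, "\n\n"),
  }));
}

// Pass 2: move system messages to the front, keeping relative order otherwise.
function structure(messages: Message[]): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest];
}

// Assumption: roughly 4 characters per token; the real tokenizer is not documented here.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
}

// Pass 3: above the token budget, keep system messages plus the last few turns.
// Assumption: one non-system message counts as one turn.
function trim(messages: Message[], maxTokens = 6000, keepTurns = 4): Message[] {
  if (estimateTokens(messages) <= maxTokens) return messages;
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keepTurns)];
}

// Run passes 1 to 3 in the documented order.
function preprocess(messages: Message[]): Message[] {
  return trim(structure(normalize(messages)));
}
```

The order matters in this sketch: normalizing first means the token estimate in the trim pass is not inflated by whitespace that would be dropped anyway.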

To enable compression: add header x-neureus-options: compress=true.
To see preprocessing stats: add header x-neureus-debug: true.
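When calling the HTTP API directly, both headers can be assembled by a small helper. This is a sketch only: preprocessingHeaders and its options object are hypothetical, and whether the SDK exposes a way to pass custom headers is not covered here.

```typescript
interface PreprocessingOptions {
  compress?: boolean; // opt in to pass 4 (LLM compression)
  debug?: boolean;    // ask for preprocessing stats
}

// Build the two documented headers; omitted options produce no header.
function preprocessingHeaders(opts: PreprocessingOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.compress) headers["x-neureus-options"] = "compress=true";
  if (opts.debug) headers["x-neureus-debug"] = "true";
  return headers;
}
```

The returned object can be spread into the headers of a fetch call alongside the Authorization header.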