Get your first AI response

The AI Gateway routes requests to Claude, GPT-4o, Llama, Mistral, DeepSeek, and other models through a single endpoint, with per-tenant isolation, SSE streaming, and bring-your-own-key (BYOK) support.

Basic request

import { Neureus } from '@neureus/sdk';

const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });

const response = await client.ai.chat({
  messages: [{ role: 'user', content: 'Summarize the GDPR in one paragraph.' }],
  model: 'claude-sonnet-4-6',
});

console.log(response.content);
// response also contains: .toolCalls, .reasoning, .inputTokens, .outputTokens, .costUsd

Streaming

Use client.ai.stream() to get tokens as they arrive:

import { Neureus } from '@neureus/sdk';

const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });

const stream = await client.ai.stream({
  messages: [{ role: 'user', content: 'Write a haiku about edge computing.' }],
  model: 'gpt-4o-mini',
});

for await (const token of stream) {
  process.stdout.write(token);
}

The underlying API responds with a text/event-stream (SSE) body. Each data: line carries one JSON chunk:

{ "delta": { "content": "token text" }, "done": false }

Final event: { "done": true, "logId": "...", "inputTokens": 100, "outputTokens": 42 }
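
If you read the HTTP stream directly instead of going through the SDK, each data: line can be decoded on its own. The following is a minimal sketch that assumes the chunk shapes shown above; the type names and the parseSseLine and collectText helpers are hypothetical and are not part of the Neureus SDK.

```typescript
// Shapes taken from the examples above; the type names are illustrative.
type DeltaChunk = { delta: { content: string }; done: false };
type FinalChunk = { done: true; logId: string; inputTokens: number; outputTokens: number };
type Chunk = DeltaChunk | FinalChunk;

// Parse one line of an SSE body. Returns null for anything that is not a
// non-empty `data:` line (blank separators, comments, other fields).
function parseSseLine(line: string): Chunk | null {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice('data:'.length).trim();
  if (payload === '') return null;
  return JSON.parse(payload) as Chunk;
}

// Concatenate the text of every delta chunk until the final event is seen.
function collectText(body: string): { text: string; finalEvent: FinalChunk | null } {
  let text = '';
  let finalEvent: FinalChunk | null = null;
  for (const line of body.split(/\r?\n/)) {
    const chunk = parseSseLine(line);
    if (chunk === null) continue;
    if (chunk.done) {
      finalEvent = chunk;
      break;
    }
    text += chunk.delta.content;
  }
  return { text, finalEvent };
}
```

In a real client the body arrives in pieces, so lines would need to be buffered until a newline is received before being passed to parseSseLine.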

Choosing a model

Model ID	Provider	Best for
`auto`	Task-classified	Neureus picks the best model based on your prompt
`claude-opus-4-8`	Anthropic	Complex reasoning, long documents
`claude-sonnet-4-6`	Anthropic	Balanced quality and speed
`claude-haiku-4-5`	Anthropic	Fast, cheap, simple tasks
`gpt-4o`	OpenAI	Multimodal, function calling
`gpt-4o-mini`	OpenAI	Cost-efficient chat
`meta/llama-3.3-70b`	Workers AI	General chat, no key required
`meta/llama-3.1-8b`	Workers AI	Fast, free, edge-deployed
`deepseek/r1-32b`	Workers AI	Reasoning tasks, no key required
`qwen/coder-32b`	Workers AI	Code generation, no key required
`moonshot/kimi-k2`	Workers AI	Long-context, no key required

Workers AI models run on Cloudflare’s edge and require no provider API key.
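
One way to use the table in application code is to check whether a model runs on Workers AI before deciding if a provider key has to be configured. This sketch is built only from the rows above; the MODEL_PROVIDERS map and the runsWithoutProviderKey helper are hypothetical, and auto is left out because the table does not say which provider it resolves to.

```typescript
type Provider = 'Anthropic' | 'OpenAI' | 'Workers AI';

// Provider for each model ID, copied from the table above.
const MODEL_PROVIDERS = new Map<string, Provider>([
  ['claude-opus-4-8', 'Anthropic'],
  ['claude-sonnet-4-6', 'Anthropic'],
  ['claude-haiku-4-5', 'Anthropic'],
  ['gpt-4o', 'OpenAI'],
  ['gpt-4o-mini', 'OpenAI'],
  ['meta/llama-3.3-70b', 'Workers AI'],
  ['meta/llama-3.1-8b', 'Workers AI'],
  ['deepseek/r1-32b', 'Workers AI'],
  ['qwen/coder-32b', 'Workers AI'],
  ['moonshot/kimi-k2', 'Workers AI'],
]);

// True when the model runs on Workers AI, which needs no provider API key.
// Returns undefined for IDs that are not in the table (including `auto`).
function runsWithoutProviderKey(modelId: string): boolean | undefined {
  const provider = MODEL_PROVIDERS.get(modelId);
  if (provider === undefined) return undefined;
  return provider === 'Workers AI';
}
```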

BYOK (Bring Your Own Key)

To use your own provider account, store its API key for your tenant with a PUT request:

curl -X PUT https://app.neureus.ai/ai/providers/openai/key \
  -H "Authorization: Bearer nru_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'

Neureus encrypts the key with AES-GCM and associates it with your tenant. All subsequent requests to OpenAI models use your key automatically.

Via SDK:

await client.ai.setProviderKey('openai', 'sk-your-openai-key');
// later, to rotate:
await client.ai.rotateProviderKey('openai', 'sk-your-new-key');
// to list configured providers:
const { providers } = await client.ai.listProviders();
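
Where neither curl nor the SDK is available, the same PUT request can be made with the standard fetch API. This sketch mirrors the curl call shown earlier in this section; buildSetKeyRequest and putProviderKey are hypothetical helpers, and because the response format is not documented here, the code only checks the HTTP status.

```typescript
type SetKeyRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Build the URL and fetch options for storing a provider key.
// Mirrors the curl example: PUT /ai/providers/{provider}/key with a JSON body.
function buildSetKeyRequest(
  neureusApiKey: string,
  provider: string,
  providerKey: string,
): SetKeyRequest {
  return {
    url: `https://app.neureus.ai/ai/providers/${encodeURIComponent(provider)}/key`,
    init: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${neureusApiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ key: providerKey }),
    },
  };
}

// Send the request; throws if the gateway responds with a non-2xx status.
async function putProviderKey(
  neureusApiKey: string,
  provider: string,
  providerKey: string,
): Promise<void> {
  const { url, init } = buildSetKeyRequest(neureusApiKey, provider, providerKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Storing the ${provider} key failed: HTTP ${res.status}`);
  }
}
```

Keeping request construction separate from sending makes the headers and body easy to inspect in a unit test without touching the network.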

Prompt pre-processing

Every chat request passes through a 4-pass pipeline automatically:

Normalize: trim whitespace and collapse blank lines
Structure: move system messages to index 0
Trim: when the conversation exceeds 6K tokens, keep the system messages and the last 4 turns
Compress (opt-in): LLM-based compression using Llama 3.1 8B

To enable compression, send the header x-neureus-options: compress=true.
To see preprocessing stats, send the header x-neureus-debug: true.
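
The first three passes of the pipeline can be pictured with the sketch below. It illustrates the described behavior only and is not the gateway's implementation: the Message type, the function names, the four-characters-per-token estimate, and the reading of a "turn" as one non-system message are all assumptions.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Pass 1: trim surrounding whitespace and collapse runs of blank lines into one.
function normalize(messages: Message[]): Message[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.trim().replace(/\n{3,}/g, '\n\n'),
  }));
}

// Pass 2: move system messages to the front, keeping the order of the rest.
function structure(messages: Message[]): Message[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest];
}

// Rough token estimate (assumption: about 4 characters per token).
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Pass 3: above the limit, keep system messages plus the last 4 turns.
function trimHistory(messages: Message[], limit = 6000, keepTurns = 4): Message[] {
  if (estimateTokens(messages) <= limit) return messages;
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keepTurns)];
}

// Run the three passes in the documented order.
function preprocess(messages: Message[]): Message[] {
  return trimHistory(structure(normalize(messages)));
}
```

The opt-in fourth pass is omitted because it depends on a model call rather than on local string handling.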