AI Gateway

Base URL: https://app.neureus.ai
Auth: Authorization: Bearer <api_key>

Endpoints

| Method | Path                           | Description                             |
|--------|--------------------------------|-----------------------------------------|
| POST   | /ai/chat                       | Chat completion (sync or streaming SSE) |
| GET    | /ai/models                     | List available models with pricing      |
| POST   | /ai/embeddings                 | Generate embeddings                     |
| GET    | /ai/providers                  | List configured BYOK providers          |
| PUT    | /ai/providers/:provider/key    | Set BYOK key for a provider             |
| POST   | /ai/providers/:provider/rotate | Rotate BYOK key                         |
| DELETE | /ai/providers/:provider/key    | Remove BYOK key                         |
| POST   | /ai/widget/key                 | Create a publishable widget key         |
| DELETE | /ai/widget/key                 | Revoke a widget key                     |

POST /ai/chat

Chat with any supported model. Returns a JSON body by default, or an SSE stream when stream is true.

Request body:

{
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  model?: string;         // default: "meta-llm" (Llama 3.1 8B); use "auto" for task routing
  stream?: boolean;       // default: false
  temperature?: number;   // 0–2
  maxTokens?: number;
  max_tokens?: number;    // OpenAI-compatible alias
  systemPrompt?: string;  // prepended as system message
  prefer_async?: boolean; // route to batch API if eligible (50% provider discount)
}

Response (non-streaming):

{
  text: string;
  choices: Array<{ message: { role: string; content: string }; finish_reason: string }>; // OpenAI-compat
  toolCalls?: any[];
  reasoning?: string;
  logId: string;
  inputTokens: number;
  outputTokens: number;
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number }; // OpenAI-compat
  cached: boolean;
  costUsd: number;
}

Response (streaming, stream: true):

The response has Content-Type: text/event-stream. Each data: event carries a JSON payload of the form:

{ "delta": { "content": "token text" }, "done": false }

Final event: { "done": true, "logId": "...", "inputTokens": 100, "outputTokens": 42 }
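
A streamed response can be consumed by reading each data: line, appending delta.content, and stopping at the event with done: true. The sketch below assumes one JSON payload per data: line, with events separated by blank lines. parseSseEvents and collectStream are hypothetical helper names, not part of the SDK.

```typescript
// Event shapes copied from the streaming examples above.
interface DeltaEvent {
  delta: { content: string };
  done: false;
}
interface FinalEvent {
  done: true;
  logId: string;
  inputTokens: number;
  outputTokens: number;
}
type ChatStreamEvent = DeltaEvent | FinalEvent;

// Hypothetical helper: turn raw SSE text into parsed events.
// Assumption: each event is a single "data: <json>" line.
function parseSseEvents(raw: string): ChatStreamEvent[] {
  const events: ChatStreamEvent[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank lines and other SSE fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '') continue;
    events.push(JSON.parse(payload) as ChatStreamEvent);
  }
  return events;
}

// Hypothetical helper: join the delta tokens and pick out the final event.
function collectStream(events: ChatStreamEvent[]): { text: string; final: FinalEvent | undefined } {
  let text = '';
  let final: FinalEvent | undefined;
  for (const ev of events) {
    if (ev.done === true) {
      final = ev; // carries logId and token counts
    } else {
      text += ev.delta.content;
    }
  }
  return { text, final };
}
```

In a real client the raw text would arrive in chunks from the response body, so lines may need to be buffered until a full event has been received.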

Headers:

  • x-neureus-options: compress=true — enable prompt compression
  • x-neureus-debug: true — include _preprocessing stats in response
  • x-neureus-surface: <surface> — tag request source for analytics
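
As an illustration, the optional headers above could be assembled from a small options object and merged into the request headers. GatewayOptions and buildGatewayHeaders are hypothetical names introduced for this sketch.

```typescript
// Hypothetical options object mirroring the three headers listed above.
interface GatewayOptions {
  compress?: boolean; // x-neureus-options: compress=true
  debug?: boolean;    // x-neureus-debug: true
  surface?: string;   // x-neureus-surface: <surface>
}

// Returns only the x-neureus-* headers; merge the result with the
// Authorization header before sending the request.
function buildGatewayHeaders(opts: GatewayOptions = {}): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.compress) headers['x-neureus-options'] = 'compress=true';
  if (opts.debug) headers['x-neureus-debug'] = 'true';
  if (opts.surface) headers['x-neureus-surface'] = opts.surface;
  return headers;
}
```

Headers that are not requested are omitted entirely rather than sent with a false value, since only the enabled forms are documented above.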

Example:

import { Neureus } from '@neureus/sdk';

const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });

const res = await client.ai.chat({
  messages: [{ role: 'user', content: 'Explain edge computing.' }],
  model: 'claude-sonnet-4-6',
});

// The completion is in the text field of the response documented above.
console.log(res.text);
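
Without the SDK, the same call can be made over plain HTTP. The sketch below only builds the request and leaves sending it to fetch. buildChatRequest is a hypothetical helper, and the Content-Type header is an assumption based on the JSON request body.

```typescript
const BASE_URL = 'https://app.neureus.ai';

type Role = 'system' | 'user' | 'assistant';

// Subset of the documented request body.
interface ChatRequest {
  messages: Array<{ role: Role; content: string }>;
  model?: string;
  stream?: boolean;
  temperature?: number;
  maxTokens?: number;
}

interface HttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build a POST /ai/chat request.
function buildChatRequest(apiKey: string, req: ChatRequest): HttpRequest {
  // Client-side check of the documented 0–2 temperature range.
  if (req.temperature !== undefined && (req.temperature < 0 || req.temperature > 2)) {
    throw new RangeError('temperature must be between 0 and 2');
  }
  return {
    url: `${BASE_URL}/ai/chat`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json', // assumption: JSON request body
    },
    body: JSON.stringify(req),
  };
}

// Usage (not executed here):
//   const { url, ...init } = buildChatRequest(apiKey, {
//     messages: [{ role: 'user', content: 'Explain edge computing.' }],
//   });
//   const data = await (await fetch(url, init)).json();
```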

GET /ai/models

List all available models with pricing, task type, and context window.

Query params (both optional, can be combined): provider=openai|anthropic|workers-ai, type=coding|reasoning|chat|...

Response:

{
  "models": [
    {
      "id": "claude-sonnet-4-6",
      "name": "Claude Sonnet 4.6",
      "provider": "anthropic",
      "type": "reasoning",
      "contextWindow": 200000,
      "inputCostPer1k": 0.003,
      "outputCostPer1k": 0.015,
      "supportedFeatures": ["streaming", "tool_use"]
    }
  ]
}

Responses are cached in KV for 5 minutes.
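
The pricing fields can be used for a rough client-side cost estimate. The sketch below assumes inputCostPer1k and outputCostPer1k are USD prices per 1,000 tokens, as the field names suggest. Whether the gateway computes costUsd the same way is not stated here. modelsUrl and estimateCostUsd are hypothetical helpers.

```typescript
const BASE_URL = 'https://app.neureus.ai';

type Provider = 'openai' | 'anthropic' | 'workers-ai';

// Only the fields needed for the estimate.
interface ModelPricing {
  id: string;
  inputCostPer1k: number;
  outputCostPer1k: number;
}

// Hypothetical helper: build the GET /ai/models URL with optional filters.
function modelsUrl(filter: { provider?: Provider; type?: string } = {}): string {
  const params: string[] = [];
  if (filter.provider) params.push(`provider=${encodeURIComponent(filter.provider)}`);
  if (filter.type) params.push(`type=${encodeURIComponent(filter.type)}`);
  const query = params.length > 0 ? `?${params.join('&')}` : '';
  return `${BASE_URL}/ai/models${query}`;
}

// Hypothetical helper: estimate = tokens / 1000 * price, summed over input and output.
// Assumption: prices are USD per 1,000 tokens.
function estimateCostUsd(model: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * model.inputCostPer1k
       + (outputTokens / 1000) * model.outputCostPer1k;
}
```

With the sample model above, 100 input tokens and 42 output tokens come to roughly 0.0003 + 0.00063 = 0.00093 USD under that assumption.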


PUT /ai/providers/:provider/key

Register a BYOK (bring your own key) API key for a provider. The key is stored encrypted with AES-GCM, per tenant.

Providers: openai, anthropic

Request body: { "key": "sk-your-provider-key" }

Response: { "provider": "openai", "configured": true }
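
A minimal sketch of building this request, rejecting providers outside the list above before anything is sent. buildSetKeyRequest and isByokProvider are hypothetical helpers, and the Content-Type header is an assumption.

```typescript
const BASE_URL = 'https://app.neureus.ai';

// Providers listed above as supporting BYOK.
const BYOK_PROVIDERS = ['openai', 'anthropic'] as const;
type ByokProvider = (typeof BYOK_PROVIDERS)[number];

function isByokProvider(value: string): value is ByokProvider {
  return (BYOK_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

interface SetKeyRequest {
  url: string;
  method: 'PUT';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build PUT /ai/providers/:provider/key.
// apiKey is the Neureus API key; providerKey is the provider's own secret.
function buildSetKeyRequest(apiKey: string, provider: string, providerKey: string): SetKeyRequest {
  if (!isByokProvider(provider)) {
    throw new Error(`unsupported BYOK provider: ${provider}`);
  }
  return {
    url: `${BASE_URL}/ai/providers/${provider}/key`,
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json', // assumption: JSON request body
    },
    body: JSON.stringify({ key: providerKey }),
  };
}
```

The provider key should only be sent from server-side code, since it is a secret credential.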


POST /ai/providers/:provider/rotate

Replace the stored BYOK key with a new one, encrypted under the existing DEK (data encryption key). Use this to rotate a compromised or expired provider key without changing the DEK.

Request body: { "key": "sk-your-new-key" }


POST /ai/widget/key

Create a publishable widget key (pk_) scoped to a tenant and optionally to a specific agent.

Request body: { "agent_id": "agent_abc123" } (optional)

Response: { "key": "pk_...", "keyPreview": "pk_xxxx" }

Widget keys are chat-only and safe to embed in browser/mobile code.
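
A minimal sketch of requesting a widget key from server-side code. buildWidgetKeyRequest and isPublishableKey are hypothetical helpers. Sending an empty JSON object when no agent_id is given is an assumption, as is the Content-Type header.

```typescript
const BASE_URL = 'https://app.neureus.ai';

interface WidgetKeyRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build POST /ai/widget/key, optionally scoped to one agent.
function buildWidgetKeyRequest(apiKey: string, agentId?: string): WidgetKeyRequest {
  // Assumption: an empty object is acceptable when no agent scope is wanted.
  const body = agentId ? { agent_id: agentId } : {};
  return {
    url: `${BASE_URL}/ai/widget/key`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json', // assumption: JSON request body
    },
    body: JSON.stringify(body),
  };
}

// Guard for client bundles: only publishable keys carry the pk_ prefix,
// so anything else should not be embedded in browser or mobile code.
function isPublishableKey(key: string): boolean {
  return key.startsWith('pk_');
}
```

The request itself still uses the secret API key, so it belongs on the server. Only the returned pk_ key is passed to the client.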