AI Gateway

Base URL: https://app.neureus.ai
Auth: Authorization: Bearer <api_key>

Endpoints

Method	Path	Description
`POST`	`/ai/chat`	Chat completion (sync or streaming SSE)
`GET`	`/ai/models`	List available models with pricing
`POST`	`/ai/embeddings`	Generate embeddings
`GET`	`/ai/providers`	List configured BYOK providers
`PUT`	`/ai/providers/:provider/key`	Set BYOK key for a provider
`POST`	`/ai/providers/:provider/rotate`	Rotate BYOK key
`DELETE`	`/ai/providers/:provider/key`	Remove BYOK key
`POST`	`/ai/widget/key`	Create a publishable widget key
`DELETE`	`/ai/widget/key`	Revoke a widget key

POST /ai/chat

Chat with any supported model. Returns JSON or an SSE stream, depending on the stream flag in the request body.

Request body:

{
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  model?: string;          // default: "meta-llm" (Llama 3.1 8B); use "auto" for task routing
  stream?: boolean;        // default: false
  temperature?: number;    // 0–2
  maxTokens?: number;
  max_tokens?: number;     // OpenAI-compatible alias
  systemPrompt?: string;   // prepended as system message
  prefer_async?: boolean;  // route to batch API if eligible (50% provider discount)
}

Response (non-streaming):

{
  text: string;
  choices: Array<{ message: { role: string; content: string }; finish_reason: string }>;  // OpenAI-compat
  toolCalls?: any[];
  reasoning?: string;
  logId: string;
  inputTokens: number;
  outputTokens: number;
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };  // OpenAI-compat
  cached: boolean;
  costUsd: number;
}

Response (streaming, stream: true):

Returns Content-Type: text/event-stream. Each data: event carries a JSON object of this form:

{ "delta": { "content": "token text" }, "done": false }

Final event: { "done": true, "logId": "...", "inputTokens": 100, "outputTokens": 42 }
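
The event shapes above can be folded into a single result on the client. The following is a minimal sketch, assuming each event arrives as one `data:` line containing JSON (standard SSE framing); the names `ChatStreamEvent`, `StreamSummary`, and `collectChatStream` are hypothetical and not part of the SDK. In a real client the raw text would come from buffering the response body chunk by chunk.

```typescript
// Event shapes as documented above (type names are illustrative).
type ChatStreamEvent =
  | { delta: { content: string }; done: false }
  | { done: true; logId: string; inputTokens: number; outputTokens: number };

interface StreamSummary {
  text: string;
  logId?: string;
  inputTokens?: number;
  outputTokens?: number;
}

// Folds the raw text of an SSE stream into the concatenated reply plus the
// stats carried by the final event.
// Assumption: every event is a single "data:" line holding one JSON object.
function collectChatStream(raw: string): StreamSummary {
  const summary: StreamSummary = { text: '' };
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank lines and other SSE fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '') continue;
    const event = JSON.parse(payload) as ChatStreamEvent;
    if (event.done) {
      summary.logId = event.logId;
      summary.inputTokens = event.inputTokens;
      summary.outputTokens = event.outputTokens;
    } else {
      summary.text += event.delta.content;
    }
  }
  return summary;
}
```

Keeping the parser a pure function of the received text makes it easy to unit-test separately from the network code.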

Headers:

x-neureus-options: compress=true — enable prompt compression
x-neureus-debug: true — include _preprocessing stats in response
x-neureus-surface: <surface> — tag request source for analytics
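
For callers not using the SDK, the headers above can be attached to a plain HTTP request. This is a sketch under stated assumptions: the endpoint URL is the base URL joined with /ai/chat, the body is sent as application/json, and `GatewayOptions`, `buildChatHeaders`, and `chatOnce` are hypothetical helper names.

```typescript
// Optional gateway features, mapped onto the headers listed above.
interface GatewayOptions {
  compress?: boolean; // x-neureus-options: compress=true
  debug?: boolean;    // x-neureus-debug: true
  surface?: string;   // x-neureus-surface: <surface>
}

function buildChatHeaders(apiKey: string, opts: GatewayOptions = {}): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json', // assumption: JSON request body
  };
  if (opts.compress) headers['x-neureus-options'] = 'compress=true';
  if (opts.debug) headers['x-neureus-debug'] = 'true';
  if (opts.surface) headers['x-neureus-surface'] = opts.surface;
  return headers;
}

// Sends a single non-streaming chat request and returns the parsed JSON response.
async function chatOnce(apiKey: string, prompt: string, opts?: GatewayOptions): Promise<unknown> {
  const res = await fetch('https://app.neureus.ai/ai/chat', {
    method: 'POST',
    headers: buildChatHeaders(apiKey, opts),
    body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
  });
  if (!res.ok) throw new Error(`chat request failed with status ${res.status}`);
  return res.json();
}
```

Headers are only added when the corresponding option is set, so a default call sends nothing beyond authentication and content type.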

Example:

import { Neureus } from '@neureus/sdk';

const client = new Neureus({ apiKey: process.env.NEUREUS_API_KEY! });
const res = await client.ai.chat({
  messages: [{ role: 'user', content: 'Explain edge computing.' }],
  model: 'claude-sonnet-4-6',
});
console.log(res.text); // the documented response exposes the reply as `text`

GET /ai/models

List all available models with pricing, task type, and context window.

Response:

{
  "models": [
    {
      "id": "claude-sonnet-4-6",
      "name": "Claude Sonnet 4.6",
      "provider": "anthropic",
      "type": "reasoning",
      "contextWindow": 200000,
      "inputCostPer1k": 0.003,
      "outputCostPer1k": 0.015,
      "supportedFeatures": ["streaming", "tool_use"]
    }
  ]
}

Responses are cached in KV for 5 minutes.
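
The pricing fields can be used to estimate a request's cost before sending it. The sketch below assumes that inputCostPer1k and outputCostPer1k are USD prices per 1,000 tokens, as the field names suggest; `ModelPricing` and `estimateCostUsd` are hypothetical names. The costUsd value returned by /ai/chat remains the authoritative figure, since this estimate ignores caching and any batch discount.

```typescript
// Subset of a /ai/models entry needed for a cost estimate.
interface ModelPricing {
  id: string;
  inputCostPer1k: number;  // assumed: USD per 1,000 input tokens
  outputCostPer1k: number; // assumed: USD per 1,000 output tokens
}

// Linear estimate: tokens / 1000 * price per 1k, summed over input and output.
function estimateCostUsd(model: ModelPricing, inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1000) * model.inputCostPer1k;
  const outputCost = (outputTokens / 1000) * model.outputCostPer1k;
  return inputCost + outputCost;
}

const sonnet: ModelPricing = {
  id: 'claude-sonnet-4-6',
  inputCostPer1k: 0.003,
  outputCostPer1k: 0.015,
};

// 100 input tokens and 42 output tokens at the prices shown above.
const estimate = estimateCostUsd(sonnet, 100, 42);
console.log(estimate.toFixed(5));
```

With the example pricing, 100 input tokens cost 0.0003 USD and 42 output tokens cost 0.00063 USD, for roughly 0.00093 USD in total.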

PUT /ai/providers/:provider/key

Providers: openai, anthropic

Request body: { "key": "sk-your-provider-key" }

Response: { "provider": "openai", "configured": true }

POST /ai/providers/:provider/rotate

Replace a provider's BYOK key with a new one, encrypted under the existing data encryption key (DEK). Use this to rotate a compromised or expired key without changing the DEK.

Request body: { "key": "sk-your-new-key" }
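
The three BYOK key operations differ only in method, path suffix, and whether a body is sent. As an illustration, the sketch below builds a request description for each one from the endpoint table; `byokRequest`, `HttpRequest`, and the application/json content type are assumptions rather than part of the documented API.

```typescript
type Provider = 'openai' | 'anthropic';
type ByokAction = 'set' | 'rotate' | 'remove';

// Plain description of an HTTP request, ready to hand to any HTTP client.
interface HttpRequest {
  method: 'PUT' | 'POST' | 'DELETE';
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const BASE_URL = 'https://app.neureus.ai';

function byokRequest(
  action: ByokAction,
  provider: Provider,
  apiKey: string,
  providerKey?: string,
): HttpRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  if (action === 'remove') {
    // DELETE /ai/providers/:provider/key takes no body.
    return { method: 'DELETE', url: `${BASE_URL}/ai/providers/${provider}/key`, headers };
  }
  if (!providerKey) throw new Error(`${action} requires a provider key`);
  headers['Content-Type'] = 'application/json'; // assumption: JSON request body
  return {
    method: action === 'set' ? 'PUT' : 'POST',
    url: `${BASE_URL}/ai/providers/${provider}/${action === 'set' ? 'key' : 'rotate'}`,
    headers,
    body: JSON.stringify({ key: providerKey }),
  };
}
```

The returned object maps directly onto a `fetch(url, { method, headers, body })` call.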

POST /ai/widget/key

Create a publishable widget key (pk_) scoped to a tenant and optionally to a specific agent.

Request body: { "agent_id": "agent_abc123" } (agent_id is optional; omit it for a tenant-wide key)

Response: { "key": "pk_...", "keyPreview": "pk_xxxx" }

Widget keys are chat-only and safe to embed in browser/mobile code.
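
Because only pk_ keys are meant for client-side code, it can be worth checking the prefix before a key is written into a page or app bundle. The sketch below is one way to do that; `widgetKeyBody` and `assertPublishable` are hypothetical helpers, and sending an empty JSON object when no agent is given is an assumption.

```typescript
// Response shape documented above.
interface WidgetKeyResponse {
  key: string;
  keyPreview: string;
}

// Builds the JSON body for POST /ai/widget/key.
// Assumption: an empty object is acceptable when no agent scope is wanted.
function widgetKeyBody(agentId?: string): string {
  return JSON.stringify(agentId ? { agent_id: agentId } : {});
}

// Returns the key only if it carries the publishable prefix, so a secret API
// key is never embedded in browser or mobile code by mistake.
function assertPublishable(res: WidgetKeyResponse): string {
  if (!res.key.startsWith('pk_')) {
    throw new Error('refusing to embed a key without the pk_ prefix');
  }
  return res.key;
}
```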