NexGate
API reference

Chat Completions

Reference for the OpenAI-compatible chat completions endpoint, covering request parameters, limits, streaming, and error states.

Endpoint

POST https://api.nexgate.app/v1/chat/completions

Use this endpoint with any OpenAI-compatible SDK. NexGate validates the request, reserves credits, forwards the request to the configured provider, and finalizes the actual cost after the response completes.

Tip

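Point the SDK base URL at the /v1 path: https://api.nexgate.app/v1 for the hosted API, or http://localhost:3000/api/v1 for local development. Dashboard, checkout, webhook, and utility endpoints live under /api/* without the /v1 prefix.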

Authentication

Pass a NexGate API key in the Authorization header.

Authorization: Bearer ng-your-key-here

API keys start with ng- and are bcrypt-hashed at rest. The plaintext key is shown only once, when it is generated.

Warning

Do not expose a NexGate API key in browser code, mobile apps, public repositories, or client-side environment variables.

Base URLs

https://api.nexgate.app/v1
http://localhost:3000/api/v1
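
One way to keep the key out of source code and switch between the two base URLs is to read both from server-side environment variables. The sketch below is only an illustration: the variable names NEXGATE_API_KEY and NEXGATE_BASE_URL and the helper load_nexgate_config are assumptions, not names defined by NexGate.

```python
import os

# Hosted base URL from this page; used when no override is set.
DEFAULT_BASE_URL = "https://api.nexgate.app/v1"


def load_nexgate_config(env=None):
    """Return (api_key, base_url) read from an environment-style mapping."""
    env = os.environ if env is None else env
    # NEXGATE_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = env.get("NEXGATE_API_KEY", "")
    if not api_key.startswith("ng-"):
        raise ValueError("NEXGATE_API_KEY is missing or does not start with 'ng-'")
    # NEXGATE_BASE_URL is also hypothetical; set it to the localhost URL in development.
    base_url = env.get("NEXGATE_BASE_URL", DEFAULT_BASE_URL)
    return api_key, base_url.rstrip("/")
```

The returned pair can be passed as api_key and base_url when constructing an OpenAI-compatible client on the server.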

Request body

model (string, required)

Direct model ID. Must be a non-empty string up to 128 characters. See the full model catalog.

messages (array, required)

Chat messages. The array must contain 1 to 100 message objects.

Each message's role must be one of developer, system, user, assistant, or tool. content may be a string, an array of content-part objects, null, or omitted.

[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Hello!" }
]

stream (boolean, default: false)

Whether to stream the response as Server-Sent Events.

max_tokens (integer)

Maximum output tokens. Must be an integer from 1 to 200000. NexGate sends the provider the lower of your requested value and the model's own output limit.

max_completion_tokens (integer)

Alternative output ceiling for GPT-5 and o-series style model families. Must be an integer from 1 to 200000.

temperature (number, default: 1)

Sampling temperature from 0 to 2.

top_p (number, default: 1)

Nucleus sampling value from 0 to 1.

frequency_penalty (number, default: 0)

Frequency penalty from -2 to 2.

presence_penalty (number, default: 0)

Presence penalty from -2 to 2.

stop (string | string[])

A single stop string or an array of up to 4 stop strings. Each stop string can contain up to 500 characters.

tools (array)

Function-calling tool definitions. The array can contain up to 64 tools and the serialized tool payload can be up to 64KB.
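
The tool limits can be checked locally before sending a request. This is a rough sketch: it assumes that 64KB means 65,536 bytes and that the payload is measured as compact UTF-8 JSON, neither of which this page states, and check_tools is a hypothetical helper name.

```python
import json

MAX_TOOLS = 64
MAX_TOOLS_BYTES = 64 * 1024  # assumption: 64KB is 65,536 bytes


def check_tools(tools):
    """Return a list of problems with a tools array, empty if none were found."""
    problems = []
    if len(tools) > MAX_TOOLS:
        problems.append(f"tools has {len(tools)} items; the limit is {MAX_TOOLS}")
    # Assumption: size is measured on compact JSON encoded as UTF-8.
    size = len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))
    if size > MAX_TOOLS_BYTES:
        problems.append(f"tools payload is {size} bytes; the limit is {MAX_TOOLS_BYTES}")
    return problems
```

Because the server may serialize the payload differently, a local pass is a hint rather than a guarantee.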

tool_choice (string | object)

Tool selection behavior, such as auto, none, or a specific function tool.

response_format (object)

Response formatting control. response_format.type must be text, json_object, or json_schema. The serialized payload can be up to 32KB.

seed (integer)

Deterministic sampling seed when the selected model supports it. Must be a signed 32-bit integer.
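
The numeric ranges above can be mirrored in a small client-side check so that obvious mistakes are caught before a request is sent. The helper name check_sampling_params is hypothetical, and the server remains the authority on what is accepted.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Inclusive ranges copied from the parameter descriptions on this page.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def check_sampling_params(params):
    """Return a list of problems found in the optional request parameters."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")
    for name in ("max_tokens", "max_completion_tokens"):
        value = params.get(name)
        if value is not None and not (isinstance(value, int) and 1 <= value <= 200_000):
            problems.append(f"{name} must be an integer from 1 to 200000")
    stop = params.get("stop")
    if stop is not None:
        stops = [stop] if isinstance(stop, str) else list(stop)
        if len(stops) > 4:
            problems.append("stop accepts at most 4 strings")
        if any(len(s) > 500 for s in stops):
            problems.append("each stop string is limited to 500 characters")
    seed = params.get("seed")
    if seed is not None and not (isinstance(seed, int) and INT32_MIN <= seed <= INT32_MAX):
        problems.append("seed must be a signed 32-bit integer")
    return problems
```

An empty list means no documented range was violated; it does not mean the selected model supports every parameter.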

Message field limits

Field                    Limit
messages                 1 to 100 items
message.content string   Up to 200,000 characters
message.content array    Up to 50 object parts
message.name             String up to 64 characters
message.tool_call_id     String up to 256 characters
message.tool_calls       Must be an array when provided

Note

Invalid requests are rejected before model lookup, token estimation, credit reservation, or provider forwarding.
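
Since invalid requests are rejected up front, it can be convenient to run the same message checks locally. The sketch below covers only the limits listed in the table above; check_messages is a hypothetical helper, and NexGate's own validation may be stricter.

```python
ALLOWED_ROLES = {"developer", "system", "user", "assistant", "tool"}


def check_messages(messages):
    """Return a list of problems with a messages array, empty if none were found."""
    if not isinstance(messages, list) or not 1 <= len(messages) <= 100:
        return ["messages must be an array of 1 to 100 items"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role is not an allowed role")
        content = msg.get("content")  # string, list, None, or omitted are all allowed
        if isinstance(content, str) and len(content) > 200_000:
            problems.append(f"messages[{i}].content exceeds 200,000 characters")
        elif isinstance(content, list) and len(content) > 50:
            problems.append(f"messages[{i}].content exceeds 50 parts")
        name = msg.get("name")
        if name is not None and (not isinstance(name, str) or len(name) > 64):
            problems.append(f"messages[{i}].name must be a string up to 64 characters")
        call_id = msg.get("tool_call_id")
        if call_id is not None and (not isinstance(call_id, str) or len(call_id) > 256):
            problems.append(f"messages[{i}].tool_call_id must be a string up to 256 characters")
        tool_calls = msg.get("tool_calls")
        if tool_calls is not None and not isinstance(tool_calls, list):
            problems.append(f"messages[{i}].tool_calls must be an array")
    return problems
```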

Examples

from openai import OpenAI

client = OpenAI(
    api_key="ng-your-key",
    base_url="https://api.nexgate.app/v1",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    max_tokens=64,
    temperature=0.7,
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ng-your-key",
  baseURL: "https://api.nexgate.app/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "user", content: "Write a one-sentence product tagline." },
  ],
  max_tokens: 64,
  temperature: 0.7,
});

console.log(response.choices[0].message.content);

curl https://api.nexgate.app/v1/chat/completions \
  -H "Authorization: Bearer ng-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    "max_tokens": 64,
    "temperature": 0.7
  }'

Streaming

Set stream to true to receive text/event-stream chunks.

stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)

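for chunk in stream:
    # Some chunks (for example a usage-only chunk) may carry no choices.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")

const stream = await client.chat.completions.create({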
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Count to five." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"One"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}

data: [DONE]

Note

Streaming responses include usage in the final provider chunk when available. If the client disconnects or the stream fails before usage is available, NexGate releases the reserved credits and records the error.
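
For clients that read the event stream without an SDK, the raw lines can be folded into the full text plus the optional usage object. This is a minimal sketch that assumes one choice per chunk and the chunk shape shown in the sample above; collect_stream is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Fold raw SSE lines into (text, usage); usage is None if no chunk carried it."""
    text_parts = []
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

A usage value of None after the loop signals that the stream ended before usage was reported, so callers should not assume it is always present.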

Function calling

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "What's the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
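
When the model answers with a tool call, the client runs the tool and sends the result back in a message with role tool and the matching tool_call_id. The sketch below assumes the assistant message follows the usual OpenAI-compatible shape (tool_calls entries with id, function.name, and a JSON string in function.arguments); the get_weather implementation and the build_tool_messages helper are hypothetical.

```python
import json


def get_weather(location):
    """Hypothetical local implementation of the get_weather tool."""
    return {"location": location, "forecast": "sunny"}


# Maps tool names from the request to local Python callables.
TOOL_HANDLERS = {"get_weather": get_weather}


def build_tool_messages(assistant_message):
    """Run each requested tool and return the messages to append to the conversation."""
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        handler = TOOL_HANDLERS[call["function"]["name"]]
        # Assumption: arguments arrive as a JSON-encoded string.
        arguments = json.loads(call["function"]["arguments"])
        result = handler(**arguments)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up
```

Appending the returned messages to the original messages array and calling the endpoint again lets the model produce its final answer from the tool output.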

JSON mode and JSON schema

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "system", "content": "Return JSON only." },
    { "role": "user", "content": "List three colors." }
  ],
  "response_format": { "type": "json_object" }
}

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "Return a project summary." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "project_summary",
      "schema": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "risk": { "type": "string" }
        },
        "required": ["title", "risk"]
      }
    }
  }
}
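
With either response format, the message content still arrives as a string, so the client parses it before use. The following sketch parses the content and checks the two required fields from the project_summary schema above; parse_project_summary is a hypothetical helper, and the check is kept as a defensive measure rather than a statement about what the provider guarantees.

```python
import json


def parse_project_summary(content):
    """Parse message content as JSON and check the fields required by the schema."""
    data = json.loads(content)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("title", "risk") if not isinstance(data.get(key), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {', '.join(missing)}")
    return data
```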

Success response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1746789600,
  "model": "gpt-5.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Build faster with prepaid AI that keeps costs predictable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 11,
    "total_tokens": 29
  }
}

Error responses

{
  "error": {
    "message": "Insufficient credits. Available: $0.0020. Estimated cost: $0.0050. Top up at https://dashboard.nexgate.app/topup",
    "type": "insufficient_credits",
    "code": "402"
  }
}

Status  error.type             When
400     invalid_request_error  Bad JSON, non-object body, schema validation failure, unknown model, or disabled model
401     authentication_error   Missing bearer token, invalid API key, or account record not found
402     insufficient_credits   Available balance cannot cover the estimated request ceiling
403     permission_error       Account is banned or deleted
405     invalid_request_error  GET /api/v1/chat/completions; only POST is accepted
413     invalid_request_error  Request body exceeds 64KB
429     rate_limit_error       Hourly spend safety limit is reached
502     provider_error         Upstream provider request failed

Warning

Provider calls have a 120-second timeout. For streaming requests, timeout or provider errors are emitted as SSE error chunks when possible.
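
A client can read the error envelope and decide whether a retry is worthwhile. The sketch below treats only 502 as retryable; that policy is an assumption made for this example, not NexGate guidance. Because 429 is tied to an hourly spend limit, an immediate retry is unlikely to help. classify_error is a hypothetical helper name.

```python
# Assumption for this sketch: only upstream provider failures are worth retrying.
RETRYABLE_STATUSES = {502}


def classify_error(status, body):
    """Summarize an error response using the envelope shape shown above."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return {
        "status": status,
        "type": error.get("type", "unknown"),
        "message": error.get("message", ""),
        "retryable": status in RETRYABLE_STATUSES,
    }
```

Callers that retry on 502 would normally add a delay and a cap on attempts, since each attempt reserves credits again.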

Credit and usage behavior

NexGate reserves credits before calling the provider using the estimated input tokens and the maximum allowed output. After the response completes, NexGate charges the actual provider-reported usage and releases the unused reservation.
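
The reserve-then-settle flow can be illustrated with a small calculation. This page does not publish the pricing formula, so the linear per-token cost and the per-million-token prices used here are assumptions chosen purely for illustration, and reserve_and_settle is a hypothetical helper.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def reserve_and_settle(est_input_tokens, max_output_tokens,
                       prompt_tokens, completion_tokens,
                       input_price, output_price):
    """Return (reserved, actual, released) in dollars.

    Prices are dollars per one million tokens (hypothetical values).
    """
    per_input = input_price / TOKENS_PER_MILLION
    per_output = output_price / TOKENS_PER_MILLION
    # Reservation: estimated input plus the maximum allowed output.
    reserved = est_input_tokens * per_input + max_output_tokens * per_output
    # Settlement: provider-reported usage.
    actual = prompt_tokens * per_input + completion_tokens * per_output
    # Unused part of the reservation; floored at zero in this sketch.
    released = max(reserved - actual, Decimal(0))
    return reserved, actual, released
```

With hypothetical prices of $1.00 and $4.00 per million input and output tokens, an estimate of 20 input tokens and a 64-token output ceiling reserves $0.000276. If the provider reports 18 prompt and 11 completion tokens, the charge is $0.000062 and $0.000214 is released.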

Usage logs include:

  • Resolved model ID
  • Prompt and completion tokens
  • Actual cost
  • Latency
  • Success or error status

Limits

Limit                             Value
Request body                      64KB
Provider timeout                  120 seconds
Max messages                      100
Max tools                         64
Max tool payload                  64KB
Max response format payload       32KB
Max concurrent requests per user  3
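
To stay under the concurrent request limit, a client can gate its calls with a semaphore. The sketch below uses asyncio and a hypothetical run_limited helper; each job is any zero-argument coroutine function, for example one that wraps a chat completion call.

```python
import asyncio

MAX_CONCURRENT = 3  # per-user limit from the table above


async def run_limited(jobs, limit=MAX_CONCURRENT):
    """Run coroutine functions with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    # gather preserves the order of `jobs` in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

The limit applies per user, so a process-local semaphore only helps when all of a user's requests go through the same process.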
