How does the $5 free credit offer work?

New accounts automatically receive $5 in free credits at sign-up, with no credit card required. You can use these credits to test all frontier models immediately.

How do I integrate NexGate into my application?

NexGate is fully compatible with the OpenAI SDK. Simply update your baseURL to 'https://api.nexgate.app/v1' and replace the API key with your NexGate key. Your existing code will work without any other changes.

Which AI models can I route requests to?

NexGate provides access to leading frontier models through a single endpoint, including GPT-5.5, DeepSeek-V4 Pro, Kimi K2.6, Grok 4.3, and Llama 4. See the full catalog on the Models page.

What are granular spend controls?

You can set hourly, daily, or absolute spending limits on individual API keys, so that a request loop or a testing mistake cannot run up an unexpected bill.

Do my prepaid credits expire?

No. All credits purchased on NexGate roll over indefinitely and never expire. You pay only for what you consume, and there is no recurring monthly subscription.

How is token usage calculated?

NexGate calculates the cost of each request from the provider's exact input and output token prices, multiplied by a small, transparent margin. Per-request pricing appears in your usage log as soon as the request completes.
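As a rough illustration of the calculation described above, the sketch below multiplies the provider cost by a margin factor. The per-million-token prices and the 5% margin are made-up placeholder values, not NexGate's actual rates, and the function name is hypothetical; check the Models page and your usage log for real figures.

```python
from decimal import Decimal

def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_mtok, output_price_per_mtok, margin_factor):
    """Provider cost (input + output) multiplied by a margin factor."""
    million = Decimal(1_000_000)
    provider_cost = (
        Decimal(prompt_tokens) * input_price_per_mtok / million
        + Decimal(completion_tokens) * output_price_per_mtok / million
    )
    return provider_cost * margin_factor

# Hypothetical prices: $2.00 per 1M input tokens, $8.00 per 1M output tokens,
# and an assumed 5% margin (factor 1.05).
cost = request_cost(18, 11, Decimal("2.00"), Decimal("8.00"), Decimal("1.05"))
print(f"${cost:.6f}")
```

Decimal is used instead of float so that very small per-request amounts do not pick up binary rounding noise.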

Chat Completions | NexGate Docs

OpenAI-compatible chat completions endpoint, including request parameters, limits, streaming, and error states.

Endpoint

POST https://api.nexgate.app/v1/chat/completions

Use this endpoint with any OpenAI-compatible SDK. NexGate validates the request, reserves credits, forwards the request to the configured provider, and finalizes the actual cost after the response completes.

Tip

Use the /v1 path as the SDK base URL (/api/v1 on a local deployment). Dashboard, checkout, webhook, and utility endpoints live under /api/* without the /v1 prefix.

Authentication

Pass a NexGate API key in the Authorization header.

Authorization: Bearer ng-your-key-here

API keys start with ng- and are bcrypt-hashed at rest. The plaintext key is shown only once, when it is generated.

Warning

Do not expose a NexGate API key in browser code, mobile apps, public repositories, or client-side environment variables.
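One common way to follow the warning above is to keep the key in a server-side environment variable and build the Authorization header from it at request time. This is a minimal sketch; the variable name NEXGATE_API_KEY and the helper name auth_headers are assumptions for illustration, not names defined by NexGate.

```python
import os

def auth_headers(env_var="NEXGATE_API_KEY"):
    """Build request headers from a key held in a server-side environment variable."""
    key = os.environ.get(env_var, "")
    # NexGate keys start with "ng-"; fail early if the variable is unset or wrong.
    if not key.startswith("ng-"):
        raise RuntimeError(f"{env_var} is missing or does not look like a NexGate key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The same value can be passed as api_key when constructing an OpenAI-compatible SDK client, so the key never appears in source code.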

Base URLs

https://api.nexgate.app/v1

http://localhost:3000/api/v1

Request body

model (string, required)

Direct model ID. Must be a non-empty string up to 128 characters. See the full model catalog.

messages (array, required)

Chat messages. The array must contain 1 to 100 message objects.

Message role must be one of developer, system, user, assistant, or tool. content may be a string, an array of object parts, null, or omitted.

[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Hello!" }
]

stream (boolean, default: false)

Whether to stream the response as Server-Sent Events.

max_tokens (integer)

Maximum output tokens. Must be an integer from 1 to 200000. The ceiling NexGate sends to the provider is the lower of your requested value and the model's own limit.

max_completion_tokens (integer)

Alternative output ceiling for GPT-5 and o-series style model families. Must be an integer from 1 to 200000.

temperature (number, default: 1)

Sampling temperature from 0 to 2.

top_p (number, default: 1)

Nucleus sampling value from 0 to 1.

frequency_penalty (number, default: 0)

Frequency penalty from -2 to 2.

presence_penalty (number, default: 0)

Presence penalty from -2 to 2.

stop (string | string[])

A single stop string, or an array of up to 4 stop strings. Each stop string can contain up to 500 characters.

tools (array)

Function-calling tool definitions. The array can contain up to 64 tools and the serialized tool payload can be up to 64KB.

tool_choice (string | object)

Tool selection behavior, such as auto, none, or a specific function tool.

response_format (object)

Response formatting control. response_format.type must be text, json_object, or json_schema. The serialized payload can be up to 32KB.

seed (integer)

Deterministic sampling seed when the selected model supports it. Must be a signed 32-bit integer.
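The ranges listed for the optional parameters above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not NexGate's own validator; the function name validate_params and the wording of the messages are illustrative.

```python
def validate_params(params):
    """Return a list of problems found in the optional request parameters."""
    problems = []

    def check_range(name, low, high, integer=False):
        value = params.get(name)
        if value is None:
            return
        expected = int if integer else (int, float)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} must be {'an integer' if integer else 'a number'}")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    check_range("max_tokens", 1, 200_000, integer=True)
    check_range("max_completion_tokens", 1, 200_000, integer=True)
    check_range("temperature", 0, 2)
    check_range("top_p", 0, 1)
    check_range("frequency_penalty", -2, 2)
    check_range("presence_penalty", -2, 2)
    check_range("seed", -2**31, 2**31 - 1, integer=True)  # signed 32-bit

    stop = params.get("stop")
    if stop is not None:
        stops = [stop] if isinstance(stop, str) else stop
        if not isinstance(stops, list) or not all(isinstance(s, str) for s in stops):
            problems.append("stop must be a string or a list of strings")
        elif len(stops) > 4 or any(len(s) > 500 for s in stops):
            problems.append("stop allows up to 4 strings of up to 500 characters each")

    if len(params.get("tools") or []) > 64:
        problems.append("tools allows up to 64 entries")
    return problems
```

An empty list means the parameters fall inside the documented ranges; the server remains the authority on what is actually accepted.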

Message field limits

Field	Limit
`messages`	1 to 100 items
`message.content` string	Up to 200,000 characters
`message.content` array	Up to 50 object parts
`message.name`	String up to 64 characters
`message.tool_call_id`	String up to 256 characters
`message.tool_calls`	Must be an array when provided

Note

Invalid requests are rejected before model lookup, token estimation, credit reservation, or provider forwarding.
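Because invalid requests are rejected before any work is done, it can be convenient to mirror the message limits from the table above on the client. The following sketch assumes messages are plain dictionaries; validate_messages is a hypothetical helper and does not reproduce NexGate's exact error messages.

```python
ALLOWED_ROLES = {"developer", "system", "user", "assistant", "tool"}

def validate_messages(messages):
    """Return a list of problems based on the documented message field limits."""
    if not isinstance(messages, list) or not 1 <= len(messages) <= 100:
        return ["messages must be an array of 1 to 100 items"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role is not an allowed role")

        content = msg.get("content")  # may be a string, a list of parts, None, or absent
        if isinstance(content, str) and len(content) > 200_000:
            problems.append(f"messages[{i}].content exceeds 200,000 characters")
        elif isinstance(content, list) and len(content) > 50:
            problems.append(f"messages[{i}].content exceeds 50 parts")
        elif content is not None and not isinstance(content, (str, list)):
            problems.append(f"messages[{i}].content has an unsupported type")

        for field, limit in (("name", 64), ("tool_call_id", 256)):
            value = msg.get(field)
            if value is not None and (not isinstance(value, str) or len(value) > limit):
                problems.append(f"messages[{i}].{field} must be a string up to {limit} characters")

        tool_calls = msg.get("tool_calls")
        if tool_calls is not None and not isinstance(tool_calls, list):
            problems.append(f"messages[{i}].tool_calls must be an array")
    return problems
```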

Examples

from openai import OpenAI

client = OpenAI(
    api_key="ng-your-key",
    base_url="https://api.nexgate.app/v1",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    max_tokens=64,
    temperature=0.7,
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ng-your-key",
  baseURL: "https://api.nexgate.app/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "user", content: "Write a one-sentence product tagline." },
  ],
  max_tokens: 64,
  temperature: 0.7,
});

console.log(response.choices[0].message.content);

curl https://api.nexgate.app/v1/chat/completions \
  -H "Authorization: Bearer ng-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    "max_tokens": 64,
    "temperature": 0.7
  }'

Streaming

Set stream to true to receive text/event-stream chunks.

stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")

const stream = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Count to five." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"One"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}

data: [DONE]

Note

Streaming responses include usage in the final provider chunk when available. If the client disconnects or the stream fails before usage is available, NexGate releases the reserved credits and records the error.
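If you read the event stream without an SDK, the chunks shown above can be assembled by hand. The sketch below takes raw SSE lines, joins the delta.content pieces, and keeps the usage object if a chunk carries one. The helper name collect_stream is hypothetical, and the code assumes each event fits on a single data: line, as in the sample output.

```python
import json

def collect_stream(lines):
    """Accumulate text and usage from raw SSE lines of a streamed completion."""
    text_parts, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                text_parts.append(piece)
        # Usage is only present on some chunks; keep the last one seen.
        usage = chunk.get("usage") or usage
    return "".join(text_parts), usage
```

A usage value of None after the stream ends corresponds to the case described in the note, where usage was not available before the stream stopped.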

Function calling

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "What's the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
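The request above only declares the tool; your application still has to run it and send the result back. The sketch below assumes the assistant message follows the usual OpenAI-compatible shape, where each entry in tool_calls has an id and a function object with a name and JSON-encoded arguments. That response shape is not shown on this page, so treat it as an assumption. The get_weather body and the helper names are placeholders.

```python
import json

def get_weather(location):
    # Placeholder implementation; a real one would call a weather service.
    return {"location": location, "forecast": "unknown"}

# Maps tool names declared in the request to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message):
    """Turn an assistant message's tool_calls into tool-role reply messages."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies
```

Append the assistant message and the returned tool messages to messages, then send a follow-up request so the model can produce its final answer.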

JSON mode and JSON schema

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "system", "content": "Return JSON only." },
    { "role": "user", "content": "List three colors." }
  ],
  "response_format": { "type": "json_object" }
}

{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "Return a project summary." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "project_summary",
      "schema": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "risk": { "type": "string" }
        },
        "required": ["title", "risk"]
      }
    }
  }
}
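With either response format the model's answer still arrives as a string in message.content, so the client has to parse it. A minimal sketch for the project_summary schema above follows; the function name is hypothetical, and the explicit key check is a defensive assumption rather than something NexGate requires.

```python
import json

def parse_project_summary(content):
    """Parse message.content and confirm the keys required by the schema."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("title", "risk") if not isinstance(data.get(key), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {', '.join(missing)}")
    return data
```

With the SDK examples on this page, the input would be response.choices[0].message.content.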

Success response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1746789600,
  "model": "gpt-5.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Build faster with prepaid AI that keeps costs predictable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 11,
    "total_tokens": 29
  }
}

Error responses

{
  "error": {
    "message": "Insufficient credits. Available: $0.0020. Estimated cost: $0.0050. Top up at https://dashboard.nexgate.app/topup",
    "type": "insufficient_credits",
    "code": "402"
  }
}

Status	`error.type`	When
`400`	`invalid_request_error`	Bad JSON, non-object body, schema validation failure, unknown model, or disabled model
`401`	`authentication_error`	Missing bearer token, invalid API key, or account record not found
`402`	`insufficient_credits`	Available balance cannot cover the estimated request ceiling
`403`	`permission_error`	Account is banned or deleted
`405`	`invalid_request_error`	`GET /api/v1/chat/completions`; only `POST` is accepted
`413`	`invalid_request_error`	Request body exceeds 64KB
`429`	`rate_limit_error`	Hourly spend safety limit is reached
`502`	`provider_error`	Upstream provider request failed

Warning

Provider calls have a 120-second timeout. For streaming requests, timeout or provider errors are emitted as SSE error chunks when possible.
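The status table above suggests different client reactions for different errors. The mapping below is one possible reading of it, not an official retry policy; the action labels, function names, and backoff values are assumptions chosen for illustration.

```python
def next_action(status):
    """Suggest a client-side reaction for a NexGate error status."""
    if status in (400, 405, 413):
        return "fix_request"  # bad body, wrong method, or body over 64KB
    if status == 401:
        return "check_api_key"
    if status == 402:
        return "top_up"
    if status == 403:
        return "contact_support"
    if status == 429:
        # Hourly spend limit reached; retrying immediately is unlikely to help.
        return "wait_for_spend_window"
    if status == 502:
        return "retry_with_backoff"
    return "unknown"

def backoff_delays(attempts, base_seconds=1.0):
    """Exponential delays for retrying 502 responses: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** i for i in range(attempts)]
```

Only the 502 case is treated as retryable here, because the other statuses describe conditions that a repeated identical request would not change.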

Credit and usage behavior

NexGate reserves credits before calling the provider using the estimated input tokens and the maximum allowed output. After the response completes, NexGate charges the actual provider-reported usage and releases the unused reservation.
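The reserve-then-finalize flow described above can be pictured with a small in-memory ledger. This is a simplified model for illustration only; the class and method names are hypothetical and say nothing about how NexGate implements billing internally.

```python
from decimal import Decimal

class CreditLedger:
    """Toy model of reserving an estimated ceiling and charging actual cost."""

    def __init__(self, balance):
        self.available = Decimal(balance)
        self.reserved = Decimal(0)

    def reserve(self, estimated_ceiling):
        # Mirrors the 402 case: the balance must cover the estimated ceiling.
        if estimated_ceiling > self.available:
            raise ValueError("insufficient_credits")
        self.available -= estimated_ceiling
        self.reserved += estimated_ceiling

    def finalize(self, estimated_ceiling, actual_cost):
        # Charge the actual cost and return the unused part of the reservation.
        self.reserved -= estimated_ceiling
        self.available += estimated_ceiling - actual_cost

    def release(self, estimated_ceiling):
        # A failed request returns the whole reservation.
        self.finalize(estimated_ceiling, Decimal(0))
```

In this model a request that reserves $0.0050 and actually costs $0.0012 leaves the balance $0.0012 lower once finalized, and a released reservation leaves it unchanged.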

Usage logs include:

Resolved model ID
Prompt and completion tokens
Actual cost
Latency
Success or error status

Limits

Limit	Value
Request body	64KB
Provider timeout	120 seconds
Max messages	100
Max tools	64
Max tool payload	64KB
Max response format payload	32KB
Max concurrent requests per user	3
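To stay under the concurrency limit in the table above, a client can gate its own calls with a semaphore. The sketch below uses threads and a placeholder send_request callable that you supply; the helper names and the worker count of 8 are assumptions, and how NexGate responds to a fourth concurrent request is not stated on this page.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # "Max concurrent requests per user" from the limits table
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def limited(send_request, *args, **kwargs):
    """Run send_request while holding one of the three concurrency slots."""
    with _slots:
        return send_request(*args, **kwargs)

def send_all(send_request, payloads):
    """Send many payloads from a thread pool, never more than three at once."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda p: limited(send_request, p), payloads))
```

The pool may have more workers than slots; the extra workers simply wait on the semaphore until a slot frees up.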

Model catalog

Retrieve enabled model IDs, pricing, providers, and context windows.

Pricing explained

Learn how credits are reserved, finalized, and displayed.
