Chat Completions
OpenAI-compatible chat completions endpoint, including request parameters, limits, streaming, and error states.
Endpoint
```text
POST https://api.nexgate.app/v1/chat/completions
```

Use this endpoint with any OpenAI-compatible SDK. NexGate validates the request, reserves credits, forwards the request to the configured provider, and finalizes the actual cost after the response completes.
Tip
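When pointing an SDK at the application host directly (for example, `http://localhost:3000`), use `/api/v1` as the base path. The hosted API uses `https://api.nexgate.app/v1`. Dashboard, checkout, webhook, and utility endpoints live under `/api/*` without the `/v1` prefix.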
Authentication
Pass a NexGate API key in the Authorization header.
```text
Authorization: Bearer ng-your-key-here
```

API keys start with `ng-` and are bcrypt-hashed at rest. The plaintext key is shown only once, when it is generated.
Warning
Do not expose a NexGate API key in browser code, mobile apps, public repositories, or client-side environment variables.
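Following that warning, server-side code can load the key from the environment instead of embedding it in source. The Python sketch below is a minimal example. The `NEXGATE_API_KEY` variable name and the `nexgate_auth_headers` helper are hypothetical. The prefix check only mirrors the documented `ng-` format and does not verify that the key is valid.

```python
import os


def nexgate_auth_headers(env_var: str = "NEXGATE_API_KEY") -> dict:
    """Build request headers from a NexGate key stored in an environment variable.

    NEXGATE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var, "")
    # Documented format: NexGate API keys start with "ng-".
    if not api_key.startswith("ng-"):
        raise ValueError(f"{env_var} is unset or does not look like a NexGate key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

You can also pass the same value as `api_key` to the OpenAI SDK client shown in the examples.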
Base URLs
- Hosted: `https://api.nexgate.app/v1`
- Local development: `http://localhost:3000/api/v1`

Request body
`model` (string, required): Direct model ID. Must be a non-empty string of up to 128 characters. See the full model catalog.

`messages` (array, required): Chat messages. The array must contain 1 to 100 message objects.

Each message's `role` must be one of `developer`, `system`, `user`, `assistant`, or `tool`. Its `content` may be a string, an array of object parts, `null`, or omitted.
```json
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Hello!" }
]
```

`stream` (boolean, default: `false`): Whether to stream the response as Server-Sent Events.
`max_tokens` (integer): Maximum output tokens. Must be an integer from 1 to 200,000. NexGate caps the value sent to the provider at the lower of your request and the model's output limit.

`max_completion_tokens` (integer): Alternative output ceiling for GPT-5 and o-series-style model families. Must be an integer from 1 to 200,000.
`temperature` (number, default: `1`): Sampling temperature from 0 to 2.

`top_p` (number, default: `1`): Nucleus sampling value from 0 to 1.

`frequency_penalty` (number, default: `0`): Frequency penalty from -2 to 2.

`presence_penalty` (number, default: `0`): Presence penalty from -2 to 2.
`stop` (string | string[]): A single stop string or an array of up to 4 stop strings. Each stop string can contain up to 500 characters.

`tools` (array): Function-calling tool definitions. The array can contain up to 64 tools, and the serialized tool payload can be up to 64KB.

`tool_choice` (string | object): Tool selection behavior, such as `auto`, `none`, or a specific function tool.

`response_format` (object): Response formatting control. `response_format.type` must be `text`, `json_object`, or `json_schema`. The serialized payload can be up to 32KB.

`seed` (integer): Deterministic sampling seed, used when the selected model supports it. Must be a signed 32-bit integer (-2,147,483,648 to 2,147,483,647).
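A client can mirror the limits listed above to catch out-of-range values before it sends a request. This is a minimal Python sketch, not NexGate's validator. The `check_request_params` helper is hypothetical. It assumes "KB" means 1,024 bytes of compact UTF-8 JSON. It also cannot know each model's output limit, so the server's checks remain authoritative. The sketch additionally applies the 64KB request body limit from the Limits table.

```python
import json

KB = 1024  # assumption: "KB" means 1,024 bytes

# Ranges copied from the parameter reference.
NUMBER_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}
INTEGER_RANGES = {
    "max_tokens": (1, 200_000),
    "max_completion_tokens": (1, 200_000),
    "seed": (-(2**31), 2**31 - 1),  # signed 32-bit
}
RESPONSE_FORMAT_TYPES = ("text", "json_object", "json_schema")


def json_size(value) -> int:
    """Byte size of the compact UTF-8 JSON encoding (assumed size measure)."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def check_request_params(body: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not 1 <= len(model) <= 128:
        problems.append("model must be a non-empty string of at most 128 characters")
    for name, (low, high) in NUMBER_RANGES.items():
        value = body.get(name)
        if value is None:
            continue  # omitted (or null) parameters fall back to defaults
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name} must be a number from {low} to {high}")
    for name, (low, high) in INTEGER_RANGES.items():
        value = body.get(name)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{name} must be an integer from {low} to {high}")
    stop = body.get("stop")
    if stop is not None:
        stops = [stop] if isinstance(stop, str) else stop
        if (not isinstance(stops, list) or len(stops) > 4
                or not all(isinstance(s, str) and len(s) <= 500 for s in stops)):
            problems.append("stop must be a string or a list of up to 4 strings of at most 500 characters")
    tools = body.get("tools")
    if tools is not None and (not isinstance(tools, list) or len(tools) > 64 or json_size(tools) > 64 * KB):
        problems.append("tools must be a list of at most 64 tools and at most 64KB serialized")
    response_format = body.get("response_format")
    if response_format is not None:
        if not isinstance(response_format, dict) or response_format.get("type") not in RESPONSE_FORMAT_TYPES:
            problems.append("response_format.type must be text, json_object, or json_schema")
        elif json_size(response_format) > 32 * KB:
            problems.append("response_format must be at most 32KB serialized")
    if json_size(body) > 64 * KB:
        problems.append("request body must be at most 64KB")
    return problems
```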
Message field limits
| Field | Limit |
|---|---|
| `messages` | 1 to 100 items |
| `message.content` (string) | Up to 200,000 characters |
| `message.content` (array) | Up to 50 object parts |
| `message.name` | String of up to 64 characters |
| `message.tool_call_id` | String of up to 256 characters |
| `message.tool_calls` | Must be an array when provided |
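You can check the message limits in this table the same way. Below is a minimal Python sketch with a hypothetical `check_messages` helper. Python's `len` counts code points, which may differ from how the server counts characters for emoji and other text outside the Basic Multilingual Plane. The sketch also treats a `null` `name` or `tool_call_id` as absent, which is an assumption.

```python
ALLOWED_ROLES = ("developer", "system", "user", "assistant", "tool")


def check_messages(messages) -> list:
    """Check the documented message field limits; return a list of problems."""
    if not isinstance(messages, list) or not 1 <= len(messages) <= 100:
        return ["messages must be a list of 1 to 100 items"]
    problems = []
    for i, msg in enumerate(messages):
        where = f"messages[{i}]"
        if not isinstance(msg, dict):
            problems.append(f"{where} must be an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"{where}.role must be one of {', '.join(ALLOWED_ROLES)}")
        content = msg.get("content")  # content may be omitted or null
        if isinstance(content, str):
            if len(content) > 200_000:
                problems.append(f"{where}.content exceeds 200,000 characters")
        elif isinstance(content, list):
            if len(content) > 50 or not all(isinstance(part, dict) for part in content):
                problems.append(f"{where}.content must be at most 50 object parts")
        elif content is not None:
            problems.append(f"{where}.content must be a string, an array, or null")
        # Assumption: a null name or tool_call_id is treated as absent.
        name = msg.get("name")
        if name is not None and (not isinstance(name, str) or len(name) > 64):
            problems.append(f"{where}.name must be a string of at most 64 characters")
        tool_call_id = msg.get("tool_call_id")
        if tool_call_id is not None and (not isinstance(tool_call_id, str) or len(tool_call_id) > 256):
            problems.append(f"{where}.tool_call_id must be a string of at most 256 characters")
        if "tool_calls" in msg and not isinstance(msg["tool_calls"], list):
            problems.append(f"{where}.tool_calls must be an array")
    return problems
```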
Note
Invalid requests are rejected before model lookup, token estimation, credit reservation, or provider forwarding.
Examples
```python
from openai import OpenAI

client = OpenAI(
    api_key="ng-your-key",
    base_url="https://api.nexgate.app/v1",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    max_tokens=64,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "ng-your-key",
  baseURL: "https://api.nexgate.app/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    { role: "user", content: "Write a one-sentence product tagline." },
  ],
  max_tokens: 64,
  temperature: 0.7,
});
console.log(response.choices[0].message.content);
```

```bash
curl https://api.nexgate.app/v1/chat/completions \
  -H "Authorization: Bearer ng-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "user", "content": "Write a one-sentence product tagline."}
    ],
    "max_tokens": 64,
    "temperature": 0.7
  }'
```

Streaming
Set stream to true to receive text/event-stream chunks.
```python
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks without choices (for example, a usage-only chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

```javascript
const stream = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Count to five." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

On the wire, the stream looks like this:

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"One"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}

data: [DONE]
```

Note
Streaming responses include usage in the final provider chunk when available. If the client disconnects or the stream fails before usage is available, NexGate releases the reserved credits and records the error.
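If you consume the stream without an SDK (for example, by reading decoded text lines from `curl -N` or a raw HTTP client), each `data:` line carries one JSON chunk until `data: [DONE]`. The Python sketch below accumulates the text, the final `finish_reason`, and `usage` when present. The `read_sse_completion` helper is hypothetical, and it assumes each event fits on one `data:` line, as in the sample stream. This page does not specify the shape of SSE error chunks, so the sketch assumes they reuse the `{"error": {...}}` envelope from the error responses.

```python
import json


def read_sse_completion(lines):
    """Collect text, finish_reason, and usage from chat.completion.chunk SSE lines."""
    parts, finish_reason, usage = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "error" in chunk:
            # Assumption: error chunks reuse the {"error": {...}} envelope.
            error = chunk["error"]
            message = error.get("message", "stream error") if isinstance(error, dict) else str(error)
            raise RuntimeError(message)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]  # present in the final chunk when available
    return "".join(parts), finish_reason, usage
```

If `usage` comes back as `None`, the provider did not report it for that stream.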
Function calling
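In OpenAI-compatible function calling, the model only requests a tool call. Your code runs the function and sends the result back in a `tool` message. The Python sketch below handles that client-side step for a request like the one that follows. It assumes the standard OpenAI-compatible response shape, where `message.tool_calls[].function.arguments` is a JSON-encoded string; this page does not show a tool-call response. `get_weather` is a hypothetical local stand-in.

```python
import json


def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "conditions": "sunny", "temp_c": 21}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Run each requested tool locally and build the follow-up "tool" messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = LOCAL_TOOLS[function["name"]]  # KeyError for undeclared tools
        # Arguments arrive as a JSON-encoded string, not an object.
        arguments = json.loads(function.get("arguments") or "{}")
        result = handler(**arguments)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the call it answers
            "content": json.dumps(result),
        })
    return tool_messages
```

To complete the round trip, append the assistant message and the returned tool messages to `messages`. Then send a second request with the same `tools` so the model can write its final answer. Keep `tool_call_id` values within the 256-character limit from the message field limits table.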
```json
{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "What's the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

JSON mode and JSON schema
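In both modes, the reply still arrives as a string in `choices[0].message.content`, so parse it before use. The Python sketch below assumes the success response shape shown later on this page. It also assumes the OpenAI-compatible `finish_reason` value `length` signals a reply cut off by the output limit. The `parse_json_reply` helper is hypothetical, and its top-level key check is a lightweight stand-in for full JSON Schema validation, not a replacement for it.

```python
import json


def parse_json_reply(response: dict, required: tuple = ()) -> dict:
    """Parse the assistant's JSON reply and check top-level required keys."""
    choice = response["choices"][0]
    if choice.get("finish_reason") == "length":
        # Assumption: "length" means max_tokens was hit, so the JSON is likely truncated.
        raise ValueError("reply was cut off by the output token limit")
    data = json.loads(choice["message"]["content"])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data
```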
```json
{
  "model": "gpt-5.5",
  "messages": [
    { "role": "system", "content": "Return JSON only." },
    { "role": "user", "content": "List three colors." }
  ],
  "response_format": { "type": "json_object" }
}
```

```json
{
  "model": "gpt-5.5",
  "messages": [
    { "role": "user", "content": "Return a project summary." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "project_summary",
      "schema": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "risk": { "type": "string" }
        },
        "required": ["title", "risk"]
      }
    }
  }
}
```

Success response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1746789600,
  "model": "gpt-5.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Build faster with prepaid AI that keeps costs predictable."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 11,
    "total_tokens": 29
  }
}
```

Error responses
```json
{
  "error": {
    "message": "Insufficient credits. Available: $0.0020. Estimated cost: $0.0050. Top up at https://dashboard.nexgate.app/topup",
    "type": "insufficient_credits",
    "code": "402"
  }
}
```

| Status | `error.type` | When |
|---|---|---|
| 400 | `invalid_request_error` | Bad JSON, non-object body, schema validation failure, unknown model, or disabled model |
| 401 | `authentication_error` | Missing bearer token, invalid API key, or account record not found |
| 402 | `insufficient_credits` | Available balance cannot cover the credit reservation (estimated input plus maximum output) |
| 403 | `permission_error` | Account is banned or deleted |
| 405 | `invalid_request_error` | A `GET` request to `/api/v1/chat/completions`; only `POST` is accepted |
| 413 | `invalid_request_error` | Request body exceeds 64KB |
| 429 | `rate_limit_error` | Hourly spend safety limit is reached |
| 502 | `provider_error` | Upstream provider request failed |
Warning
Provider calls have a 120-second timeout. For streaming requests, timeout or provider errors are emitted as SSE error chunks when possible.
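Because every status uses the same error envelope, a client can branch on the HTTP status and log `error.type` and `error.message`. The Python sketch below is minimal. The action names and the choice to retry only 502 responses, with jittered exponential backoff, are assumptions of this example and not NexGate policy. A 429 here signals an hourly spend limit, so immediate retries are unlikely to help.

```python
import json
import random

# Suggested client reaction per documented status (a judgment call for this sketch).
ACTIONS = {
    400: "fix_request",
    405: "fix_request",
    413: "fix_request",
    401: "check_api_key",
    403: "account_unavailable",
    402: "top_up_credits",
    429: "wait_for_spend_limit",
    502: "retry_with_backoff",
}


def parse_error(body: str) -> tuple:
    """Return (error.type, error.message) from an error response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ("unknown_error", body)  # e.g. an HTML page from an intermediary
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return ("unknown_error", body)
    return (error.get("type", "unknown_error"), error.get("message", ""))


def next_action(status: int) -> str:
    return ACTIONS.get(status, "unknown")


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff delay for retry number `attempt` (0-based)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Provider calls can run for up to 120 seconds, so a client-side timeout shorter than that may abandon requests NexGate is still waiting on. The official OpenAI SDKs also retry some failures automatically by default, including 429 and 5xx responses. Their retry setting (`max_retries` in the Python SDK) controls that behavior.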
Credit and usage behavior
Before calling the provider, NexGate reserves credits based on the estimated input tokens and the maximum allowed output tokens. After the response completes, NexGate charges the actual provider-reported usage and releases the unused portion of the reservation.
Usage logs include:
- Resolved model ID
- Prompt and completion tokens
- Actual cost
- Latency
- Success or error status
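The reservation and settlement steps come down to simple arithmetic. The Python sketch below assumes a flat USD rate per million input tokens and per million output tokens. The rates and model output limit are made-up numbers, not NexGate prices, and `reserve`, `settle`, and `ModelPricing` are hypothetical names. `Decimal` avoids binary floating-point rounding in currency math. With these numbers, a request with an estimated 18 input tokens and `max_tokens=64` reserves $0.0006625. If the provider then reports 18 prompt and 11 completion tokens, the charge is $0.0001325 and $0.00053 is released.

```python
from dataclasses import dataclass
from decimal import Decimal

MILLION = Decimal(1_000_000)


class InsufficientCredits(Exception):
    """Mirrors the documented 402 insufficient_credits case."""


@dataclass(frozen=True)
class ModelPricing:
    # Hypothetical USD rates per 1M tokens and output limit; real values
    # come from the model catalog.
    input_per_million: Decimal
    output_per_million: Decimal
    max_output_tokens: int

    def cost(self, prompt_tokens: int, completion_tokens: int) -> Decimal:
        return (prompt_tokens * self.input_per_million
                + completion_tokens * self.output_per_million) / MILLION


def reserve(pricing: ModelPricing, estimated_prompt_tokens: int, requested_max_output, available: Decimal) -> Decimal:
    # Output ceiling: the lower of the request's max_tokens and the model limit.
    ceiling = pricing.max_output_tokens
    if requested_max_output is not None:
        ceiling = min(requested_max_output, ceiling)
    reservation = pricing.cost(estimated_prompt_tokens, ceiling)
    if reservation > available:
        raise InsufficientCredits(f"Available: ${available}. Estimated cost: ${reservation}.")
    return reservation


def settle(pricing: ModelPricing, reservation: Decimal, usage: dict) -> tuple:
    """Return (actual charge, amount released) from provider-reported usage."""
    actual = pricing.cost(usage["prompt_tokens"], usage["completion_tokens"])
    # Assumption: if usage exceeds the reservation, nothing is released.
    released = max(reservation - actual, Decimal(0))
    return actual, released
```

This page does not describe what NexGate does when actual usage exceeds the reservation. The sketch simply releases nothing in that case.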
Limits
| Limit | Value |
|---|---|
| Request body | 64KB |
| Provider timeout | 120 seconds |
| Max messages | 100 |
| Max tools | 64 |
| Max tool payload | 64KB |
| Max response format payload | 32KB |
| Max concurrent requests per user | 3 |
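The concurrency limit applies per user, so a batch job can cap how many requests it keeps in flight. The Python sketch below is minimal and uses `asyncio.Semaphore`. `run_limited` and the `send` callback are hypothetical names. The sketch only limits requests from a single process, not other processes or machines using the same account.

```python
import asyncio

MAX_CONCURRENT = 3  # documented per-user limit


async def run_limited(jobs, send, limit: int = MAX_CONCURRENT):
    """Await send(job) for every job, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def one(job):
        async with semaphore:
            return await send(job)

    # gather preserves the input order of results.
    return await asyncio.gather(*(one(job) for job in jobs))
```

With the SDK, `send` could be an async function that calls `chat.completions.create(**job)` on an `AsyncOpenAI` client configured with the NexGate base URL.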