
Threadless completions

Threadless completions use the OpenAI-compatible chat completions endpoint (POST /api/v1/chat/completions) without creating persistent chat threads. Each call is stateless: you send the full message history in the request and get a response, like the standard OpenAI API.

Use this for integrations, automation, and cases where you do not need server-side conversation storage.

How it works

  1. Send POST /api/v1/chat/completions with a messages array and a model (model ID or AI agent ID).
  2. The platform processes the request through the same LLM pipeline used for regular chats.
  3. The response is either a Server-Sent Events (SSE) stream or a full JSON body, depending on the stream flag.

No Chat record is created. Conversation state comes only from your request body on each call.
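
Because the endpoint is OpenAI-compatible, you can usually point an existing OpenAI client at it. The sketch below uses the official openai Python package; the base URL and API key are placeholders (this page does not specify a host), and it shows the statelessness directly: the second call resends the entire first exchange.

from openai import OpenAI

# Placeholder host and key; substitute your deployment's values.
client = OpenAI(base_url="https://YOUR_HOST/api/v1", api_key="YOUR_API_KEY")

history = [{"role": "user", "content": "Hello"}]
first = client.chat.completions.create(model="gpt-4.1", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# The server kept nothing, so the next turn must carry the full history.
history.append({"role": "user", "content": "Shorten that to one sentence."})
second = client.chat.completions.create(model="gpt-4.1", messages=history)
print(second.choices[0].message.content)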

Basic usage

Simple completion

curl -X POST "$BASE_URL/api/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

Streaming

Set "stream": true for a Server-Sent Events stream:

curl -X POST "$BASE_URL/api/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{ "role": "user", "content": "Hello" }],
    "stream": true
  }'

Tool calls (multi-turn)

Threadless completions support tool calling for multi-turn flows.

Step 1: Send a prompt with tool definitions.

{
  "model": "gpt-4.1",
  "messages": [{ "role": "user", "content": "What is the weather in NYC?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": { "type": "object", "properties": { "city": { "type": "string" } } }
    }
  }]
}

Step 2: The reply includes tool_calls. Run the tool on your side, then send the full history back:

{
  "model": "gpt-4.1",
  "messages": [
    { "role": "user", "content": "What is the weather in NYC?" },
    { "role": "assistant", "content": null, "tool_calls": [{ "id": "call_abc", "type": "function", "function": { "name": "get_weather", "arguments": "{\"city\":\"NYC\"}" } }] },
    { "role": "tool", "tool_call_id": "call_abc", "content": "72°F, sunny" }
  ],
  "tools": [{ "type": "function", "function": { "name": "get_weather", "description": "Get current weather", "parameters": { "type": "object", "properties": { "city": { "type": "string" } } } } }]
}

The model then returns a final answer using the tool output.
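
End to end, the loop looks roughly like the sketch below (Python, same placeholder client setup as in the first sketch). get_weather here is a stand-in for whatever tool you actually run.

import json

def get_weather(city: str) -> str:
    # Stand-in for your real tool implementation.
    return "72°F, sunny"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}]

messages = [{"role": "user", "content": "What is the weather in NYC?"}]

# Step 1: the model responds with a tool call instead of a final answer.
first = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
assistant_msg = first.choices[0].message
call = assistant_msg.tool_calls[0]

# Step 2: run the tool locally, then send the full history (including the
# assistant's tool call and the tool result) back for the final answer.
messages.append(assistant_msg)
args = json.loads(call.function.arguments)
messages.append({"role": "tool", "tool_call_id": call.id, "content": get_weather(**args)})

final = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
print(final.choices[0].message.content)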

Request format

{
  "model": "string — model ID or AI agent ID",
  "messages": [
    { "role": "user", "content": "string or content parts array" },
    { "role": "assistant", "content": "string", "tool_calls": [...] },
    { "role": "tool", "tool_call_id": "string", "content": "string" }
  ],
  "tools": [{ "type": "function", "function": { "name": "...", "description": "...", "parameters": {} } }],
  "stream": false,
  "parallel_tool_calls": true
}

Authorization: requires an API key with the CHATS:WRITE permission.
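
Since the endpoint is OpenAI-compatible, a non-streaming response should follow the standard chat completion shape; roughly (field values below are illustrative, not taken from this platform):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4.1",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello! How can I help?" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 8, "completion_tokens": 9, "total_tokens": 17 }
}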

When to use threadless completions

  • Automation — Call the model without creating chat records.
  • One-off questions — Answers without new rows in the chat list.
  • Custom clients — You own history and storage on top of the stateless API.
  • Tool workflows — Multi-turn tool use with client-held state.

Differences from regular chats

Aspect | Threadless completions | Regular chats
Conversation persistence | No — stateless per request | Yes — persistent chat thread
Chat list visibility | Not shown in chat list | Visible in chat list
Chat features | Unavailable (pin, rename, search, export, and so on) | Full chat features
State management | Client manages conversation history | Server manages history
API compatibility | OpenAI-compatible format | Platform-specific format

Limitations

  • No persistent conversation history — Messages are not grouped under a chat. They cannot be listed or retrieved as a conversation.
  • No chat-level features — Features tied to conversations (pinning, naming, summarization, compression, search, export) do not apply.
  • Client-managed state — You must include the full conversation history in each request. The server does not retain state between calls.