POST /models/v2/openai/v1/chat/completions
Chat Completions
curl --request POST \
  --url https://api.bytez.com/models/v2/openai/v1/chat/completions \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "stream": false,
  "top_p": 0.9,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logprobs": true,
  "top_logprobs": 5
}
'
{
  "id": "<string>",
  "object": "chat.completion",
  "created": 1735689600,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ]
}
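The curl request above can be reproduced in Python with only the standard library. The model ID and message content below are illustrative, and `API_KEY` is a placeholder for your real Bytez token; a minimal sketch:

```python
import json
import urllib.request

API_KEY = "YOUR_BYTEZ_KEY"  # placeholder; supply your real token

# Request body mirroring the curl example; only model and messages are required.
payload = {
    "model": "Qwen/Qwen3-1.7B",  # example model ID from the docs
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 256,   # default per the docs
    "temperature": 0.7,  # default per the docs
    "stream": False,
}

req = urllib.request.Request(
    "https://api.bytez.com/models/v2/openai/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Authorization": API_KEY, "Content-Type": "application/json"},
    method="POST",
)

# Sending the request requires a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])

print(req.get_method(), req.full_url)
```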

Headers

Authorization
string
required

Token for authentication

Body

application/json
model
string
required

The ID of the model to run (e.g., Qwen/Qwen3-1.7B, openai/gpt-4)

messages
object[]
required

Conversation messages (OpenAI chat format)

max_tokens
integer
default:256

Maximum number of tokens to generate

temperature
number
default:0.7

Sampling temperature

stream
boolean
default:false

Whether to stream responses
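When `stream` is `true`, OpenAI-compatible endpoints typically deliver incremental chunks as server-sent events (`data:` lines, terminated by `data: [DONE]`); whether Bytez uses this exact framing is an assumption here. A minimal parser over a canned stream, for illustration:

```python
import json

# Canned server-sent-events text standing in for a streamed HTTP body;
# real chunks arrive incrementally over the connection.
raw_stream = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo!"}}]}\n\n'
    "data: [DONE]\n\n"
)

def collect_stream(text: str) -> str:
    """Concatenate the content deltas from an SSE chat-completion stream."""
    parts = []
    for line in text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_stream(raw_stream))  # → Hello!
```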

top_p
number

Nucleus sampling parameter

presence_penalty
number

Penalize new tokens based on whether they appear in the text so far

frequency_penalty
number

Penalize new tokens based on their existing frequency in the text so far

logprobs
boolean

Whether to return log probabilities of output tokens (if supported)

top_logprobs
integer

Number of most likely tokens to return at each position (if logprobs is true)

Response

Successful model completion

id
string

Unique ID for this completion

object
string

Type of returned object (usually chat.completion)

created
integer

Unix timestamp of completion

choices
object[]

Generated completions
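Given the schema above, extracting the generated text is a matter of indexing into `choices`. The sample response below is fabricated to match the documented fields:

```python
# Sample response shaped like the documented schema (all values illustrative).
response = {
    "id": "cmpl-abc123",
    "object": "chat.completion",
    "created": 1735689600,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
}

# The first choice holds the generated message.
choice = response["choices"][0]
text = choice["message"]["content"]
print(choice["finish_reason"], "->", text)  # → stop -> Hello! How can I help?
```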