Sends a prompt to an OpenAI-compatible chat completion model and returns a completion. Provides completions for open-source models in the text-generation, chat, audio-text-to-text, image-text-to-text, and video-text-to-text task categories, and also supports the closed-source providers openai, anthropic, mistral, cohere, and google. To send a request to a closed-source provider, prefix the model ID with the provider name, e.g. openai/gpt-4.
Token for authentication
The ID of the model to run (e.g., Qwen/Qwen3-1.7B, openai/gpt-4)
Conversation messages (OpenAI chat format)
Maximum number of tokens to generate
Sampling temperature; higher values produce more random output
Whether to stream responses
Nucleus sampling (top-p) parameter; only tokens within the top-p probability mass are considered
Penalize new tokens based on whether they appear in the text so far
Penalize new tokens based on their existing frequency in the text so far
Whether to return log probabilities of output tokens (if supported)
Number of most likely tokens to return at each position (if logprobs is true)
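A minimal sketch of a request, assuming the standard openai Python client and the usual OpenAI chat-completion parameter names (messages, max_tokens, temperature, top_p, presence_penalty, frequency_penalty, stream, logprobs, top_logprobs); BASE_URL and API_TOKEN are placeholders for the endpoint's base URL and your authentication token, neither of which is specified above.

```python
# Hypothetical example: calling the OpenAI-compatible chat completion endpoint
# with the official openai Python client. BASE_URL and API_TOKEN are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="BASE_URL",   # placeholder: the endpoint's OpenAI-compatible base URL
    api_key="API_TOKEN",   # token for authentication
)

response = client.chat.completions.create(
    # Open-source models are referenced by their ID (e.g. Qwen/Qwen3-1.7B);
    # closed-source providers are addressed by prefixing the provider name,
    # e.g. "openai/gpt-4".
    model="Qwen/Qwen3-1.7B",
    messages=[  # conversation messages in OpenAI chat format
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize nucleus sampling in one sentence."},
    ],
    max_tokens=256,          # maximum number of tokens to generate
    temperature=0.7,         # sampling temperature
    top_p=0.9,               # nucleus sampling parameter
    presence_penalty=0.0,    # penalize tokens that already appear in the text
    frequency_penalty=0.0,   # penalize tokens by their frequency so far
    stream=False,            # set True to stream partial responses
    logprobs=True,           # return log probabilities, if the model supports it
    top_logprobs=5,          # most likely tokens per position (only when logprobs=True)
)

print(response.choices[0].message.content)
```

With stream=True, the same call returns an iterator of partial chunks instead of a single response object.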