Multimodal
Chat + Vision
Analyze images using the Llama-3.2-11B-Vision-Instruct model.
POST /models/v2/meta-llama/Llama-3.2-11B-Vision-Instruct
curl --request POST \
  --url https://api.bytez.com/models/v2/meta-llama/Llama-3.2-11B-Vision-Instruct \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a helpful assistant."
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is this image?"
          },
          {
            "type": "image",
            "url": "https://example.com/path-to-image.jpg"
          }
        ]
      }
    ]
  }'
{
  "output": [
    "<string>"
  ]
}
Body: application/json
Response: 200 - application/json (successful response)
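The same request can be issued from any HTTP client. Below is a minimal Python sketch using the requests library; it mirrors the curl example above, uses the same placeholder image URL, and omits any authentication headers (add whatever your account requires). The response parsing assumes the 200 body shape shown above, an "output" array of strings.

import requests

# Endpoint for the Llama-3.2-11B-Vision-Instruct model (same as the curl example above)
url = "https://api.bytez.com/models/v2/meta-llama/Llama-3.2-11B-Vision-Instruct"

# Request body: a messages array where each message carries a list of
# content parts ("text" parts and "image" parts with a URL)
payload = {
    "messages": [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image", "url": "https://example.com/path-to-image.jpg"},
            ],
        },
    ]
}

# NOTE: authentication headers are omitted here, matching the curl example;
# include your API key header if your deployment requires one
response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()

# The 200 response body contains an "output" array of strings
for text in response.json()["output"]:
    print(text)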