POST /models/v2/llava-hf/LLaVA-NeXT-Video-7B-hf
video-text-to-text
curl --request POST \
  --url https://api.bytez.com/models/v2/llava-hf/LLaVA-NeXT-Video-7B-hf \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this video"
        },
        {
          "type": "video",
          "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4"
        }
      ]
    }
  ],
  "stream": true,
  "params": {
    "min_length": 10,
    "max_length": 100,
    "temperature": 0.5
  }
}
'
{
  "error": "<string>",
  "output": {
    "role": "assistant",
    "content": " In this video, a toddler child is captured in the midst of an engaging and noisy activity. They are sitting on a wooden bed, busy with what appears to be a game or activity involving books. The child's face is framed by a pair of glasses that are being worn, and their hand is seen in motion, possibly pushing a button or perhaps interacting with a device. The child seems to be concentrating on something distant in their hand, with their body language suggesting a sense of excitement or engagement. The room appears to be a child's room, possibly with a greenish tone, and there's a white object in the background that resembles bedding, possibly children's furniture. Another player or toy is seen behind the child, contributing to the lively atmosphere. In front of the child, a blanket-like object could be perceived as a part of the game, its color or texture not clearly distinguishable. The tone of the video is lively, capturing everyday moments in an animated and playful context."
  }
}

Authorizations

Authorization
string
header
required

Set the Authorization header to your Bytez API key, for example 'Authorization: YOUR_BYTEZ_KEY_HERE'.

Body

application/json

Schema for video-text-to-text models

messages
object[]
required

Conversation history.

Example:
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Describe this video"
      },
      {
        "type": "video",
        "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4"
      }
    ]
  }
]
stream
boolean

Enable text streaming.

params
object

Model-specific parameters.
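
The body fields above can be assembled programmatically. The following is a minimal Python sketch, not an official client: it uses only the standard library, and the helper names `build_body` and `build_request` are hypothetical. It assumes the endpoint URL and header layout shown in the curl example, and it only constructs the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.bytez.com/models/v2/llava-hf/LLaVA-NeXT-Video-7B-hf"


def build_body(prompt, video_url, stream=False, params=None):
    """Build the JSON body: one user message with a text part and a video part."""
    body = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "video", "url": video_url},
                ],
            }
        ],
        "stream": stream,
    }
    # "params" is optional, so it is included only when the caller supplies it.
    if params:
        body["params"] = params
    return body


def build_request(api_key, body):
    """Wrap the body in a POST request carrying the two headers from the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With `stream` left at `False`, the request could then be sent with `urllib.request.urlopen(build_request(key, body))` and the reply decoded with `json.load`. Streaming responses are not covered here because this page does not describe their wire format.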

Response

Successful video-text-to-text response.

error
string | null
required

Null on success; otherwise an error message.

output
object
required

The output message generated by the model.

Example:
{
  "role": "assistant",
  "content": " In this video, a toddler child is captured in the midst of an engaging and noisy activity. They are sitting on a wooden bed, busy with what appears to be a game or activity involving books. The child's face is framed by a pair of glasses that are being worn, and their hand is seen in motion, possibly pushing a button or perhaps interacting with a device. The child seems to be concentrating on something distant in their hand, with their body language suggesting a sense of excitement or engagement. The room appears to be a child's room, possibly with a greenish tone, and there's a white object in the background that resembles bedding, possibly children's furniture. Another player or toy is seen behind the child, contributing to the lively atmosphere. In front of the child, a blanket-like object could be perceived as a part of the game, its color or texture not clearly distinguishable. The tone of the video is lively, capturing everyday moments in an animated and playful context."
}
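
Since `error` is null on success and `output` carries the assistant message, a caller would typically check `error` before reading `output.content`. The sketch below shows one way to do that in Python, assuming a non-streaming response already parsed into a dict. The function name `extract_content` is hypothetical, and stripping the surrounding whitespace is a convenience choice, not documented API behavior.

```python
def extract_content(response):
    """Return the assistant text from a parsed response, or raise if the API reported an error."""
    error = response.get("error")
    if error is not None:
        # Per the schema, a non-null "error" holds the error message.
        raise RuntimeError(f"API error: {error}")
    # The example output begins with a space, so surrounding whitespace is trimmed.
    return response["output"]["content"].strip()
```

Raising on a non-null `error` keeps the failure path out of the return value, so callers never mistake an error message for model output.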