Chat with AI models using text and videos (also known as video-text-to-text).
Set the Authorization header to your BYTEZ_KEY:
'Authorization: YOUR_BYTEZ_KEY_HERE'
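A minimal sketch of building that header in Python. Only the Authorization header carrying the raw key comes from the text above; the helper name `build_headers`, the `Content-Type` entry, and reading the key from an environment variable named `BYTEZ_KEY` are assumptions made for illustration.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers with the key placed directly in Authorization."""
    if not api_key:
        raise ValueError("a Bytez API key is required")
    return {
        # The key is sent as-is, with no scheme prefix, mirroring the example above.
        "Authorization": api_key,
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


# Assumption: the key lives in an environment variable named BYTEZ_KEY;
# the placeholder from the example above is used when it is unset or empty.
headers = build_headers(os.environ.get("BYTEZ_KEY") or "YOUR_BYTEZ_KEY_HERE")
```

Keeping the key in the environment rather than in source code avoids committing it by accident.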
Schema for video-text-to-text models
Conversation history.
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Describe this video"
      },
      {
        "type": "video",
        "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4"
      }
    ]
  }
]
Enable text streaming.
Model-specific parameters.
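The three request fields described above (conversation history, streaming flag, model-specific parameters) could be assembled as in the sketch below. The field names `messages`, `stream`, and `params` are assumptions, since this page does not name them, and `build_request_body` is a hypothetical helper; the message structure itself follows the example conversation.

```python
import json


def build_request_body(prompt, video_url, stream=False, params=None):
    """Assemble a single-turn request: one user message with text and a video."""
    body = {
        # Assumed field name for the conversation history.
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "video", "url": video_url},
                ],
            }
        ],
        # Assumed field name for the text-streaming switch.
        "stream": stream,
    }
    if params:
        # Assumed field name for model-specific parameters.
        body["params"] = params
    return body


body = build_request_body(
    "Describe this video",
    "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4",
)
print(json.dumps(body, indent=2))
```

The sketch omits the model-specific parameters when none are supplied, on the assumption that the model then uses its defaults.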
Successful video-text-to-text response.
Null on success; otherwise an error message.
The output message generated by the model.
{
  "role": "assistant",
  "content": " In this video, a toddler child is captured in the midst of an engaging and noisy activity. They are sitting on a wooden bed, busy with what appears to be a game or activity involving books. The child's face is framed by a pair of glasses that are being worn, and their hand is seen in motion, possibly pushing a button or perhaps interacting with a device. The child seems to be concentrating on something distant in their hand, with their body language suggesting a sense of excitement or engagement. The room appears to be a child's room, possibly with a greenish tone, and there's a white object in the background that resembles bedding, possibly children's furniture. Another player or toy is seen behind the child, contributing to the lively atmosphere. In front of the child, a blanket-like object could be perceived as a part of the game, its color or texture not clearly distinguishable. The tone of the video is lively, capturing everyday moments in an animated and playful context."
}
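A response with this shape could be handled as in the sketch below, which checks the error value first and then reads the assistant message. The field names `error` and `output` are assumptions (this page describes the two values but does not name them), `extract_reply` is a hypothetical helper, and the sample response is made up for illustration.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant's text, or raise if the response reports an error."""
    # Assumed field name: null on success, otherwise an error message.
    error = response.get("error")
    if error is not None:
        raise RuntimeError(f"video-text-to-text request failed: {error}")
    # Assumed field name: the message generated by the model.
    output = response.get("output") or {}
    # Model text may begin with whitespace, as in the example above, so trim it.
    return output.get("content", "").strip()


# Made-up sample response, shortened for illustration.
sample = {
    "error": None,
    "output": {"role": "assistant", "content": " A child sits on a bed."},
}
reply = extract_reply(sample)
```

Checking the error value before touching the output keeps a failed request from being mistaken for an empty reply.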