video-text-to-text

curl --request POST \
  --url https://api.bytez.com/models/v2/llava-hf/LLaVA-NeXT-Video-7B-hf \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this video" },
          { "type": "video", "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4" }
        ]
      }
    ],
    "stream": true,
    "params": { "min_length": 10, "max_length": 100, "temperature": 0.5 }
  }'

{
  "error": null,
  "output": {
    "role": "assistant",
    "content": " In this video, a toddler child is captured in the midst of an engaging and noisy activity. They are sitting on a wooden bed, busy with what appears to be a game or activity involving books. The child's face is framed by a pair of glasses that are being worn, and their hand is seen in motion, possibly pushing a button or perhaps interacting with a device. The child seems to be concentrating on something distant in their hand, with their body language suggesting a sense of excitement or engagement. The room appears to be a child's room, possibly with a greenish tone, and there's a white object in the background that resembles bedding, possibly children's furniture. Another player or toy is seen behind the child, contributing to the lively atmosphere. In front of the child, a blanket-like object could be perceived as a part of the game, its color or texture not clearly distinguishable. The tone of the video is lively, capturing everyday moments in an animated and playful context."
  }
}

Authorizations

Authorization (string, header, required)

Set the Authorization header to your Bytez API key (BYTEZ_KEY), for example 'Authorization: YOUR_BYTEZ_KEY_HERE'.
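A minimal Python sketch of the two headers used in the curl example. It assumes the key is kept in an environment variable named BYTEZ_KEY; that variable name and the helper `bytez_headers` are hypothetical. As in the example, the raw key is the whole header value, with no "Bearer" prefix.

```python
from __future__ import annotations

import os


def bytez_headers(api_key: str | None = None) -> dict[str, str]:
    """Build the request headers for a Bytez model call.

    Assumption: if no key is passed, it is read from a BYTEZ_KEY
    environment variable (a hypothetical convention for this sketch).
    """
    key = api_key if api_key is not None else os.environ.get("BYTEZ_KEY")
    if not key:
        raise ValueError("missing Bytez API key")
    return {
        # The raw key is the header value: 'Authorization: YOUR_BYTEZ_KEY_HERE'.
        "Authorization": key,
        "Content-Type": "application/json",
    }
```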

Body

application/json

Schema for video-text-to-text models

messages (object[], required)

Conversation history.

Example:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Describe this video"
      },
      {
        "type": "video",
        "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4"
      }
    ]
  }
]
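The example above is a single user turn whose content mixes a text part and a video part. A hypothetical Python helper that produces the same shape; it uses only the "text" and "video" part types shown in the example.

```python
def video_user_message(prompt: str, video_url: str) -> dict:
    """Return one user turn pairing a text prompt with a video URL.

    Mirrors the example: a role plus a list of typed content parts.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "video", "url": video_url},
        ],
    }


# Rebuilds the example "messages" array above.
messages = [
    video_user_message(
        "Describe this video",
        "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4",
    )
]
```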

stream (boolean)

Enable text streaming.
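This page does not describe the wire format of a streamed response. The Python sketch below assumes the generated text arrives as raw UTF-8 chunks (not, for example, server-sent events) and reads a file-like body incrementally; `iter_stream_text` is a hypothetical name.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def iter_stream_text(body: BinaryIO, chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text pieces from a streamed HTTP response body.

    Assumption: with "stream": true the server sends the generated text as
    raw UTF-8 bytes. The incremental decoder holds back a multi-byte
    character that is split across two chunks until it is complete.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Usage with an in-memory stand-in for the HTTP response body.
fake_body = io.BytesIO(" In this video, a toddler child".encode("utf-8"))
for piece in iter_stream_text(fake_body, chunk_size=8):
    print(piece, end="", flush=True)
print()
```

With `urllib.request.urlopen`, the returned response object supports `read(n)` and can be passed as `body`.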

params (object)

Model-specific parameters, such as min_length, max_length, and temperature in the example request.
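A Python sketch that builds (but does not send) the POST request from the curl example using only the standard library. The `params` values are copied from that example; which parameters a given model accepts is model-specific. The helper name, the assumption that other model IDs can be appended to the same base URL, and the `stream=False` default (chosen on the assumption that a non-streamed call returns the JSON object described under Response) are all assumptions of this sketch.

```python
from __future__ import annotations

import json
import urllib.request

# Base URL from the example; appending other model IDs is an assumption.
BYTEZ_MODELS_URL = "https://api.bytez.com/models/v2/"


def build_request(
    api_key: str,
    model_id: str,
    messages: list[dict],
    stream: bool = False,
    params: dict | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a video-text-to-text model."""
    body: dict = {"messages": messages, "stream": stream}
    if params:
        body["params"] = params  # model-specific settings, passed through as-is
    return urllib.request.Request(
        BYTEZ_MODELS_URL + model_id,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "YOUR_BYTEZ_KEY_HERE",
    "llava-hf/LLaVA-NeXT-Video-7B-hf",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video"},
            {"type": "video", "url": "https://huggingface.co/datasets/raushan-testing-hf/videos-test/resolve/main/sample_demo_1.mp4"},
        ],
    }],
    params={"min_length": 10, "max_length": 100, "temperature": 0.5},
)
```

Calling `urllib.request.urlopen(req)` would send it; that step needs a valid key and network access, so it is left out here.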

Response

Successful video-text-to-text response.

error (string | null, required)

Null on success; otherwise, an error message.

output (object, required)

The message generated by the model, with role and content fields.

Example:

{
  "role": "assistant",
  "content": " In this video, a toddler child is captured in the midst of an engaging and noisy activity. They are sitting on a wooden bed, busy with what appears to be a game or activity involving books. The child's face is framed by a pair of glasses that are being worn, and their hand is seen in motion, possibly pushing a button or perhaps interacting with a device. The child seems to be concentrating on something distant in their hand, with their body language suggesting a sense of excitement or engagement. The room appears to be a child's room, possibly with a greenish tone, and there's a white object in the background that resembles bedding, possibly children's furniture. Another player or toy is seen behind the child, contributing to the lively atmosphere. In front of the child, a blanket-like object could be perceived as a part of the game, its color or texture not clearly distinguishable. The tone of the video is lively, capturing everyday moments in an animated and playful context."
}
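A Python sketch of reading a non-streamed response according to the schema above: raise if `error` is non-null, otherwise return `output.content`. The helper name is hypothetical, and stripping whitespace is a choice made here because the example content begins with a space.

```python
from __future__ import annotations

import json


def parse_bytez_response(raw: str | bytes) -> str:
    """Return the assistant text from a non-streamed response body.

    Follows the documented schema: "error" is null on success, and
    "output" is a message object with "role" and "content".
    """
    data = json.loads(raw)
    if data.get("error") is not None:
        raise RuntimeError(f"Bytez API error: {data['error']}")
    # The example content starts with a space, so trim surrounding whitespace.
    return data["output"]["content"].strip()
```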
