Streaming lets you process data incrementally as the model generates it. This is useful for large outputs, or when you want to show results to users in real time.

JavaScript

You can use pipeThrough with a TextDecoderStream to decode the streamed bytes into text as they arrive:

// Assumes `model` is an already-loaded Bytez model, set up as in the Python example below
const stream = await model.run("Jack and Jill", { stream: true });
const textStream = stream.pipeThrough(new TextDecoderStream());

for await (const chunk of textStream) {
  console.log(chunk); // Process each chunk of the stream
}

Explanation

  • model.run: Sends the input text to the model with streaming enabled.
  • pipeThrough(new TextDecoderStream()): Decodes the raw byte stream into text chunks.
  • for await (const chunk of textStream): Processes each chunk of text as it arrives.
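Incremental decoding matters because a chunk boundary can fall in the middle of a multi-byte UTF-8 character; this is exactly what TextDecoderStream handles for you. The same pattern can be sketched with the Python standard library (the hard-coded byte chunks below stand in for a real byte stream):

```python
import codecs

# Simulated byte chunks: the two bytes of "é" (0xC3 0xA9) are split across
# chunks, so a naive per-chunk bytes.decode() would raise on the boundary.
chunks = [b"caf\xc3", b"\xa9 au lait"]

decoder = codecs.getincrementaldecoder("utf-8")()
text = ""
for chunk in chunks:
    text += decoder.decode(chunk)  # buffers incomplete sequences internally
text += decoder.decode(b"", final=True)  # flush any buffered remainder

print(text)  # café au lait
```

The incremental decoder holds the dangling 0xC3 byte until the next chunk completes the character, which is why the loop can safely print each piece as it arrives.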

Python

In Python, streaming can be achieved using the Bytez client:

from bytez import Bytez

client = Bytez("YOUR_BYTEZ_KEY_HERE")

model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()

input_text = "Once upon a time there was a beautiful home where"
model_params = {"max_new_tokens": 20, "min_new_tokens": 5, "temperature": 0.5}

stream = model.run(
    input_text,
    stream=True,
    model_params=model_params,
)

for chunk in stream:
    print(f"Output: {chunk}")  # Process each chunk

Explanation

  • model.run: Sends the input text and parameters to the model with streaming enabled.
  • for chunk in stream: Iterates through each chunk of streamed data as it is generated.
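Since the stream is just an iterable of text chunks, downstream code often displays each piece as it arrives while also accumulating the full completion. A minimal sketch, with a hypothetical fake_stream generator standing in for model.run(input_text, stream=True, ...):

```python
def fake_stream():
    # Stand-in for the model's streamed output: yields the
    # completion a few tokens at a time.
    for piece in ["Once upon", " a time", " there was"]:
        yield piece

parts = []
for chunk in fake_stream():
    print(f"Output: {chunk}")  # display incrementally, as in the loop above
    parts.append(chunk)        # keep each piece for later use

full_text = "".join(parts)
print(full_text)  # the assembled completion
```

This lets the UI update token by token while the complete text is still available once the stream is exhausted.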

Julia

You can use the Bytez library to implement streaming with channels:

using Bytez

client = Bytez.init("YOUR_BYTEZ_KEY_HERE")

model = client.model("Qwen/Qwen2-7B-Instruct")
model.load()

input_text = "Once upon a time there was a beautiful home where"

options = Dict(
    "params" => Dict(
        "max_new_tokens" => 20,
        "min_new_tokens" => 5,
        "temperature" => 0.5,
    ),
    "stream" => true,
)

stream = model.run(input_text, options)

for item in stream
    println(item)  # Process each item as it arrives on the channel
end

Explanation

  • model.run: Sends the input text and options to the model with streaming enabled.
  • for item in stream: Iterates over the channel, blocking until each item is available and exiting cleanly once the channel is closed. (Checking isopen and then calling take! separately can throw if the channel closes between the two calls.)
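The channel pattern above, where a producer pushes chunks and the consumer blocks until each one arrives, can be mirrored in Python with queue.Queue and a sentinel value in place of closing the channel. A sketch (the hard-coded chunks stand in for model output):

```python
import queue
import threading

SENTINEL = object()  # marks end-of-stream, like closing a Julia Channel

def producer(q):
    # Stand-in for the model filling the channel with generated chunks.
    for chunk in ["Once", " upon", " a time"]:
        q.put(chunk)
    q.put(SENTINEL)

q = queue.Queue()
threading.Thread(target=producer, args=(q,), daemon=True).start()

received = []
while True:
    item = q.get()        # blocks until an item arrives, like take!(stream)
    if item is SENTINEL:  # stream "closed": stop consuming
        break
    print(item)
    received.append(item)
```

Using a sentinel avoids the race inherent in checking "is the stream still open?" before reading: the consumer simply blocks on the next item and stops when the end marker arrives.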