This guide is a short introduction to the langchain_bytez package for interacting with the Bytez API. It covers text generation, chat models (including multimodal input), image-to-text, video-to-text, and audio-to-text, streaming, and async operations, with examples to get you started.

Installation

First, install the package:

pip install langchain_bytez

Authentication

You’ll need your Bytez API key to use the package. Set it as an environment variable:

export API_KEY="YOUR_BYTEZ_API_KEY"

Replace "YOUR_BYTEZ_API_KEY" with your actual API key.

Text Generation (LLM)

The BytezLLM class allows you to use Bytez for text generation.

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezLLM

API_KEY = os.environ.get("API_KEY")


bytez_llm = BytezLLM(
    model_id="microsoft/phi-2",  # Replace with your desired model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,  # minutes after the last inference before the cluster shuts down
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Create a prompt
messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

# Generate text
results = bytez_llm.invoke(messages)  # a plain string prompt also works: bytez_llm.invoke("your prompt here")
print(results)  # BytezLLM returns the generated text as a string

Chat Models

The BytezChatModel class provides a convenient way to interact with chat models.

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",  # Replace with your model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,  # minutes after the last inference before the cluster shuts down
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages)
print(results.content)  # invoke returns an AIMessage; .content holds the text
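
Because BytezChatModel is used through LangChain's standard invoke/stream/batch interface throughout this guide, it should compose with the rest of the ecosystem. A sketch wiring it into a small LCEL chain with a prompt template and output parser (the template text here is illustrative):

python
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# prompt -> Bytez chat model -> plain string
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that answers questions clearly and concisely."),
        ("human", "{question}"),
    ]
)

chain = prompt | bytez_chat_model | StrOutputParser()

print(chain.invoke({"question": "List the phylums in the biological taxonomy"}))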

Multimodal Models

BytezChatModel also supports multimodal models by accepting a list of messages, where the content of each message can be text or image data.

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import SystemMessage, HumanMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler  # Helpful for debugging

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct", # Replace with your model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()], # Helpful for debugging
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that answers questions clearly and concisely."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "What is this image?"},
        {
            "type": "image",
            "url": "https://hips.hearstapps.com/hmg-prod/images/how-to-keep-ducks-call-ducks-1615457181.jpg?crop=0.670xw:1.00xh;0.157xw,0&resize=980:*",
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response.content)  # invoke returns an AIMessage; .content holds the text

# Support for streaming
iterator = bytez_chat_model.stream(messages)

for chunk in iterator:
    print(chunk.content, end='')

# Support for batch requests

batch_prompts = [messages, messages, messages] # List of message lists
batch_response = bytez_chat_model.batch(batch_prompts)
print(batch_response)

# batch_as_completed is synchronous but yields (index, output) pairs as each request finishes

iterator = bytez_chat_model.batch_as_completed(batch_prompts)

for index, output in iterator:
    print(f"Batch {index}: {output}")

Image-to-Text, Video-to-Text, and Audio-to-Text

Bytez supports multimodal models for extracting information from different kinds of media. The input format matches the image example above; just use the content type appropriate to the medium. Synchronous and asynchronous invocation, streaming, and batching are all supported. Make sure you replace the model_id with a model that supports the type of input you are giving it.

Image-to-Text

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",  # Replace with a model supporting image input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that describes images."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image in detail."},
        {
            "type": "image",
            "url": "https://your-image-url.com/image.jpg",  # Replace with your image URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response.content)  # .content holds the generated description

Video-to-Text

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="your-video-to-text-model-id",  # Replace with a model supporting video input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 128},  # adjust for desired output length
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that describes videos."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Summarize the video."},
        {
            "type": "video",
            "url": "https://your-video-url.com/video.mp4",  # Replace with your video URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response.content)  # .content holds the generated summary

Audio-to-Text

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="your-audio-to-text-model-id",  # Replace with a model supporting audio input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 128},  # adjust for desired output length
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that transcribes audio."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "audio",
            "url": "https://your-audio-url.com/audio.mp3",  # Replace with your audio URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response.content)  # .content holds the transcription

Streaming

To enable streaming, set streaming=True in the constructor. This lets you receive tokens in real time as they are generated. The provided StreamingStdOutCallbackHandler is a simple way to print the streamed output.

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel

API_KEY = os.environ.get("API_KEY")


bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages) # Results are streamed to stdout because of the callback
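
Alternatively, the stream() method shown in the multimodal section yields chunks you can consume directly, without relying on a callback handler:

python
# Iterate over chunks as they arrive
for chunk in bytez_chat_model.stream(messages):
    print(chunk.content, end="")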

Extending Callback Handlers (Observability)

You can extend the behavior of the model runs by creating your own callback handlers. BytezStdOutCallbackHandler is provided as a utility, but you’re free to create your own for enhanced logging, metrics, or other custom behavior.

python
import os
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")


bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],  # streaming output plus the Bytez debugging handler
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages)
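
As a starting point, here is a minimal custom handler (a sketch; it assumes BytezChatModel fires LangChain's standard LLM callback events, as the built-in handlers above rely on):

python
from langchain.callbacks.base import BaseCallbackHandler


class TokenCountingHandler(BaseCallbackHandler):
    """Hypothetical handler that counts streamed tokens and logs run boundaries."""

    def __init__(self):
        self.token_count = 0

    def on_llm_start(self, serialized, prompts, **kwargs):
        print("Run started")

    def on_llm_new_token(self, token, **kwargs):
        self.token_count += 1  # called once per streamed token when streaming=True

    def on_llm_end(self, response, **kwargs):
        print(f"\nRun finished after {self.token_count} tokens")

Pass an instance in the callbacks list of the constructor, alongside any built-in handlers.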
Shutting Down and Updating Your Cluster

You can manage the underlying Bytez cluster:

python
# Shut down the cluster when you are done with it
bytez_chat_model.shutdown_cluster()

# Scale the cluster by assigning a new capacity and applying it
bytez_chat_model.capacity = {
    "min": 2,  # Increase the minimum number of instances
    "max": 3,  # Increase the maximum number of instances
}
bytez_chat_model.update_cluster()

Async Operations

The langchain_bytez package also fully supports asynchronous operations using asyncio.

python
import asyncio
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")


async def test_chat_async():
    bytez_chat_model = BytezChatModel(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
        api_key=API_KEY,
        capacity={
            "min": 1,
            "max": 1,
        },
        params={"max_new_tokens": 64},
        timeout=10,  # minutes after the last inference before the cluster shuts down
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
    )

    messages = [
        SystemMessage(
            content="You are a helpful assistant that answers questions clearly and concisely."
        ),
        HumanMessage(content="List the phylums in the biological taxonomy"),
    ]
    batch_prompts = [messages, messages, messages]

    response = await bytez_chat_model.ainvoke(messages)  # async invoke
    print(response.content)

    async_iterator = bytez_chat_model.astream(messages)  # async stream
    async for chunk in async_iterator:
        print(chunk.content, end="")

    batch_response = await bytez_chat_model.abatch(batch_prompts)  # async batch
    for response in batch_response:
        print(response.content)

    # async batch, yielding (index, output) pairs as each request completes
    async for index, output in bytez_chat_model.abatch_as_completed(batch_prompts):
        print(f"Batch {index}: {output}")


if __name__ == "__main__":
    asyncio.run(test_chat_async())

Configuration Options (kwargs)

Both BytezChatModel and BytezLLM accept the following keyword arguments; a sketch combining them follows the list:

  • model_id (str): The Bytez model ID (required). Check the Bytez documentation for available models.
  • api_key (str): Your Bytez API key (required).
  • capacity (dict): Controls cluster scaling. Supports min, max, and desired keys.
  • timeout (int): Timeout in minutes for cluster shutdown after the last inference (optional).
  • streaming (bool): Enable streaming responses (default: False).
  • params (dict): Parameters to pass to the Bytez API (optional), such as max_new_tokens.
  • headers (dict): Custom headers to send with the API request (optional). Useful for authentication.
  • http_timeout_s (float): Timeout in seconds for the HTTP request (default: 300 seconds).
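
Pulling these together, a constructor call using most of the options might look like this (a sketch; the model ID and header values are placeholders):

python
import os

from langchain_bytez import BytezChatModel

bytez_chat_model = BytezChatModel(
    model_id="your-model-id",  # required: a Bytez model ID
    api_key=os.environ.get("API_KEY"),  # required: your Bytez API key
    capacity={"min": 1, "max": 2},  # cluster scaling bounds
    timeout=10,  # minutes after the last inference before the cluster shuts down
    streaming=False,  # set True to stream tokens as they are generated
    params={"max_new_tokens": 64},  # forwarded to the Bytez API
    headers={"x-example-header": "value"},  # hypothetical custom header
    http_timeout_s=120.0,  # per-request HTTP timeout in seconds
)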

Resources

Feedback: Join our Discord or open an issue on GitHub

Important Notes

  • Replace placeholder model_id values with actual Bytez model IDs. Ensure that the model ID you select supports the media type that you provide.
  • Ensure your API_KEY environment variable is correctly set.
  • Check the Bytez documentation for the latest model availability and API parameters.
  • For more complex use cases, consider creating your own custom callback handlers to monitor the lifecycle of model runs.