This guide provides a simple introduction to using the langchain_bytez package to interact with the Bytez API. It covers text generation, chat models (including multimodal), image-text-to-text, video-text-to-text, audio-text-to-text, streaming, and async operations, with examples to get you started.
Installation
First, install the package:
pip install langchain_bytez
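To confirm the install succeeded, try importing the two classes used throughout this guide; if the import raises no error, you're ready to go:

python -c "from langchain_bytez import BytezLLM, BytezChatModel"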
Authentication
You’ll need your Bytez API key to use the package. Set it as an environment variable:
export API_KEY="YOUR_BYTEZ_API_KEY"
Replace "YOUR_BYTEZ_API_KEY" with your actual API key.
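In Python, read the key from the environment and fail fast if it is missing. This is a minimal sketch; the examples below use the same os.environ.get pattern:

import os

API_KEY = os.environ.get("API_KEY")
if API_KEY is None:
    raise RuntimeError("API_KEY is not set; export your Bytez API key first.")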
Text Generation (LLM)
The BytezLLM class allows you to use Bytez for text generation.
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezLLM

API_KEY = os.environ.get("API_KEY")

bytez_llm = BytezLLM(
    model_id="microsoft/phi-2",  # Replace with your desired model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,  # minutes before expiring
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# Create a prompt
messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

# Generate text
results = bytez_llm.invoke(messages)  # or use bytez_llm.predict("your prompt here")
print(results)  # Prints out the text
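If BytezLLM subclasses LangChain's base LLM (as its .invoke() support suggests), it also composes with the rest of the LangChain ecosystem. A minimal sketch piping a prompt template into the model from the example above:

from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that answers questions clearly and concisely."),
        ("human", "List the phylums in {kingdom}"),
    ]
)

# Pipe the template into the model; invoke() fills the template and runs inference
chain = prompt | bytez_llm
print(chain.invoke({"kingdom": "the animal kingdom"}))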
Chat Models
The BytezChatModel class provides a convenient way to interact with chat models.
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",  # Replace with your model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,  # minutes before expiring
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages)
print(results)  # Prints out the text
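Chat models keep no state between calls, so a multi-turn conversation is just a growing message list. A short sketch continuing the example above:

# invoke() on a chat model returns an AIMessage, so the reply can go
# straight back into the conversation history for the next turn
reply = bytez_chat_model.invoke(messages)
messages = messages + [
    reply,
    HumanMessage(content="Which of those phylums contains humans?"),
]

followup = bytez_chat_model.invoke(messages)
print(followup.content)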
Multimodal Models
BytezChatModel also supports multimodal models by accepting a list of messages, where the content of each message can be text or image data.
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import SystemMessage, HumanMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler  # Helpful for debugging

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",  # Replace with your model ID
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],  # Helpful for debugging
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that answers questions clearly and concisely."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "What is this image?"},
        {
            "type": "image",
            "url": "https://hips.hearstapps.com/hmg-prod/images/how-to-keep-ducks-call-ducks-1615457181.jpg?crop=0.670xw:1.00xh;0.157xw,0&resize=980:*",
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response)  # Prints out the text

# Support for streaming
iterator = bytez_chat_model.stream(messages)
for chunk in iterator:
    print(chunk.content, end="")

# Support for batch requests
batch_prompts = [messages, messages, messages]  # List of message lists
batch_response = bytez_chat_model.batch(batch_prompts)
print(batch_response)

# Support for receiving batch results as they complete
# (see the Async Operations section for the async variant)
iterator = bytez_chat_model.batch_as_completed(batch_prompts)
for index, output in iterator:
    print(f"Batch {index}: {output}")
Image-to-Text, Video-to-Text, and Audio-to-Text
Bytez supports different kinds of multimodal models for extracting information from media. The input format is similar to the image example, but you’ll use the appropriate content type for each media type. These are also supported with both synchronous and asynchronous invocations, as well as streaming and batch. Make sure you replace the model_id with a model that supports the type of input you are giving it.
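The three subsections below differ only in the type field of the media content part. A tiny helper (hypothetical, not part of langchain_bytez) can build the human message for any of them:

from langchain.schema import HumanMessage

def media_message(prompt: str, media_type: str, url: str) -> HumanMessage:
    # media_type is "image", "video", or "audio", matching the
    # content-part shapes used in the examples below
    return HumanMessage(
        content=[
            {"type": "text", "text": prompt},
            {"type": media_type, "url": url},
        ]
    )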
Image-to-Text
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",  # Replace with a model supporting image input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that describes images."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the image in detail."},
        {
            "type": "image",
            "url": "https://your-image-url.com/image.jpg",  # Replace with your image URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response)
Video-to-Text
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="your-video-to-text-model-id",  # Replace with a model supporting video input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 128},  # adjust for desired output length
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that describes videos."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Summarize the video."},
        {
            "type": "video",
            "url": "https://your-video-url.com/video.mp4",  # Replace with your video URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response)
Audio-to-Text
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="your-audio-to-text-model-id",  # Replace with a model supporting audio input
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 128},  # adjust for desired output length
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
)

system_message = SystemMessage(
    content=[
        {"type": "text", "text": "You are a helpful assistant that transcribes audio."},
    ]
)

human_message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "audio",
            "url": "https://your-audio-url.com/audio.mp3",  # Replace with your audio URL
        },
    ]
)

messages = [system_message, human_message]

response = bytez_chat_model.invoke(messages)
print(response)
Streaming
To enable streaming, set streaming=True in the constructor. This allows you to receive responses in real time. The provided StreamingStdOutCallbackHandler is a simple way to see the streamed output.
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages)  # Results are streamed to stdout because of the callback
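If you prefer pulling tokens yourself instead of relying on a callback, the same .stream() iterator shown in the multimodal section works here:

for chunk in bytez_chat_model.stream(messages):
    print(chunk.content, end="")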
Extending Callback Handlers (Observability)
You can extend the behavior of model runs by creating your own callback handlers. BytezStdOutCallbackHandler is provided as a utility, but you're free to create your own for enhanced logging, metrics, or other custom behavior.
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    api_key=API_KEY,
    capacity={
        "min": 1,
        "max": 1,
    },
    params={"max_new_tokens": 64},
    timeout=10,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],  # streaming handler plus the Bytez-provided handler
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]

results = bytez_chat_model.invoke(messages)
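For example, a minimal custom handler that counts streamed tokens might look like this. This is a sketch built on LangChain's BaseCallbackHandler; the hook names are standard LangChain, not Bytez-specific:

from langchain.callbacks.base import BaseCallbackHandler

class TokenCountingHandler(BaseCallbackHandler):
    """Counts streamed tokens and reports the total when the run ends."""

    def __init__(self):
        self.token_count = 0

    def on_llm_new_token(self, token, **kwargs):
        self.token_count += 1

    def on_llm_end(self, response, **kwargs):
        print(f"\nRun finished after {self.token_count} tokens")

# Pass it alongside the other handlers:
# callbacks=[StreamingStdOutCallbackHandler(), TokenCountingHandler()]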
Shutting Down and Updating Your Cluster
You can manage the underlying Bytez cluster:
# Shut the running cluster down
bytez_chat_model.shutdown_cluster()

# Scale the cluster by assigning new capacity, then push the change
bytez_chat_model.capacity = {
    "min": 2,  # Increase the minimum number of instances
    "max": 3,  # Increase the maximum number of instances
}
bytez_chat_model.update_cluster()
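A cluster keeps running until its timeout elapses or it is shut down, so for one-off jobs a try/finally guard ensures it is always released. A sketch, reusing messages from the earlier examples:

try:
    results = bytez_chat_model.invoke(messages)
finally:
    # Release the cluster even if inference raised
    bytez_chat_model.shutdown_cluster()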
Async Operations
The langchain_bytez package also fully supports asynchronous operations using asyncio.
import asyncio
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage

from langchain_bytez import BytezChatModel, BytezStdOutCallbackHandler

API_KEY = os.environ.get("API_KEY")

messages = [
    SystemMessage(
        content="You are a helpful assistant that answers questions clearly and concisely."
    ),
    HumanMessage(content="List the phylums in the biological taxonomy"),
]
batch_prompts = [messages, messages, messages]  # List of message lists

async def test_chat_async():
    bytez_chat_model = BytezChatModel(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
        api_key=API_KEY,
        capacity={
            "min": 1,
            "max": 1,
        },
        params={"max_new_tokens": 64},
        timeout=10,  # minutes before expiring
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler(), BytezStdOutCallbackHandler()],
    )

    response = await bytez_chat_model.ainvoke(messages)  # async invoke

    async_iterator = bytez_chat_model.astream(messages)  # async stream
    async for chunk in async_iterator:
        print(chunk.content, end="")  # Prints out the text

    batch_response = await bytez_chat_model.abatch(batch_prompts)  # async batch
    for response in batch_response:
        print(response)  # Prints out the text

    async_iterator = bytez_chat_model.abatch_as_completed(batch_prompts)  # async batch as completed
    async for index, output in async_iterator:
        print(f"Batch {index}: {output}")

if __name__ == "__main__":
    asyncio.run(test_chat_async())
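One advantage of the async interface is overlapping independent requests. A minimal sketch using asyncio.gather, meant to live inside an async function such as test_chat_async above:

# Run several independent invocations concurrently rather than one at a time
responses = await asyncio.gather(
    bytez_chat_model.ainvoke(messages),
    bytez_chat_model.ainvoke(messages),
)
for response in responses:
    print(response.content)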
Configuration Options (kwargs)
Both BytezChatModel and BytezLLM accept the following keyword arguments:
- model_id (str): The Bytez model ID (required). Check the Bytez documentation for available models.
- api_key (str): Your Bytez API key (required).
- capacity (dict): Controls cluster scaling. Supports min, max, and desired keys.
- timeout (int): Timeout in minutes for cluster shutdown after the last inference (optional).
- streaming (bool): Enable streaming responses (default: False).
- params (dict): Parameters to pass to the Bytez API (optional), such as max_new_tokens.
- headers (dict): Custom headers to send with the API request (optional). Useful for authentication.
- http_timeout_s (float): Timeout in seconds for the HTTP request (default: 300 seconds).
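Putting these together, a constructor sketch using every option (values are illustrative; the desired value and the header shown are hypothetical examples):

bytez_chat_model = BytezChatModel(
    model_id="microsoft/Phi-3-mini-4k-instruct",  # Replace with your model ID
    api_key=API_KEY,
    capacity={"min": 1, "max": 2, "desired": 1},
    timeout=10,  # shut the cluster down 10 minutes after the last inference
    streaming=True,
    params={"max_new_tokens": 64},
    headers={"X-Example-Header": "value"},  # hypothetical header, for illustration only
    http_timeout_s=300.0,
)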
Resources
Feedback: Join our Discord or open an issue on GitHub
Important Notes
- Replace placeholder model_id values with actual Bytez model IDs. Ensure that the model ID you select supports the media type that you provide.
- Ensure your API_KEY environment variable is correctly set.
- Check the Bytez documentation for the latest model availability and API parameters.
- For more complex use cases, consider creating your own custom callback handlers to monitor the lifecycle of model runs.