Closed-Source Models (e.g., OpenAI, Anthropic, Gemini)
Think of us as a smart, multi-lingual translator and secure messenger when you use closed-source models. Our Unified Model Protocol means you use one consistent format for your requests and receive responses in one consistent format, regardless of the underlying provider.

Key Takeaway: For closed-source models, we act as a router and standardization layer. You interact with a single, unified protocol, making it easy to switch between model providers or use multiple providers without changing your code structure (see the sketch after the steps below). The inference itself happens on the provider's infrastructure.

Billing: We don't charge anything for closed-source models. Billing is based on the provider's pricing, and the provider bills you via the API key you provide.

The Process:
1
You Send Request
Your app sends an API request using our standardized input format
2
We Translate Input
We automatically translate your request into the specific format required by
the chosen model provider (e.g., OpenAI, Google Gemini)
3
Forward Request
We securely pass your request to the model provider’s API, using your API
key, so the provider knows it’s from you
4
Provider Computes
The provider runs inference on their servers
5
We Translate Output
We receive the provider's raw response and translate it into standardized JSON
6
You Receive Response
Your app gets inference results back in standardized JSON
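To make the unified protocol concrete, here is a minimal sketch of the round trip. The endpoint URL, header names, JSON field names, and model-id strings are illustrative assumptions, not the documented schema; check the API reference for the exact format.

```python
# A minimal sketch of the unified request/response flow described above.
# The endpoint, headers, and field names are illustrative assumptions only.
import requests


def run_closed_source_model(model: str, provider_key: str, prompt: str) -> dict:
    """Send one standardized request; the router translates it for the provider."""
    response = requests.post(
        "https://api.example-router.com/v1/chat",  # placeholder endpoint (assumption)
        headers={"Authorization": f"Bearer {provider_key}"},  # your provider API key
        json={
            "model": model,  # e.g. an OpenAI or Gemini model id (format assumed)
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()  # standardized JSON, regardless of provider


# Switching providers changes only the model id and key, not the code structure.
openai_result = run_closed_source_model("openai/gpt-4o", "YOUR_OPENAI_KEY", "Hello!")
gemini_result = run_closed_source_model("google/gemini-1.5-pro", "YOUR_GEMINI_KEY", "Hello!")
```

Because both calls share the same shape, swapping providers only means changing the model id and the key you pass in.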
Open‑Source Models – Serverless GPU Inference
When you run an open-source model, Bytez handles all the heavy lifting for you. All you need to worry about is specifying a model and making requests to the API; we take care of the rest. Our goal with open-source models is to make them as easy and affordable to use as closed-source models.

When you make a request to our API, this is what we do (a minimal client sketch follows the steps):
1
Start a model container
If the model is not immediately available, we spin up a model container on our infrastructure.
2
Route requests
When the model is ready for inference, we route your request to the first available instance.
3
Scale according to load
As requests come in, we scale instances automatically so that demand is met
regardless of the model you choose.
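As a rough illustration of how little the client has to do, here is a hedged sketch of calling an open-source model through this serverless layer. The endpoint, field names, API key handling, and the idea that a cold start surfaces as a retryable 503 are assumptions made for the sake of the example.

```python
# A hedged sketch of calling an open-source model through the serverless layer.
# The endpoint, field names, and the 503-means-cold-start assumption are
# illustrative, not documented behavior.
import time

import requests

API_KEY = "YOUR_API_KEY"               # placeholder credential
MODEL_ID = "some-org/some-open-model"  # placeholder open-source model id


def run_open_source_model(prompt: str, max_wait_s: int = 120) -> dict:
    """Send a request, retrying briefly while a model container spins up (step 1)."""
    deadline = time.time() + max_wait_s
    while True:
        response = requests.post(
            f"https://api.example-host.com/models/{MODEL_ID}",  # placeholder endpoint
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"input": prompt},
            timeout=60,
        )
        # Assumption: the service signals "model still loading" with a 503. Once
        # the container is ready, requests are routed to an available instance
        # (step 2) and scaled out under load (step 3).
        if response.status_code == 503 and time.time() < deadline:
            time.sleep(5)
            continue
        response.raise_for_status()
        return response.json()


print(run_open_source_model("Summarize serverless GPU inference in one sentence."))
```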