Plans
Free
- Run open models up to 7B parameters
- Access all closed model providers
- 1 concurrent request (open models)
- 10 requests/second (closed models)
- Credits refresh every 4 weeks
Pay-as-you-go
- Run open models up to 120B parameters
- Access all closed model providers
- Rate limits scale with credits purchased
- Unlimited closed model requests
- Add credits anytime
How Credits Work
Credits are a unified currency across all models on Bytez. When you run a model, credits are deducted from your balance based on usage.
| Model Type | How Credits Are Consumed |
|---|---|
| Closed models | Based on provider pricing (per token, per image, per video, etc.) |
| Open models | Per second of inference |
The amount of credits you purchase also determines:
- Which open models you can access - Larger open models require more credits purchased to unlock
- Your rate limits - Purchasing more credits unlocks more concurrent requests
Adding credits immediately unlocks higher tiers. You don’t need to wait for the next billing
cycle.
Credit Unlock Thresholds
| Credits Purchased (last 4 weeks) | Open Model Access | Concurrent Requests (7B) |
|---|---|---|
| $0 (Free) | Up to 7B | 1 |
| $3+ | Up to 7B | 4 |
| $10+ | Up to 35B | 4 |
| $25+ | Up to 70B | 10 |
| $50+ | Up to 120B | 20 |
| $100+ | Up to 120B | 40 |
| $500+ | Up to 120B | 200 |
| $1,000+ | Up to 120B | 400 |
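The table above can be read as a simple lookup. The following is a minimal sketch, not Bytez code: the names `TIERS` and `unlocked_tier` are hypothetical, and the values are copied from the table.

```python
# Each row: (minimum USD purchased in the last 4 weeks,
#            largest open model size in billions of parameters,
#            concurrent requests for 7B models).
# Values are copied from the threshold table; the names are illustrative.
TIERS = [
    (0, 7, 1),
    (3, 7, 4),
    (10, 35, 4),
    (25, 70, 10),
    (50, 120, 20),
    (100, 120, 40),
    (500, 120, 200),
    (1000, 120, 400),
]


def unlocked_tier(purchased_usd: float) -> tuple[int, int]:
    """Return (max_model_size_b, concurrent_requests_7b) for a purchase total."""
    if purchased_usd < 0:
        raise ValueError("purchased_usd must be non-negative")
    best = TIERS[0]
    for tier in TIERS:
        # Tiers are sorted by threshold, so the last match is the highest tier.
        if purchased_usd >= tier[0]:
            best = tier
    return best[1], best[2]


if __name__ == "__main__":
    max_size, concurrency = unlocked_tier(25)
    print(f"Up to {max_size}B, {concurrency} concurrent requests")
```

For instance, a total of $49.99 still maps to the $25+ row, because the next threshold is $50.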
Closed Model Billing
For closed-source models (OpenAI, Anthropic, Google, Mistral, Cohere), we pass through the provider’s pricing plus a 2% platform fee.
Why the 2% fee?
The platform fee covers:
- Unified API translation and standardization
- Request routing and load balancing
- Usage tracking and analytics
- Support and reliability infrastructure
What’s included
- Pass-through pricing - Pay only for what the provider charges
- No minimum - No monthly minimums or commitments
- Real-time pricing - We pass through provider rates as they change
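As a rough illustration of pass-through pricing, the sketch below adds the 2% platform fee to a provider charge. It assumes the fee is a straight percentage of the provider cost. How Bytez rounds the result is not stated here, so no rounding is applied. The function name is hypothetical.

```python
from decimal import Decimal

# The 2% platform fee named in this section.
PLATFORM_FEE_RATE = Decimal("0.02")


def closed_model_charge(provider_cost_usd: str) -> Decimal:
    """Provider cost plus the platform fee, with no rounding applied.

    Assumption: the fee is a flat percentage of the provider's charge.
    """
    provider_cost = Decimal(provider_cost_usd)
    fee = provider_cost * PLATFORM_FEE_RATE
    return provider_cost + fee


if __name__ == "__main__":
    # A request the provider prices at $1.50.
    print(closed_model_charge("1.50"))
```

`Decimal` is used instead of `float` so that small currency amounts do not pick up binary rounding noise.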
Open Model Billing
Open-source models run on our serverless GPU infrastructure. You’re billed per second of inference time, with no cold start fees and no idle charges.
Pricing by Model Size
Bigger models use more VRAM, so they cost more per second:
| Model Size | Per Second | Per Hour |
|---|---|---|
| 7B | $0.000072 | ~$0.26 |
| 15B | $0.000108 | ~$0.39 |
| 35B | $0.000144 | ~$0.52 |
| 70B | $0.000216 | ~$0.78 |
| 120B | $0.00036 | ~$1.30 |
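To estimate what a workload costs, multiply the per-second rate by the inference time. A minimal sketch using the rates from the table (the names are illustrative, and how sizes between the listed rows are priced is not covered here):

```python
# Per-second rates in USD, keyed by model size in billions of parameters.
# Values are copied from the pricing table.
PER_SECOND_USD = {
    7: 0.000072,
    15: 0.000108,
    35: 0.000144,
    70: 0.000216,
    120: 0.00036,
}


def inference_cost(model_size_b: int, seconds: float) -> float:
    """Cost in USD of `seconds` of inference on a listed model size."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return PER_SECOND_USD[model_size_b] * seconds


def per_hour(model_size_b: int) -> float:
    """Hourly cost, which should match the table's Per Hour column."""
    return inference_cost(model_size_b, 3600)


if __name__ == "__main__":
    for size in PER_SECOND_USD:
        print(f"{size}B: ${per_hour(size):.2f}/hour")
```

The hourly figures in the table are these per-second rates multiplied by 3,600 and rounded to the cent.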
How we calculate pricing
Our base rate is $0.0000045/GB-second of VRAM used.
For comparison:
- Bytez: $0.0000045/GB-sec (with Nvidia GPUs)
- AWS Lambda: $0.0000167/GB-sec (CPUs only)
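Dividing a per-second price by the base rate shows how many GB of VRAM that price corresponds to. The sketch below does that arithmetic and compares the two GB-second rates quoted above. The resulting VRAM figures are implied by the published numbers, not allocations confirmed by this page, and the names are hypothetical.

```python
BYTEZ_RATE = 0.0000045    # USD per GB-second of VRAM, as quoted above
LAMBDA_RATE = 0.0000167   # USD per GB-second, as quoted above for AWS Lambda


def implied_vram_gb(per_second_usd: float) -> float:
    """GB of VRAM implied by a per-second price at the Bytez base rate."""
    return per_second_usd / BYTEZ_RATE


if __name__ == "__main__":
    # Per-second prices from the pricing table, keyed by model size.
    for size, price in {7: 0.000072, 15: 0.000108, 35: 0.000144,
                        70: 0.000216, 120: 0.00036}.items():
        print(f"{size}B: about {implied_vram_gb(price):.0f} GB")
    print(f"Lambda / Bytez rate ratio: {LAMBDA_RATE / BYTEZ_RATE:.2f}")
```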
What’s included
- Per-second billing - Billed in 1-second increments
- No cold start fees - You don’t pay while the model loads
- No idle charges - You don’t pay when not running inference
- No reserved instances - No commitments, no minimums
Auto-Reload
Auto-reload automatically tops up your credit balance when it runs low, so your API calls never fail unexpectedly.
How it works
| Setting | Default | Description |
|---|---|---|
| Threshold | $3 | Reload triggers when balance drops below this |
| Reload amount | $10 | Amount added to your balance |
| Monthly max | $100 | Maximum auto-reload spend per month |
1. Balance drops below threshold - When your credit balance falls below $3 (default), auto-reload activates.
2. Card is charged - Your saved payment method is charged $10 (default reload amount).
3. Credits are added - $10 in credits is immediately added to your balance.
4. Monthly cap enforced - Auto-reload stops if you’ve hit your monthly maximum ($100 default).
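The steps above can be sketched as a small state machine. This is an illustrative model only, not the Bytez implementation: the class and method names are hypothetical, and it assumes one reload per balance check and that a reload is skipped if it would push the month's total past the cap.

```python
from dataclasses import dataclass


@dataclass
class AutoReload:
    # Defaults taken from the settings table.
    threshold: float = 3.0
    reload_amount: float = 10.0
    monthly_max: float = 100.0
    reloaded_this_month: float = 0.0

    def apply(self, balance: float) -> float:
        """Return the balance after at most one reload."""
        # Step 1: nothing happens while the balance is at or above the threshold.
        if balance >= self.threshold:
            return balance
        # Step 4: stop once another reload would exceed the monthly cap (assumed rule).
        if self.reloaded_this_month + self.reload_amount > self.monthly_max:
            return balance
        # Steps 2 and 3: charge the card and add the credits.
        self.reloaded_this_month += self.reload_amount
        return balance + self.reload_amount


if __name__ == "__main__":
    reload = AutoReload()
    print(reload.apply(2.5))            # balance after one reload
    print(reload.reloaded_this_month)   # auto-reload spend so far this month
```

With the defaults, the cap allows ten $10 reloads in a month before auto-reload stops.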
If Auto-Reload is Disabled
When auto-reload is off and your credits run out, new API requests fail with an error response (a 402, as noted in the FAQ).
Configuring Auto-Reload
You can enable, disable, or adjust auto-reload settings in your API Dashboard.
Auto-Scaling (Open Models)
By default, if you exceed your open model rate limits, requests are rejected with a rate-limit error. If you want your rate limits to scale automatically with your production traffic, add `autoScale: true` to your request.
For closed models, you get unlimited rate limits on a pay-as-you-go basis - no auto-scaling needed.
Billing Cycle
Free Plan
- Billing: None
- Credits: free credits, refreshed every 4 weeks
- Expiration: Credits expire 4 weeks after grant
Pay-as-you-go Plan
- Billing: charged on signup date
- Credits: $5 in credits granted each billing cycle
- Expiration: All credits expire 4 weeks after grant
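Since credits expire 4 weeks after they are granted, the expiry date is the grant date plus 28 days. A minimal sketch (the function name is hypothetical, and it works at day granularity because the exact cutoff time is not stated here):

```python
from datetime import date, timedelta

# Credits expire 4 weeks after grant, per the plan details above.
CREDIT_LIFETIME = timedelta(weeks=4)


def expires_on(granted: date) -> date:
    """Date on which credits granted on `granted` expire."""
    return granted + CREDIT_LIFETIME


if __name__ == "__main__":
    print(expires_on(date(2025, 1, 1)))
```

Because the cycle is 4 weeks rather than a calendar month, grant and expiry dates drift relative to the day of the month.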
Adding Credits Mid-Cycle
You can add credits at any time. When you do:
- Immediate access - Higher model tiers and rate limits unlock instantly
- No proration - You get the full credit amount immediately
- Credits stack - Purchased credits add to your existing balance
Example: You’re on Pay-as-you-go with credits remaining, and you add more. The new credits stack on your existing balance, and once your purchases over the last 4 weeks reach $25, 70B models and 10 concurrent requests unlock immediately.
FAQ
What happens if I run out of credits mid-request?
In-flight requests will complete. Only new requests will fail with a 402 error.
Can I get a refund on unused credits?
Credits are non-refundable and expire 4 weeks after purchase.
How do I track my usage?
Visit your API Dashboard to see real-time usage, credit
balance, and request history.
Why do bigger models require more credits purchased?
Larger models require more GPU resources (VRAM). Requiring a minimum purchase threshold ensures
you have enough credits to complete meaningful workloads without running out mid-task.
Is there volume pricing?
For high-volume usage, contact us for custom pricing.