Overview
Inference pricing for models is designed to be straightforward and predictable. Instead of relying
on complex token-based pricing (which doesn’t make sense for non-text-generation models), we
calculate costs based on Inference Meter Price (in $/sec) and Inference Time (in seconds).
Pricing = Meter Price × Inference Time
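As a minimal sketch of the formula above, assuming the meter price is quoted in dollars per second and the inference time is measured in seconds. The function name `inference_cost` is illustrative and not part of any API.

```python
def inference_cost(meter_price_per_sec: float, inference_time_sec: float) -> float:
    """Return the cost in dollars: Meter Price x Inference Time."""
    if meter_price_per_sec < 0 or inference_time_sec < 0:
        raise ValueError("price and time must be non-negative")
    return meter_price_per_sec * inference_time_sec


# Example: 2.5 seconds on a meter priced at $0.0005344/sec
print(f"${inference_cost(0.0005344, 2.5):.6f}")  # $0.001336
```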
Key Features
Instance-Based Pricing
- Models run on instances optimized for RAM usage.
- Instances are categorized by size (e.g., Micro, Small, Super).
- LLMs (Large Language Models) have their own specific pricing meters.
Each API response includes:
- Inference Meter
- Inference Meter Price
- Inference Time
- Inference Cost
Prices
Language Models
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.0000872083 |
XS | 24 | 0.0001475035 |
SM | 64 | 0.0006478333 |
MD | 96 | 0.0008433876 |
LG | 128 | 0.0012956667 |
XL | 192 | 0.0024468774 |
XXL | 320 | 0.0047912685 |
Super | 640 | 0.0059890856 |
All other models
Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
---|---|---|
Micro | 16 | 0.00053440 |
XS | 24 | 0.00066800 |
SM | 64 | 0.00427520 |
MD | 96 | 0.00480960 |
LG | 128 | 0.00855040 |
XL | 192 | 0.01603200 |
XXL | 320 | 0.02458240 |
Super | 640 | 0.02992640 |
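For budgeting, it can help to convert the per-second meter prices into hourly rates. The sketch below copies the prices from the two tables above; the names `LLM_PRICES`, `OTHER_PRICES`, `meter_price`, and `hourly_rate` are hypothetical helpers, not part of the service.

```python
# Meter prices in $/sec, copied from the pricing tables above.
LLM_PRICES = {
    "Micro": 0.0000872083, "XS": 0.0001475035, "SM": 0.0006478333,
    "MD": 0.0008433876, "LG": 0.0012956667, "XL": 0.0024468774,
    "XXL": 0.0047912685, "Super": 0.0059890856,
}
OTHER_PRICES = {
    "Micro": 0.00053440, "XS": 0.00066800, "SM": 0.00427520,
    "MD": 0.00480960, "LG": 0.00855040, "XL": 0.01603200,
    "XXL": 0.02458240, "Super": 0.02992640,
}


def meter_price(size: str, is_llm: bool) -> float:
    """Look up the $/sec meter price for an instance size."""
    table = LLM_PRICES if is_llm else OTHER_PRICES
    if size not in table:
        raise ValueError(f"unknown instance size: {size!r}")
    return table[size]


def hourly_rate(size: str, is_llm: bool) -> float:
    """Convert the per-second meter price to dollars per hour."""
    return meter_price(size, is_llm) * 3600


print(f"LLM Micro:   ${hourly_rate('Micro', True):.4f}/hour")
print(f"Other Micro: ${hourly_rate('Micro', False):.4f}/hour")
```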
Example Pricing
A developer runs an LLM on a Micro instance with an Inference Meter Price of $0.0000872083/sec. They configure their cluster to shut down after 1 minute of inactivity. They perform non-stop streaming inference for 9 minutes, then stop. The cluster stays up for the 1-minute idle timeout and shuts down at the 10-minute mark, so the billed time is 10 minutes and the total cost is:
10 minutes × 60 seconds/minute × $0.0000872083/sec = $0.0523 ≈ $0.05
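The arithmetic above can be sketched as follows, assuming that the billed time is the active inference time plus the idle timeout before shutdown, as in this example. The function and parameter names are hypothetical.

```python
def billed_cost(meter_price_per_sec: float, active_sec: float, idle_timeout_sec: float) -> float:
    """Cost in dollars for active inference plus the idle period before shutdown."""
    billed_sec = active_sec + idle_timeout_sec
    return meter_price_per_sec * billed_sec


# 9 minutes of streaming on an LLM Micro instance, 1-minute idle timeout
cost = billed_cost(0.0000872083, active_sec=9 * 60, idle_timeout_sec=60)
print(f"${cost:.4f}")  # $0.0523
```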
Real-World Savings
Service | Cost for 10 minutes |
---|---|
Our LLM (Micro Instance) | $0.05 |
AWS Lambda (16GB, no GPU) | $0.16 |
GPT-4o (109,080 tokens @ $10.00/1M) | $1.09 |
GPT-4o Cost Breakdown
202 tokens/sec × 540 sec (9 minutes of streaming) = 109,080 tokens
109,080 tokens × ($10.00 / 1,000,000 tokens) = $1.0908 ≈ $1.09
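The token-based figure can be reproduced with a short sketch. The throughput (202 tokens/sec), duration (540 sec), and price ($10.00 per million tokens) come from the breakdown above; the function name `token_cost` is hypothetical.

```python
def token_cost(tokens_per_sec: int, duration_sec: int, price_per_million: float):
    """Return (total tokens, cost in dollars) for sustained token generation."""
    tokens = tokens_per_sec * duration_sec
    return tokens, tokens * price_per_million / 1_000_000


tokens, gpt_cost = token_cost(202, 540, 10.00)
print(tokens, f"${gpt_cost:.2f}")  # 109080 $1.09

# Compare with the 10-minute LLM Micro cost from the example above.
micro_cost = 0.0000872083 * 600
print(f"ratio: {gpt_cost / micro_cost:.1f}x")
```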
- ✅ In this example, our pricing is about 20× cheaper than GPT-4o for continuous inference ($0.05 vs. $1.09).
- ✅ For real-time AI workloads that keep an instance busy, our GPU-based, per-second pricing provides better cost efficiency than per-token pricing.