Overview
Inference pricing for models is designed to be straightforward and predictable. Instead of relying
on complex token-based pricing (which doesn't make sense for non-text-generation models), we
calculate costs based on the Inference Meter Price and the Inference Time:
Inference Cost = Inference Meter Price × Inference Time
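As a quick sketch, the formula is a single multiplication (the meter price here is the LLM Micro rate from the tables below):

```python
def inference_cost(meter_price_per_sec: float, inference_time_sec: float) -> float:
    """Cost in dollars: Inference Meter Price ($/sec) × Inference Time (sec)."""
    return meter_price_per_sec * inference_time_sec

# A 12-second run on an LLM Micro instance ($0.0000872083/sec):
cost = inference_cost(0.0000872083, 12.0)  # ≈ $0.00105
```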
Key Features
Instance-Based Pricing
- Models run on instances provisioned by GPU RAM.
- Instances are categorized by size (e.g., Micro, SM, Super).
- LLMs (Large Language Models) have their own specific pricing meters.
Each API response includes:
- Inference Meter
- Inference Meter Price
- Inference Time
- Inference Cost
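A sketch of how these fields might be consumed client-side. The response shape and snake_case field names are assumptions for illustration; consult the API reference for the real schema.

```python
# Hypothetical response body containing the four billing fields.
response = {
    "inference_meter": "llm-micro",
    "inference_meter_price": 0.0000872083,  # $/sec
    "inference_time": 12.4,                 # seconds
    "inference_cost": 0.0010813829,         # price × time
}

# Sanity-check the reported cost against the pricing formula.
expected = response["inference_meter_price"] * response["inference_time"]
assert abs(response["inference_cost"] - expected) < 1e-9
print(f"Cost: ${response['inference_cost']:.6f}")
```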
Prices
Language Models
| Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
|---|---|---|
| Micro | 16 | 0.0000872083 |
| XS | 24 | 0.0001475035 |
| SM | 64 | 0.0006478333 |
| MD | 96 | 0.0008433876 |
| LG | 128 | 0.0012956667 |
| XL | 192 | 0.0024468774 |
| XXL | 320 | 0.0047912685 |
| Super | 640 | 0.0059890856 |
All other models
| Instance Size | GPU RAM (GB) | Inference Meter Price ($/sec) |
|---|---|---|
| Micro | 16 | 0.00053440 |
| XS | 24 | 0.00066800 |
| SM | 64 | 0.00427520 |
| MD | 96 | 0.00480960 |
| LG | 128 | 0.00855040 |
| XL | 192 | 0.01603200 |
| XXL | 320 | 0.02458240 |
| Super | 640 | 0.02992640 |
Example Pricing
A developer runs an LLM on a Micro instance with an Inference Meter Price of $0.0000872083/sec. They configure their cluster to shut down after 1 minute of inactivity. They stream inference continuously for 9 minutes, then stop. Since the cluster shuts down at the 10-minute mark, the total billed time is 10 minutes:
10 minutes × 60 seconds × $0.0000872083/sec ≈ $0.052
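The arithmetic above can be checked with a short script (a sketch; billed time here is simply the instance's total uptime, streaming plus the idle shutdown window):

```python
METER_PRICE = 0.0000872083     # $/sec, LLM Micro instance
STREAMING_MINUTES = 9          # continuous streaming inference
IDLE_SHUTDOWN_MINUTES = 1      # cluster shuts down after 1 min of inactivity

billed_seconds = (STREAMING_MINUTES + IDLE_SHUTDOWN_MINUTES) * 60
cost = billed_seconds * METER_PRICE
print(f"Billed for {billed_seconds}s -> ${cost:.4f}")  # Billed for 600s -> $0.0523
```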
Real-World Savings
| Service | Cost for 10 minutes |
|---|---|
| Our LLM (Micro Instance) | $0.05 |
| AWS Lambda (16GB, no GPU) | $0.16 |
| GPT-4o (109,080 tokens @ $10.00/1M) | $1.09 |
GPT-4o Cost Breakdown
202 tokens/sec × 540 sec (9 minutes of streaming) = 109,080 tokens
109,080 tokens × ($10.00 / 1,000,000 tokens) = $1.09
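The full comparison can be reproduced in a few lines. The our-pricing and GPT-4o figures come from this page; the AWS Lambda rate is its published x86 compute price of roughly $0.0000166667 per GB-second, which is an external assumption here:

```python
SECONDS = 10 * 60  # 10-minute window from the example

# Our LLM Micro instance: meter price × billed time
ours = 0.0000872083 * SECONDS

# AWS Lambda (16 GB, x86, no GPU): ~$0.0000166667 per GB-second
lambda_cost = 16 * SECONDS * 0.0000166667

# GPT-4o output tokens at $10.00 per 1M, streaming 202 tokens/sec for 9 minutes
gpt4o_tokens = 202 * 9 * 60
gpt4o = gpt4o_tokens * 10.00 / 1_000_000

print(f"Ours:   ${ours:.2f}")         # Ours:   $0.05
print(f"Lambda: ${lambda_cost:.2f}")  # Lambda: $0.16
print(f"GPT-4o: ${gpt4o:.2f}")        # GPT-4o: $1.09
```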
- ✅ Our pricing is significantly cheaper than GPT-4o for continuous inference.
- ✅ For real-time AI workloads, our GPU-based pricing provides better cost efficiency.