Overview

Inference pricing for models is designed to be straightforward and predictable. Instead of relying on complex token-based pricing (which doesn’t make sense for non-text-generation models), we calculate costs from two values: the Inference Meter Price and the Inference Time.

Formula

Inference Cost = Inference Meter Price ($/sec) × Inference Time (sec)
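
As a minimal sketch, the formula is a single multiplication; the Python below simply restates it (the function name and signature are illustrative, not part of any SDK):

    def inference_cost(meter_price_per_sec: float, inference_time_sec: float) -> float:
        """Inference Meter Price ($/sec) × Inference Time (sec), in dollars."""
        return meter_price_per_sec * inference_time_sec

    # A Micro LLM instance billed for 600 seconds:
    print(inference_cost(0.0000872083, 600))  # 0.05232498 → about $0.05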

Key Features

Instance-Based Pricing

  • Models run on instances provisioned by GPU RAM.
  • Instances are categorized by size (Micro, XS, SM, MD, LG, XL, XXL, Super).
  • LLMs (Large Language Models) have their own pricing meters, listed separately below.

Transparent API Response Metadata

Each API response includes the following fields (see the reading sketch after this list):

  • Inference Meter
  • Inference Meter Price
  • Inference Time
  • Inference Cost
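
For illustration only, here is how a client might read those fields; the JSON field names below are assumptions, since the exact response schema isn't shown on this page:

    import json

    # Hypothetical response body; the actual field names may differ.
    raw = '''{
      "output": "...",
      "inference_meter": "LLM-Micro",
      "inference_meter_price": 0.0000872083,
      "inference_time": 600.0,
      "inference_cost": 0.0523
    }'''

    meta = json.loads(raw)
    recomputed = meta["inference_meter_price"] * meta["inference_time"]
    print(f"reported ${meta['inference_cost']:.4f} vs recomputed ${recomputed:.4f}")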

Prices

Language Models

Instance Size    GPU RAM (GB)    Inference Meter Price ($/sec)
Micro            16              0.0000872083
XS               24              0.0001475035
SM               64              0.0006478333
MD               96              0.0008433876
LG               128             0.0012956667
XL               192             0.0024468774
XXL              320             0.0047912685
Super            640             0.0059890856

All Other Models

Instance Size    GPU RAM (GB)    Inference Meter Price ($/sec)
Micro            16              0.00053440
XS               24              0.00066800
SM               64              0.00427520
MD               96              0.00480960
LG               128             0.00855040
XL               192             0.01603200
XXL              320             0.02458240
Super            640             0.02992640
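
To estimate costs offline, the two tables above can be transcribed into a lookup; the sketch below copies the published rates verbatim (check the current price sheet before relying on them):

    # $/sec by instance size, transcribed from the tables above.
    LLM_METER_PRICE = {
        "Micro": 0.0000872083, "XS": 0.0001475035, "SM": 0.0006478333,
        "MD":    0.0008433876, "LG": 0.0012956667, "XL": 0.0024468774,
        "XXL":   0.0047912685, "Super": 0.0059890856,
    }
    OTHER_METER_PRICE = {
        "Micro": 0.00053440, "XS": 0.00066800, "SM": 0.00427520,
        "MD":    0.00480960, "LG": 0.00855040, "XL": 0.01603200,
        "XXL":   0.02458240, "Super": 0.02992640,
    }

    def estimate_cost(size: str, seconds: float, llm: bool = True) -> float:
        """Meter price ($/sec) × billable seconds for the given instance size."""
        table = LLM_METER_PRICE if llm else OTHER_METER_PRICE
        return table[size] * seconds

For example, estimate_cost("Super", 3600, llm=False) prices an hour of a non-LLM model on a Super instance at roughly $107.74.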

Example Pricing

A developer runs an LLM on a Micro instance with an Inference Meter Price of $0.0000872083/sec. They configure their cluster to shut down after 1 minute of inactivity, perform non-stop streaming inference for 9 minutes, and then stop. The cluster shuts down one minute later, at the 10-minute mark, so the total billable time is 10 minutes:

10 minutes × 60 sec/minute × $0.0000872083/sec ≈ $0.05
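
The same arithmetic as a quick check, with the extra minute being the idle window before shutdown:

    meter_price  = 0.0000872083            # Micro LLM instance, $/sec
    billable_sec = (9 + 1) * 60            # 9 min streaming + 1 min idle = 600 sec
    print(f"${meter_price * billable_sec:.4f}")  # $0.0523 → about $0.05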

Real-World Savings

Service                                Cost for 10 minutes
Our LLM (Micro Instance)               $0.05
AWS Lambda (16 GB, no GPU)             $0.16
GPT-4o (109,080 tokens @ $10.00/1M)    $1.09

GPT-4o Cost Breakdown

202 tokens/sec × 540 sec (9 minutes of streaming) = 109,080 tokens
109,080 tokens × ($10.00 / 1,000,000 tokens) ≈ $1.09
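
The comparison figures can be reproduced in a few lines. One assumption here: the AWS Lambda row is recomputed from Lambda's standard ~$0.0000166667 per GB-second rate, which is not stated in the table above:

    SECONDS = 10 * 60                              # full 10-minute window

    micro_llm = 0.0000872083 * SECONDS             # our Micro LLM instance
    gpt4o_tokens = 202 * 9 * 60                    # 202 tokens/sec × 9 min of streaming
    gpt4o = gpt4o_tokens * 10.00 / 1_000_000       # $10.00 per 1M tokens
    lambda_16gb = 0.0000166667 * 16 * SECONDS      # assumed $/GB-sec rate × 16 GB

    for name, cost in [("Our LLM (Micro)", micro_llm),
                       ("AWS Lambda 16GB", lambda_16gb),
                       ("GPT-4o", gpt4o)]:
        print(f"{name}: ${cost:.2f}")              # $0.05, $0.16, $1.09
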
  • ✅ Our pricing is significantly cheaper than GPT-4o for continuous inference.
  • ✅ For real-time AI workloads, our GPU-based pricing provides better cost efficiency.