When you run open source models, we create serverless clusters for you.
What are clusters?
1. Add a new model to our SDK...
2. Model benchmarking
3. Instance selection
4. Model becomes available
Cluster creation
You create a cluster by running either model.create() or model.run(). If you run model.create(), you create the cluster yourself, which gives you full control. If you skip model.create() and call model.run() first, we automatically run model.create() for you with the default params. After the create operation succeeds, we pass your run request to the cluster.
Once created, a cluster spins up instances, and each new instance goes through a cold boot before it can serve requests.
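To make the create/run flow concrete, here is a minimal in-memory sketch in Python. It does not use the real SDK: ToyModel, ToyCluster, DEFAULT_PARAMS and its placeholder timeout value are hypothetical names chosen for illustration, and the cold boot is reduced to a flag set on the first request. It only mirrors the documented behavior: calling run() without a prior create() first creates a cluster with default params, then passes the request to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical placeholder defaults; the SDK's real default params are not listed here.
DEFAULT_PARAMS: Dict[str, int] = {"timeout": 10}


@dataclass
class ToyCluster:
    """In-memory stand-in for a serverless cluster (not the real SDK)."""
    params: Dict[str, int]
    warm_instances: int = 0
    served: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        # With no warm instance yet, the cluster must cold boot one first.
        cold = self.warm_instances == 0
        if cold:
            self.warm_instances = 1
        self.served.append(request)
        return f"{'cold' if cold else 'warm'}:{request}"


class ToyModel:
    """Mimics only the documented create()/run() behavior."""

    def __init__(self) -> None:
        self.cluster: Optional[ToyCluster] = None

    def create(self, **params: int) -> ToyCluster:
        # Explicit create: caller-supplied params override the defaults.
        self.cluster = ToyCluster(params={**DEFAULT_PARAMS, **params})
        return self.cluster

    def run(self, request: str) -> str:
        # No cluster yet: create one with the default params first,
        # then pass the request to it once creation has succeeded.
        if self.cluster is None:
            self.create()
        return self.cluster.handle(request)


model = ToyModel()
print(model.run("hello"))    # cold:hello  (implicit create + cold boot)
print(model.run("again"))    # warm:again
print(model.cluster.params)  # {'timeout': 10}
```

Calling create(timeout=2) before run() instead would give the cluster your own params rather than the placeholder defaults.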
Under the hood, the cold boot timeline starts with provisioning the instance, which includes low-level operations such as virtually attaching file systems and network cards. Once the OS has booted, we take control and race to download the weights, load them onto a GPU, and make the instance ready for traffic. We've optimized many of these steps to reduce cold boot time; for example, we skip downloading weights where possible.
Cluster updates
Use model.update() to change the configuration of a cluster that is already running. For example, you might update your cluster's timeout to 2 minutes.
Auto-scaling
Cluster deletion
Run model.delete() to delete a cluster explicitly. A cluster also auto-terminates when it has been idle for N minutes, where idle means it hasn't received any requests during that time. N is the timeout parameter in your cluster config. You can set the timeout during cluster creation or update it while your cluster is alive.
For example, say your cluster has timeout = 2. If it doesn't receive any requests for 2 minutes, it auto-deletes. Think of this as a safety measure in case a developer forgets to run model.delete().
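The idle-timeout rule described above can be modeled in a few lines. This sketch is hypothetical: IdleCluster, alive_at, and the caller-supplied minute clock are assumptions for illustration, not SDK APIs, and treating creation time as the start of the first idle window is also an assumption. It shows a timeout set at creation, changed while the cluster is alive, and the cluster auto-deleting once it goes timeout minutes without a request.

```python
from dataclasses import dataclass


@dataclass
class IdleCluster:
    """Hypothetical model of idle-timeout auto-termination (not the SDK).

    Times are plain numbers of minutes passed in by the caller, which keeps
    the example deterministic.
    """
    timeout: float               # N: minutes without requests before auto-delete
    last_request_at: float = 0.0  # assumption: creation (minute 0) starts the idle window
    alive: bool = True

    def update(self, timeout: float) -> None:
        # The timeout can only be changed while the cluster is alive.
        if not self.alive:
            raise RuntimeError("cluster has already been deleted")
        self.timeout = timeout

    def receive_request(self, now: float) -> None:
        if not self.alive:
            raise RuntimeError("cluster has already been deleted")
        self.last_request_at = now

    def alive_at(self, now: float) -> bool:
        # Auto-delete once `timeout` minutes pass with no requests.
        if self.alive and now - self.last_request_at >= self.timeout:
            self.alive = False
        return self.alive

    def delete(self) -> None:
        # Explicit deletion, the counterpart of model.delete().
        self.alive = False


cluster = IdleCluster(timeout=5)  # created at minute 0 with a placeholder timeout
cluster.update(timeout=2)         # timeout = 2, as in the example above
cluster.receive_request(now=1.0)
print(cluster.alive_at(now=2.5))  # True: idle for 1.5 minutes
print(cluster.alive_at(now=3.0))  # False: idle for 2 minutes, auto-deleted
```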
Cluster billing
CREATE a cluster
READ a cluster
UPDATE a cluster
DELETE a cluster
Run your model on the cluster