Unlike other providers, Vast Serverless offers pay-per-second pricing for all workloads at the same rates as Vast.ai’s non-Serverless GPU instances. As a Serverless endpoint takes requests, it automatically scales its number of workers up or down based on incoming and forecasted demand. When scaling up, the engine recruits from the Vast.ai GPU marketplace to find the best price-performance worker available. Once a worker is recruited, its cost is added to the running total for all GPU instances on your Serverless endpoint. As demand falls, the engine removes the GPU with the worst price-performance first.
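The scale-up and scale-down selection described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation; `Worker`, `price_per_hour`, and `perf_score` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    # Hypothetical fields for illustration; not the Vast.ai API.
    offer_id: str
    price_per_hour: float  # $/hr for the GPU instance
    perf_score: float      # relative throughput; higher = faster

def price_performance(w: Worker) -> float:
    # Performance delivered per dollar; the engine prefers higher values.
    return w.perf_score / w.price_per_hour

def pick_scale_up(offers: list[Worker]) -> Worker:
    # Scaling up: recruit the best price-performance offer from the marketplace.
    return max(offers, key=price_performance)

def pick_scale_down(active: list[Worker]) -> Worker:
    # Scaling down: release the worst price-performance worker first.
    return min(active, key=price_performance)
```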

Billing for Workers

The following table breaks down the specific charges based on worker state:
| State    | Description       | GPU compute | Storage | Bandwidth (in/out) |
|----------|-------------------|-------------|---------|--------------------|
| Ready    | An active worker  | Billed      | Billed  | Billed             |
| Loading  | Model is loading  | Billed      | Billed  | Billed             |
| Creating | Worker recruiting | Not billed  | Billed  | Billed             |
| Inactive | A cold worker     | Not billed  | Billed  | Billed             |
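The table above reduces to a simple rule: storage and bandwidth are billed in every worker state, while GPU compute is billed only while a worker is Ready or Loading. A sketch of that cost function, with hypothetical integer rates (e.g. micro-dollars per second) for illustration:

```python
# GPU compute is billed only in the Ready and Loading states;
# storage and bandwidth are billed in every worker state.
GPU_BILLED_STATES = {"Ready", "Loading"}

def worker_cost_per_second(state: str, gpu_rate: int,
                           storage_rate: int, bandwidth_rate: int) -> int:
    # All rates are hypothetical placeholders, not real Vast.ai prices.
    cost = storage_rate + bandwidth_rate
    if state in GPU_BILLED_STATES:
        cost += gpu_rate
    return cost
```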

Billing for Endpoints

The following table breaks down the specific charges based on endpoint state:
| State | Description | Billing |
|-------|-------------|---------|
| Active | Engine is actively managing worker recruitment and release; workers are active | All workers billed at their relevant states |
| Suspended | Engine is NOT managing worker recruitment and release; workers are active | Workers are billed based on their state at the time of suspension. Any workers that are currently being created or loading will complete to a Ready state (and be billed as such) |
| Stopped | Engine is NOT managing worker recruitment and release; workers are all inactive | All workers are changed to, and billed in, the Inactive state |
| Destroyed | Engine is NOT managing worker recruitment and release; all workers are destroyed | All billing stops |
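The endpoint-state rules above can be summarized as a mapping from an endpoint's state to the state each of its workers is billed in. This is an illustrative sketch only; the function name and state strings mirror the tables, not any real API.

```python
def billed_worker_states(endpoint_state: str, worker_states: list[str]) -> list[str]:
    # Returns the state each worker is billed in once the endpoint
    # enters the given state, per the endpoint billing table.
    if endpoint_state == "Destroyed":
        return []  # all workers destroyed; billing stops entirely
    if endpoint_state == "Stopped":
        return ["Inactive"] * len(worker_states)  # all workers become cold
    if endpoint_state == "Suspended":
        # In-flight Creating/Loading workers complete to Ready and are billed as such;
        # all others keep the state they held at the time of suspension.
        return ["Ready" if s in ("Creating", "Loading") else s
                for s in worker_states]
    return worker_states  # Active: each worker billed at its current state
```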