Vast Serverless relies on benchmark testing to make cost-effective GPU decisions when scaling up (which workers to recruit), routing requests (which workers have available capacity), and scaling down (which workers to release). This benchmark is part of the PyWorker configuration within the SDK and is an integral component of how Vast Serverless operates.

How Benchmark Testing Works

When a new Workergroup is created, the serverless engine enters a learning phase. During this phase, it recruits a variety of machine types from those specified in search_params. For each new worker, the engine runs the user-configured benchmark and evaluates performance. As traffic scales up and down, the serverless engine builds an application-specific understanding of cost vs. performance, which it then uses to make informed decisions about future worker recruitment and release.
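The benchmark itself is whatever you define in your PyWorker configuration. As an illustration only, the sketch below uses a hypothetical worker URL, payload, and scoring scheme (not the SDK's actual interface) to show the basic idea: send a fixed, representative request to the worker and time it, producing a performance score the engine can compare across machine types.

```python
import time
import requests

# Hypothetical worker URL and payload; the real benchmark request is defined
# in your PyWorker configuration and the field names may differ.
WORKER_URL = "http://localhost:3000/generate"
BENCHMARK_PAYLOAD = {"prompt": "Benchmark prompt", "max_tokens": 256}

def run_benchmark(runs: int = 3) -> float:
    """Send the same representative request several times and return the
    average latency in seconds, giving a comparable score per machine type."""
    latencies = []
    for _ in range(runs):
        start = time.monotonic()
        response = requests.post(WORKER_URL, json=BENCHMARK_PAYLOAD, timeout=120)
        response.raise_for_status()
        latencies.append(time.monotonic() - start)
    return sum(latencies) / len(latencies)

if __name__ == "__main__":
    print(f"Average latency: {run_benchmark():.2f}s")
```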

Best Practices for Initial Scaling

The speed at which the serverless engine “settles” into the most cost-effective mix of workers can vary depending on how quickly workers are recruited and released. Because of this, we recommend applying a test load during the first day of operation so the system can efficiently explore machine types and converge on the optimal hardware mix.

Simulating Load

For examples of how to simulate load against your endpoint, see the client examples in the Vast SDK repository: https://github.com/vast-ai/vast-sdk/tree/main/examples/client
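If you just want a quick standalone script rather than the full client examples, a minimal load generator might look like the sketch below. The endpoint URL, payload, and request counts are placeholders rather than values from the SDK, and a real Vast Serverless route also requires authentication; the point is only to show how to generate sustained, bounded-concurrency traffic so the engine sees enough load to explore hardware options.

```python
import asyncio
import aiohttp

# Placeholder endpoint and payload; replace with your own Workergroup's route
# and request format (see the SDK client examples linked above).
ENDPOINT_URL = "https://example.invalid/my-endpoint"
PAYLOAD = {"prompt": "Test load request", "max_tokens": 128}

async def send_request(session: aiohttp.ClientSession) -> int:
    async with session.post(ENDPOINT_URL, json=PAYLOAD) as response:
        await response.read()
        return response.status

async def simulate_load(total_requests: int = 500, concurrency: int = 10) -> None:
    """Fire a steady stream of requests with bounded concurrency."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(session: aiohttp.ClientSession) -> int:
        async with semaphore:
            return await send_request(session)

    async with aiohttp.ClientSession() as session:
        statuses = await asyncio.gather(
            *(bounded(session) for _ in range(total_requests))
        )
    ok = sum(1 for status in statuses if status == 200)
    print(f"{ok}/{total_requests} requests succeeded")

if __name__ == "__main__":
    asyncio.run(simulate_load())
```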