Pricing Plans

Choose the plan that best fits your needs.

Serverless Text Models

Base model parameter count    $/1M tokens (applies to both input and output tokens)
0 B - 16 B                    $0.20
16.1 B+                       $0.90
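
Because the same per-token rate applies to input and output tokens, the cost of a request is simply the total token count times the tier rate. A minimal sketch of that arithmetic, using the rates and the 16B tier boundary from the table above (the function name and structure are illustrative, not part of our API):

```python
def serverless_text_cost(prompt_tokens: int, completion_tokens: int,
                         model_params_billions: float) -> float:
    """Estimate the USD cost of one serverless text request.

    Rates from the table above: $0.20 per 1M tokens for models up to 16B
    parameters, $0.90 per 1M tokens above that. The same rate applies to
    input (prompt) and output (completion) tokens.
    """
    rate_per_million = 0.20 if model_params_billions <= 16 else 0.90
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1_000_000 * rate_per_million

# Example: a 7B model serving a 1,200-token prompt and a 300-token reply
# costs (1,200 + 300) / 1,000,000 * $0.20 = $0.0003.
print(f"${serverless_text_cost(1_200, 300, 7):.6f}")
```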


Custom Fine Tuned Models

Host any custom model of up to 16B parameters with us and pay only for what you use.
  • Serverless: $0.40 / 1M tokens (cold starts of ~30 seconds)
  • Dedicated: see Dedicated Deployments pricing below


Dedicated Deployments

Don’t want cold starts? Host any open-source or custom fine-tuned model of up to 16B parameters. Reach out to us if you would like to host a larger model.
Pricing
  • Minimum spend of $3k per dedicated H100 GPU per month
  • Autoscaling available at $0.40 / 1M tokens
Autoscaling is performed on a best-effort basis to maintain a QoS target of 1s time to first token (TTFT), subject to GPU availability.
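
When weighing serverless fine-tuned hosting against a dedicated deployment, the relevant comparison is your monthly token volume against the $3k-per-GPU minimum spend. A rough break-even sketch under those two numbers, ignoring the autoscaling surcharge and per-GPU throughput limits (the helper name is illustrative):

```python
def dedicated_break_even_tokens(min_monthly_spend: float = 3_000.0,
                                serverless_rate_per_million: float = 0.40) -> float:
    """Monthly token volume (in millions of tokens) above which one dedicated
    H100 at the $3k/month minimum costs less than serverless fine-tuned
    hosting at $0.40 / 1M tokens."""
    return min_monthly_spend / serverless_rate_per_million

# $3,000 / $0.40 per 1M tokens = 7,500M (7.5B) tokens per month per GPU.
print(f"{dedicated_break_even_tokens():,.0f}M tokens/month")
```

Below that volume, serverless fine-tuned hosting is generally the cheaper option; above it, a dedicated GPU starts to pay for itself, and also removes cold starts.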


Supported Models

We are continuously adding new models to our platform. Not seeing a model you would like to use? Contact us to get it listed.