Pricing Plans
Choose the plan that best fits your needs.
Serverless Text Models
Base model parameter count | Price per 1M tokens (applies to both input and output tokens) |
---|---|
0 B - 16 B | $0.20 |
16.1 B+ | $0.90 |
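As a rough illustration of how the per-token rates in the table above translate into a bill, the following sketch estimates the cost of a single workload. The function name `serverless_cost` and its parameters are hypothetical and not part of any published API. The table does not say how models between 16 B and 16.1 B parameters are billed, so the sketch assumes anything above 16 B falls in the higher tier.

```python
from decimal import Decimal

# Rates from the pricing table, in USD per 1M tokens.
# Input and output tokens are billed at the same rate.
SMALL_TIER_RATE = Decimal("0.20")   # base models from 0 B to 16 B parameters
LARGE_TIER_RATE = Decimal("0.90")   # base models of 16.1 B parameters and above
SMALL_TIER_MAX_B = Decimal("16")    # assumption: anything above 16 B is the large tier


def serverless_cost(params_b, input_tokens, output_tokens):
    """Estimate the USD cost of a serverless text-model workload.

    params_b is the base model size in billions of parameters.
    """
    size = Decimal(str(params_b))
    rate = SMALL_TIER_RATE if size <= SMALL_TIER_MAX_B else LARGE_TIER_RATE
    total_tokens = input_tokens + output_tokens
    return rate * total_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # 3M input tokens plus 1M output tokens on an 8 B model
    print(serverless_cost(8, 3_000_000, 1_000_000))
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up binary rounding errors.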
Custom Fine-Tuned Models
Host any custom model with up to 16B parameters and pay only for what you use.
Serverless: $0.40 / 1M tokens (cold starts ~30 seconds)
Dedicated: See dedicated pricing below
Dedicated Deployments
Don’t want cold starts? Host any open-source or custom fine-tuned model with up to 16B parameters. Reach out to us if you would like to host a larger model.
Pricing
- Minimum spend of $3,000 per dedicated H100 GPU per month.
- Autoscaling available at $0.40 / 1M tokens.
Autoscaling is performed on a best-effort basis to maintain a quality-of-service (QoS) target of 1 second time to first token (TTFT), subject to GPU availability.
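To compare the serverless and dedicated options for a custom model, it can help to work out the monthly token volume at which serverless spend would reach the dedicated minimum. The sketch below is only back-of-the-envelope arithmetic on the two published figures. The function name `break_even_tokens` is hypothetical, and the calculation assumes the minimum spend acts as a flat monthly floor. It ignores whether a single GPU can actually serve that volume and how autoscaling charges combine with the minimum.

```python
from decimal import Decimal

MIN_SPEND_PER_GPU = Decimal("3000")        # USD per dedicated H100 GPU per month
SERVERLESS_CUSTOM_RATE = Decimal("0.40")   # USD per 1M tokens, serverless custom models


def break_even_tokens(gpus=1):
    """Monthly tokens at which serverless spend equals the dedicated minimum spend."""
    millions_of_tokens = MIN_SPEND_PER_GPU * gpus / SERVERLESS_CUSTOM_RATE
    return int(millions_of_tokens * 1_000_000)


if __name__ == "__main__":
    print(break_even_tokens())   # tokens per month for one GPU
```

Under these assumptions, a workload well below the break-even volume is likely cheaper on serverless if cold starts of about 30 seconds are acceptable, while a workload above it may justify a dedicated deployment.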
Supported Models
We are continuously adding new models to our platform. Not seeing a model you would like to use? Contact us to get it listed.