Providers
Compare inference API providers by features, pricing model, and supported models.
DeepInfra
Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.
Groq
Low-latency LLM inference powered by custom LPU (Language Processing Unit) chips. OpenAI-compatible API with sub-second latency.
Replicate
Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.
SiliconFlow
Fast, affordable AI inference platform. Vendor benchmarks claim 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.
Together AI
Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.
fal.ai
Fast inference platform for generative media — image, video, audio, and 3D models with serverless GPU infrastructure.
Cloudflare Workers AI
Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.
WaveSpeed AI
Fast AI inference platform specializing in image and video generation with serverless GPU infrastructure.
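Several providers above (Groq, Cloudflare Workers AI) advertise OpenAI-compatible APIs, meaning the same Chat Completions request shape works against each by swapping the base URL and API key. A minimal stdlib-only sketch; the helper name and model id are illustrative, and the Groq base URL reflects its publicly documented endpoint (check each provider's docs for current values):

```python
import json
import os
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    # Assemble a Chat Completions request for any OpenAI-compatible endpoint.
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)

# Example: targeting Groq's documented OpenAI-compatible base URL
# (model id is illustrative).
req = build_chat_request(
    "https://api.groq.com/openai/v1",
    os.environ.get("GROQ_API_KEY", ""),
    "llama-3.1-8b-instant",
    [{"role": "user", "content": "Hello"}],
)

# Only send the request when a key is actually configured.
if os.environ.get("GROQ_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Switching providers is then a matter of changing the base URL and key, e.g. Cloudflare Workers AI exposes a similar compatibility endpoint under your account's API gateway.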