Providers

Compare inference API providers by features, pricing model, and supported models.

DeepInfra

Serverless · Featured

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

13 models
OpenAI Compat, Streaming, Finetuning, Embeddings, Vision, Free Tier
Groq

Serverless · Featured

High-speed LLM inference powered by custom LPU (Language Processing Unit) chips. OpenAI-compatible API with sub-second latency.

4 models
OpenAI Compat, Streaming, Functions, Vision, Free Tier
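Providers tagged "OpenAI Compat" accept the standard chat-completions wire format, so existing OpenAI client code can usually be redirected to them just by swapping the base URL and API key. A minimal sketch of building such a request; the base URL and model name below are illustrative assumptions, so check the provider's own docs for current values:

```python
import json

# Assumed OpenAI-compatible base URL for Groq -- verify in the provider docs.
BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # True requests token-by-token streaming responses
    }

# Hypothetical model name, for illustration only.
body = build_chat_request("llama-3.1-8b-instant", "Say hello in one word.")
print(json.dumps(body))
# Send with headers: "Authorization: Bearer <API key>",
#                    "Content-Type: application/json"
```

Because the body and endpoint shape are identical across compatible providers, switching from one to another is typically just a change of `BASE_URL` and key.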
Replicate

Serverless · Featured

Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.

67 models
Streaming, Finetuning, Vision
SiliconFlow

Serverless · Featured

Fast and affordable AI inference platform: 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.

20 models
OpenAI Compat, Streaming, Vision, Free Tier
Together AI

Serverless · Featured

Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.

9 models
OpenAI Compat, Streaming, Finetuning, Embeddings, Vision, Free Tier
fal.ai

Serverless · Featured

Fast inference platform for generative media: image, video, audio, and 3D models with serverless GPU infrastructure.

70 models
Streaming, Vision, Free Tier
Cloudflare Workers AI

Serverless

Edge AI inference across 200+ cities worldwide. Serverless and pay-per-use, with an OpenAI-compatible API.

12 models
OpenAI Compat, Streaming, Embeddings, Vision, Free Tier
WaveSpeed AI

Serverless

Fast AI inference platform specializing in image and video generation with serverless GPU infrastructure.

16 models
OpenAI Compat, Streaming, Vision