Directory

AI Inference API Providers

Compare AI inference API providers side by side — pricing, supported models, features, and free tiers. Whether you need the cheapest LLM API, the fastest image generation endpoint, or a provider with OpenAI-compatible routing, find the right fit below.

Covering 16+ providers including OpenRouter, Together AI, Fireworks AI, fal.ai, Replicate, DeepInfra, Groq, and more. Filter by type, category, or browse the full directory.

CE

Cerebras

Serverless Featured

Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.

3 models
Free tier with all models, no credit card required free
OpenAI Compat Streaming Finetuning Functions Free Tier
DE

DeepInfra

Serverless Featured

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

13 models
OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier
FI

Fireworks AI

Serverless Featured

Fast serverless inference for open-source models with per-token pricing, fine-tuning, and on-demand deployments.

24 models
$1 in free credits free
OpenAI Compat Streaming Finetuning Embeddings Functions Vision Free Tier
GR

Groq

Serverless Featured

Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.

4 models
OpenAI Compat Streaming Functions Vision Free Tier
RE

Replicate

Serverless Featured

Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.

67 models
Streaming Finetuning Vision
SI

SiliconFlow

Serverless Featured

Fast and affordable AI inference platform. 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.

20 models
OpenAI Compat Streaming Vision Free Tier
TO

Together AI

Serverless Featured

Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.

9 models
OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier
FA

fal.ai

Serverless Featured

Fast inference platform for generative media — image, video, audio, and 3D models with serverless GPU infrastructure.

70 models
Streaming Vision Free Tier
CL

Cloudflare Workers AI

Serverless

Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.

12 models
OpenAI Compat Streaming Embeddings Vision Free Tier
FR

Friendli AI

Serverless

Fast serverless and dedicated AI inference. Korean provider with competitive pricing on open-source models and prompt caching support.

7 models
Up to 50K inference credit for new users free
OpenAI Compat Streaming Functions Free Tier
HY

Hyperbolic

Serverless

Open-access AI cloud with serverless inference and dedicated hosting. Zero data retention, privacy-first design. Unified per-token pricing.

7 models
10 dollars free credits on signup free
OpenAI Compat Streaming Vision Free Tier
NE

Nebius

Serverless

European AI inference on Token Factory. Two flavors: fast (low latency) and base (cost-efficient). Batch at 50% off.

8 models
1 dollar in free credits to start free
OpenAI Compat Streaming Finetuning Embeddings Functions Free Tier
PA

Parasail

Serverless

Distributed AI inference network with serverless, dedicated, and batch endpoints. No rate limits, no contracts, up to 30x cheaper than legacy cloud.

13 models
Free credits on signup free
OpenAI Compat Streaming Vision Free Tier
SA

Sambanova

Serverless

AI inference on custom SN50 chips. OpenAI-compatible API with fast output speeds. GPT OSS 120B at 600+ tok/s.

7 models
OpenAI Compat Streaming Functions
VE

Venice AI

Serverless

Privacy-focused AI inference with no logging. Supports open-source and proprietary models including Claude, GPT, Grok, and open-weight LLMs.

15 models
OpenAI Compat Streaming Embeddings Functions
WA

WaveSpeed AI

Serverless

Fast AI inference platform specializing in image and video generation with serverless GPU infrastructure.

16 models
OpenAI Compat Streaming Vision