AI Inference API Providers

Compare AI inference API providers side by side — pricing, supported models, features, and free tiers. Whether you need the cheapest LLM API, the fastest image generation endpoint, or a provider with OpenAI-compatible routing, find the right fit below.

Covering 13+ providers including OpenRouter, Together AI, Fireworks AI, fal.ai, Replicate, DeepInfra, Groq, and more. Filter by type, category, or browse the full directory.

All Serverless Proprietary GPU Cloud Aggregator Platform

LLM Inference Image Generation Video Generation Audio / Music Embeddings GPU Cloud Multi-Modal

DeepInfra

Serverless Featured

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

13 models

OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier

Fireworks AI

Serverless Featured

Fast serverless inference for open-source models with per-token pricing, fine-tuning, and on-demand deployments.

24 models

$1 in free credits free

OpenAI Compat Streaming Finetuning Embeddings Functions Vision Free Tier

Google

Proprietary Featured

Official Gemini API via Google AI Studio and Vertex AI. Direct access to Gemini, Imagen, and Gemma models.

19 models

Streaming Embeddings Functions Vision Free Tier

Groq

Serverless Featured

Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.

4 models

OpenAI Compat Streaming Functions Vision Free Tier

KIE AI

Aggregator Featured

Affordable AI API aggregator offering 259+ models across chat, image, video, and music at discounted prices.

55 models

OpenAI Compat Streaming Vision

Muapi

Aggregator Featured

AI API aggregator with 315+ model endpoints across text, image, video, and audio at competitive prices.

75 models

OpenAI Compat Streaming Vision

Novita AI

Aggregator Featured

Budget AI inference platform with broad model catalog across LLM, image, video, and audio. Very competitive per-token pricing.

41 models

OpenAI Compat Streaming Vision Free Tier

OpenAI

Proprietary Featured

Official OpenAI API. Direct access to GPT, DALL-E, Whisper, and embedding models.

18 models

OpenAI Compat Streaming Embeddings Functions Vision Free Tier

SiliconFlow

Serverless Featured

Fast and affordable AI inference platform. 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.

20 models

OpenAI Compat Streaming Vision Free Tier

fal.ai

Serverless Featured

Fast inference platform for generative media — image, video, audio, and 3D models with serverless GPU infrastructure.

70 models

Streaming Vision Free Tier

AIMLAPI

Aggregator

Unified API for 400+ AI models across text, image, video, and audio. OpenAI-compatible with serverless inference.

25 models

OpenAI Compat Streaming Embeddings Vision Free Tier

Cloudflare Workers AI

Serverless

Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.

12 models

OpenAI Compat Streaming Embeddings Vision Free Tier

Friendli AI

Serverless

Fast serverless and dedicated AI inference. Korean provider with competitive pricing on open-source models and prompt caching support.

7 models

Up to 50K inference credit for new users free

OpenAI Compat Streaming Functions Free Tier