AI Inference API Providers
Compare AI inference API providers side by side — pricing, supported models, features, and free tiers. Whether you need the cheapest LLM API, the fastest image generation endpoint, or a provider with OpenAI-compatible routing, find the right fit below.
Covering 43+ providers including OpenRouter, Together AI, Fireworks AI, fal.ai, Replicate, DeepInfra, Groq, and more. Filter by type, category, or browse the full directory.
Amazon Bedrock
Fully managed AWS service providing foundation models from Anthropic, Meta, Mistral, Cohere, and more. OpenAI-compatible API with enterprise-grade security and compliance.
Anthropic
Official Claude API. Direct access to Claude Opus, Sonnet, and Haiku models.
Cerebras
Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.
DeepInfra
Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.
Fireworks AI
Fast serverless inference for open-source models with per-token pricing, fine-tuning, and on-demand deployments.
Official Gemini API via Google AI Studio and Vertex AI. Direct access to Gemini, Imagen, and Gemma models.
Groq
Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.
KIE AI
Affordable AI API aggregator offering 259+ models across chat, image, video, and music at discounted prices.
Mistral AI
Official Mistral API. Direct access to Mistral Large, Small, and Ministral models. EU data residency available.
Muapi
AI API aggregator with 315+ model endpoints across text, image, video, and audio at competitive prices.
Novita AI
Budget AI inference platform with broad model catalog across LLM, image, video, and audio. Very competitive per-token pricing.
OpenAI
Official OpenAI API. Direct access to GPT, DALL-E, Whisper, and embedding models.
OpenRouter
Unified API for 300+ LLMs from OpenAI, Anthropic, Google, Meta, and more. Routes to the best provider automatically.
Replicate
Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.
SiliconFlow
Fast and affordable AI inference platform. 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.
Together AI
Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.
fal.ai
Fast inference platform for generative media — image, video, audio, and 3D models with serverless GPU infrastructure.
AIMLAPI
Unified API for 400+ AI models across text, image, video, and audio. OpenAI-compatible with serverless inference.
Alibaba Cloud
Alibaba Cloud Model Studio offers the Qwen model family including LLMs, image generation (Qwen Image, Wan), video generation (Wan), embeddings, and TTS.
Atlas Cloud
Full-modal AI inference platform with 300+ models. Smart routing to cheapest servers with transparent pay-as-you-go pricing.
Black Forest Labs
Creator of the Flux model family. Offers image generation and editing via a credit-based API. Models range from real-time (Klein) to highest quality (Max).
BytePlus
ByteDance's international AI platform offering Seedream image generation and Seedance video generation models via the ModelArk API.
Cloudflare Workers AI
Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.
Cohere
Creator of the Command and Embed model families. Enterprise-focused NLP platform with LLMs, embeddings, and reranking. OpenAI-compatible API.
DeepSeek
Creator of DeepSeek V3 and R1 models. Offers extremely competitive pricing on high-quality LLMs with thinking/reasoning modes. OpenAI-compatible API.
ElevenLabs
Creator of industry-leading text-to-speech, speech-to-text, music generation, and sound effects models. Credit-based pricing with free tier.
Friendli AI
Fast serverless and dedicated AI inference. Korean provider with competitive pricing on open-source models and prompt caching support.
Hyperbolic
Open-access AI cloud with serverless inference and dedicated hosting. Zero data retention, privacy-first design. Unified per-token pricing.
Kling AI
Creator of the Kling video and image generation models by Kuaishou. Offers text-to-video, image-to-video, and image generation via API.
Luma AI
Creator of Ray 2 video generation and Photon image generation models. Pixel-based pricing with upscaling and audio options.
MiniMax
Creator of Hailuo video generation, MiniMax M2.5/M2.7 LLMs, Music 2.5, and Speech models. Full multimodal API platform with Anthropic-compatible endpoints.
Moonshot AI
Creator of the Kimi model family. Known for long-context LLMs with strong code and agent capabilities. OpenAI-compatible API at platform.moonshot.ai.
Nebius
European AI inference on Token Factory. Two flavors: fast (low latency) and base (cost-efficient). Batch at 50% off.
Parasail
Distributed AI inference network with serverless, dedicated, and batch endpoints. No rate limits, no contracts, up to 30x cheaper than legacy cloud.
QuiverAI
Creator of the Arrow vector SVG generation models. Offers text-to-SVG and image-to-SVG (vectorize) via REST API with credit-based pricing. Raised $8.3M seed led by a16z.
Recraft
Creator of the Recraft image generation models. Offers raster and SVG vector image generation with per-image API pricing.
Runway
Creator of Gen-4.5 and Aleph video generation models. Credit-based API at $0.01/credit. Also offers image generation and audio models.
Sambanova
AI inference on custom SN50 chips. OpenAI-compatible API with fast output speeds. GPT OSS 120B at 600+ tok/s.
Venice AI
Privacy-focused AI inference with no logging. Supports open-source and proprietary models including Claude, GPT, Grok, and open-weight LLMs.
Voyage AI
Creator of the Voyage embedding and reranking models. Specializes in high-quality text embeddings with domain-specific models for code, law, and finance.
WaveSpeed AI
Fast AI inference platform specializing in image and video generation with serverless GPU infrastructure.
Zhipu AI
Creator of the GLM model family. Offers LLMs, vision models, image generation (CogView), and video generation (CogVideoX) via an OpenAI-compatible API at z.ai.
xAI
Creator of the Grok model family. Offers LLMs with reasoning modes, image generation, and video generation. Batch API available at 50% discount.