Directory

AI Inference API Providers

Compare AI inference API providers side by side — pricing, supported models, features, and free tiers. Whether you need the cheapest LLM API, the fastest image generation endpoint, or a provider with OpenAI-compatible routing, find the right fit below.

Covering 45+ providers including OpenRouter, Together AI, Fireworks AI, fal.ai, Replicate, DeepInfra, Groq, and more. Filter by type, category, or browse the full directory.

All Serverless Proprietary GPU Cloud Aggregator Platform

LLM Inference Image Generation Video Generation Audio / Music Embeddings GPU Cloud Multi-Modal

Amazon Bedrock

Platform Featured

Fully managed AWS service providing foundation models from Anthropic, Meta, Mistral, Cohere, and more. OpenAI-compatible API with enterprise-grade security and compliance.

Up to $200 in AWS free tier credits for new accounts free

OpenAI Compat Streaming Finetuning Embeddings Functions Vision SOC2 HIPAA GDPR Free Tier

Anthropic

Proprietary Featured

Official Claude API. Direct access to Claude Opus, Sonnet, and Haiku models.

Streaming Functions Vision

Cerebras

Serverless Featured

Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.

Free tier with all models, no credit card required free

OpenAI Compat Streaming Finetuning Functions Free Tier

DeepInfra

Serverless Featured

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier

Fireworks AI

Serverless Featured

Fast serverless inference for open-source models with per-token pricing, fine-tuning, and on-demand deployments.

$1 in free credits free

OpenAI Compat Streaming Finetuning Embeddings Functions Vision Free Tier

Google

Proprietary Featured

Official Gemini API via Google AI Studio and Vertex AI. Direct access to Gemini, Imagen, and Gemma models.

Streaming Embeddings Functions Vision Free Tier

Groq

Serverless Featured

Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.

OpenAI Compat Streaming Functions Vision Free Tier

KIE AI

Aggregator Featured

Affordable AI API aggregator offering 259+ models across chat, image, video, and music at discounted prices.

OpenAI Compat Streaming Vision

Mistral AI

Proprietary Featured

Official Mistral API. Direct access to Mistral Large, Small, and Ministral models. EU data residency available.

OpenAI Compat Streaming Functions Vision Free Tier

Muapi

Aggregator Featured

AI API aggregator with 315+ model endpoints across text, image, video, and audio at competitive prices.

OpenAI Compat Streaming Vision

Novita AI

Aggregator Featured

Budget AI inference platform with broad model catalog across LLM, image, video, and audio. Very competitive per-token pricing.

OpenAI Compat Streaming Vision Free Tier

OpenAI

Proprietary Featured

Official OpenAI API. Direct access to GPT, DALL-E, Whisper, and embedding models.

OpenAI Compat Streaming Embeddings Functions Vision Free Tier

OpenRouter

Aggregator Featured

Unified API for 300+ LLMs from OpenAI, Anthropic, Google, Meta, and more. Routes to the best provider automatically.

OpenAI Compat Streaming Functions Vision Free Tier

Replicate

Serverless Featured

Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.

Streaming Finetuning Vision

SiliconFlow

Serverless Featured

Fast and affordable AI inference platform. 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.

OpenAI Compat Streaming Vision Free Tier

Together AI

Serverless Featured

Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.

OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier

fal.ai

Serverless Featured

Fast inference platform for generative media — image, video, audio, and 3D models with serverless GPU infrastructure.

Streaming Vision Free Tier

AIMLAPI

Unified API for 400+ AI models across text, image, video, and audio. OpenAI-compatible with serverless inference.

OpenAI Compat Streaming Embeddings Vision Free Tier

Alibaba Cloud

Alibaba Cloud Model Studio offers the Qwen model family including LLMs, image generation (Qwen Image, Wan), video generation (Wan), embeddings, and TTS.

Free quota on select models free

OpenAI Compat Streaming Embeddings Functions Vision Free Tier

Atlas Cloud

Full-modal AI inference platform with 300+ models. Smart routing to cheapest servers with transparent pay-as-you-go pricing.

OpenAI Compat Streaming Vision Free Tier

Black Forest Labs

Creator of the Flux model family. Offers image generation and editing via a credit-based API. Models range from real-time (Klein) to highest quality (Max).

Flux 2 Dev is free for local development free

Vision Free Tier

BytePlus

ByteDance's international AI platform offering Seedream image generation and Seedance video generation models via the ModelArk API.

50 free images (Seedream 5.0 Lite), 2M free tokens (Seedance) free

OpenAI Compat Streaming Vision Free Tier

Cloudflare Workers AI

Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.

OpenAI Compat Streaming Embeddings Vision Free Tier

Cohere

Creator of the Command and Embed model families. Enterprise-focused NLP platform with LLMs, embeddings, and reranking. OpenAI-compatible API.

Trial key with 1,000 API calls/month free

OpenAI Compat Streaming Finetuning Embeddings Functions Free Tier

DeepSeek

Creator of DeepSeek V3 and R1 models. Offers extremely competitive pricing on high-quality LLMs with thinking/reasoning modes. OpenAI-compatible API.

OpenAI Compat Streaming Functions

ElevenLabs

Creator of industry-leading text-to-speech, speech-to-text, music generation, and sound effects models. Credit-based pricing with free tier.

10k credits/month (~20 min Flash TTS) free

Streaming Free Tier

Friendli AI

Fast serverless and dedicated AI inference. Korean provider with competitive pricing on open-source models and prompt caching support.

Up to 50K inference credit for new users free

OpenAI Compat Streaming Functions Free Tier

Hyperbolic

Open-access AI cloud with serverless inference and dedicated hosting. Zero data retention, privacy-first design. Unified per-token pricing.

10 dollars free credits on signup free

OpenAI Compat Streaming Vision Free Tier

Kling AI

Creator of the Kling video and image generation models by Kuaishou. Offers text-to-video, image-to-video, and image generation via API.

Luma AI

Creator of Ray 2 video generation and Photon image generation models. Pixel-based pricing with upscaling and audio options.

MiniMax

Creator of Hailuo video generation, MiniMax M2.5/M2.7 LLMs, Music 2.5, and Speech models. Full multimodal API platform with Anthropic-compatible endpoints.

OpenAI Compat Streaming Functions Vision

Moonshot AI

Creator of the Kimi model family. Known for long-context multimodal LLMs with strong code and agent capabilities. OpenAI/Anthropic-compatible API; developer platform at platform.kimi.ai.

OpenAI Compat Streaming Functions Vision

Nebius

European AI inference on Token Factory. Two flavors: fast (low latency) and base (cost-efficient). Batch at 50% off.

1 dollar in free credits to start free

OpenAI Compat Streaming Finetuning Embeddings Functions Free Tier

NeuralWatt

Hosted, OpenAI-compatible inference with energy-based pricing: pay a flat $5.00/kWh for actual GPU energy consumed (up to 95% cheaper on efficient models) instead of per-token, with real per-request energy metrics on every response. Standard per-token pricing and kWh-based monthly subscriptions ($20/$50/$100) are also available. Powered by Neuralwatt Optimize, also available for self-hosted vLLM via Neuralwatt Deploy.

OpenAI Compat Streaming Functions Vision

Parasail

Distributed AI inference network with serverless, dedicated, and batch endpoints. No rate limits, no contracts, up to 30x cheaper than legacy cloud.

Free credits on signup free

OpenAI Compat Streaming Vision Free Tier

QuiverAI

Creator of the Arrow vector SVG generation models. Offers text-to-SVG and image-to-SVG (vectorize) via REST API with credit-based pricing. Raised $8.3M seed led by a16z.

20 free credits on signup (~5 generations) free

Streaming Vision Free Tier

Recraft

Creator of the Recraft image generation models. Offers raster and SVG vector image generation with per-image API pricing.

OpenAI Compat Vision

Runway

Creator of Gen-4.5 and Aleph video generation models. Credit-based API at $0.01/credit. Also offers image generation and audio models.

Sakana AI

Tokyo-based AI lab (founded by Transformer co-author Llion Jones and David Ha). Its Fugu model is a multi-agent orchestration system sold as a single model — a trained conductor that routes and synthesizes across a swappable pool of frontier models, rather than a model Sakana trains itself. Offered via subscription plans and per-token through OpenRouter.

OpenAI Compat Streaming Functions

Sambanova

AI inference on custom SN50 chips. OpenAI-compatible API with fast output speeds. GPT OSS 120B at 600+ tok/s.

OpenAI Compat Streaming Functions

Venice AI

Privacy-focused AI inference with no logging. Supports open-source and proprietary models including Claude, GPT, Grok, and open-weight LLMs.

OpenAI Compat Streaming Embeddings Functions

Voyage AI

Creator of the Voyage embedding and reranking models. Specializes in high-quality text embeddings with domain-specific models for code, law, and finance.

200M free tokens/month for current models free

OpenAI Compat Embeddings Free Tier

WaveSpeed AI

Fast AI inference platform specializing in image and video generation with serverless GPU infrastructure.

OpenAI Compat Streaming Vision

Zhipu AI

Creator of the GLM model family. Offers LLMs, vision models, image generation (CogView), and video generation (CogVideoX) via an OpenAI-compatible API at z.ai.

GLM-4.7-Flash and GLM-4.5-Flash are free free

OpenAI Compat Streaming Functions Vision Free Tier

xAI

Creator of the Grok model family. Offers LLMs with reasoning modes, image generation, and video generation. Batch API available at 50% discount.

OpenAI Compat Streaming Functions Vision