AI Inference API Providers

Streaming Functions Vision

Official Claude API. Direct access to Claude Opus, Sonnet, and Haiku models.

6 models

Cerebras

Free tier with all models, no credit card required free

Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.

3 models

OpenAI Compat Streaming Finetuning Functions Free Tier

DeepInfra

OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

13 models

Fireworks AI

OpenAI Compat Streaming Finetuning Embeddings Functions Vision Free Tier

Fast serverless inference for open-source models with per-token pricing, fine-tuning, and on-demand deployments.

24 models

$1 in free credits free

Google

Streaming Embeddings Functions Vision Free Tier

Official Gemini API via Google AI Studio and Vertex AI. Direct access to Gemini, Imagen, and Gemma models.

19 models

Groq

OpenAI Compat Streaming Functions Vision Free Tier

Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.

4 models

KIE AI

OpenAI Compat Streaming Vision

Affordable AI API aggregator offering 259+ models across chat, image, video, and music at discounted prices.

55 models

Mistral AI

OpenAI Compat Streaming Functions Vision Free Tier

Official Mistral API. Direct access to Mistral Large, Small, and Ministral models. EU data residency available.

3 models

Muapi

OpenAI Compat Streaming Vision

AI API aggregator with 315+ model endpoints across text, image, video, and audio at competitive prices.

75 models

Novita AI

Budget AI inference platform with broad model catalog across LLM, image, video, and audio. Very competitive per-token pricing.

41 models

OpenAI

OpenAI Compat Streaming Embeddings Functions Vision Free Tier

Official OpenAI API. Direct access to GPT, DALL-E, Whisper, and embedding models.

18 models

OpenRouter

OpenAI Compat Streaming Functions Vision Free Tier

Unified API for 300+ LLMs from OpenAI, Anthropic, Google, Meta, and more. Routes to the best provider automatically.

58 models

Replicate

Streaming Finetuning Vision

Run and deploy machine learning models with a cloud API. Pay-per-use with serverless GPU infrastructure.

67 models

SiliconFlow

Fast and affordable AI inference platform. 2.3x faster speeds and 32% lower latency than major cloud platforms. Supports LLM, image, video, and audio models.

20 models

Together AI

OpenAI Compat Streaming Finetuning Embeddings Vision Free Tier

Serverless and dedicated inference for open-source LLMs, image, video, and audio models. GPU clusters available.

9 models

AIMLAPI

Aggregator

Unified API for 400+ AI models across text, image, video, and audio. OpenAI-compatible with serverless inference.

25 models

OpenAI Compat Streaming Embeddings Vision Free Tier

Atlas Cloud

Aggregator

Full-modal AI inference platform with 300+ models. Smart routing to cheapest servers with transparent pay-as-you-go pricing.

10 models

Cloudflare Workers AI

OpenAI Compat Streaming Embeddings Vision Free Tier

Edge AI inference across 200+ cities worldwide. Serverless, pay-per-use with OpenAI-compatible API.

12 models

Friendli AI

Up to 50K inference credit for new users free

Fast serverless and dedicated AI inference. Korean provider with competitive pricing on open-source models and prompt caching support.

7 models

OpenAI Compat Streaming Functions Free Tier

Hyperbolic

10 dollars free credits on signup free

Open-access AI cloud with serverless inference and dedicated hosting. Zero data retention, privacy-first design. Unified per-token pricing.

7 models

Nebius

1 dollar in free credits to start free

European AI inference on Token Factory. Two flavors: fast (low latency) and base (cost-efficient). Batch at 50% off.

8 models

OpenAI Compat Streaming Finetuning Embeddings Functions Free Tier

Parasail

Free credits on signup free

Distributed AI inference network with serverless, dedicated, and batch endpoints. No rate limits, no contracts, up to 30x cheaper than legacy cloud.

13 models

Sambanova

OpenAI Compat Streaming Functions

AI inference on custom SN50 chips. OpenAI-compatible API with fast output speeds. GPT OSS 120B at 600+ tok/s.

7 models

Venice AI

OpenAI Compat Streaming Embeddings Functions

Privacy-focused AI inference with no logging. Supports open-source and proprietary models including Claude, GPT, Grok, and open-weight LLMs.

15 models

WaveSpeed AI