DE

DeepInfra

Serverless Featured

Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.

Features

OpenAI Compatible
Streaming
Batching
Fine-tuning
Embeddings
Vision
Audio
Function Calling
JSON Mode

Compliance

SOC 2
HIPAA
GDPR

Pricing Model

per token
Free tier: Available

Details

Models 13
API Base https://api.deepinfra.com/v1/openai
Audio / MusicImage GenerationLLM Inference

Model Catalog (13)

Model Type Input $/1M Output $/1M Context Speed Status
DeepSeek V3.2
DeepSeek · 671B MoE
llm $0.260 $0.380
GLM 4.7
Zhipu AI
llm $0.060 $0.400
GLM 5
Zhipu AI · 744B
llm $0.800 $2.56
Kimi K2.5
Moonshot AI
llm $0.450 $2.25
MiniMax M2.5
MiniMax
llm $0.270 $0.950
Qwen 3 Max
Alibaba
llm $1.20 $6.00
Qwen 3 Max Thinking
Alibaba
llm $1.20 $6.00
Qwen 3 TTS
Alibaba
text to_speech $0.00002/character see notes
Qwen 3.5 122B
Alibaba · 122B MoE (10B active)
llm $0.290 $2.90
Qwen 3.5 35B
Alibaba · 35B MoE (3B active)
llm $0.220 $2.20
Qwen 3.5 397B
Alibaba · 397B MoE (17B active)
llm $0.540 $3.40
Qwen 3.5 72B
Alibaba · 72B
llm $0.260 $2.60
Qwen 3.5 9B
Alibaba · 9B
llm $0.040 $0.200