Serverless inference for open-source LLMs and generative models. Pay-per-token with fast cold starts.
Features
OpenAI Compatible
Streaming
Batching
Fine-tuning
Embeddings
Vision
Audio
Function Calling
JSON Mode
Compliance
SOC 2
HIPAA
GDPR
Pricing Model
per token Free tier: Available
Details
Models 13
API Base
https://api.deepinfra.com/v1/openai Audio / MusicImage GenerationLLM Inference
Model Catalog (13)
| Model | Type | Input $/1M | Output $/1M | Context | Speed | Status |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 DeepSeek · 671B MoE | llm | $0.260 | $0.380 | — | — | |
| GLM 4.7 Zhipu AI | llm | $0.060 | $0.400 | — | — | |
| GLM 5 Zhipu AI · 744B | llm | $0.800 | $2.56 | — | — | |
| Kimi K2.5 Moonshot AI | llm | $0.450 | $2.25 | — | — | |
| MiniMax M2.5 MiniMax | llm | $0.270 | $0.950 | — | — | |
| Qwen 3 Max Alibaba | llm | $1.20 | $6.00 | — | — | |
| Qwen 3 Max Thinking Alibaba | llm | $1.20 | $6.00 | — | — | |
| Qwen 3 TTS Alibaba | text to_speech | $0.00002/character | see notes | — | — | |
| Qwen 3.5 122B Alibaba · 122B MoE (10B active) | llm | $0.290 | $2.90 | — | — | |
| Qwen 3.5 35B Alibaba · 35B MoE (3B active) | llm | $0.220 | $2.20 | — | — | |
| Qwen 3.5 397B Alibaba · 397B MoE (17B active) | llm | $0.540 | $3.40 | — | — | |
| Qwen 3.5 72B Alibaba · 72B | llm | $0.260 | $2.60 | — | — | |
| Qwen 3.5 9B Alibaba · 9B | llm | $0.040 | $0.200 | — | — |