Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.
Features
- OpenAI Compatible
- Streaming
- Batching
- Fine-tuning
- Embeddings
- Vision
- Audio
- Function Calling
- JSON Mode
Compliance
- SOC 2
- HIPAA
- GDPR
Pricing Model
- Per token; free tier includes all models, no credit card required
Details
- Models: 3
- API Base: https://api.cerebras.ai/v1 (LLM Inference)
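Because the API is OpenAI-compatible, any OpenAI-style client pointed at the base URL above should work. A minimal stdlib-only sketch of building and sending a chat completion request follows; the model identifier `"gpt-oss-120b"` is an assumption for illustration, so check the catalog for the exact name your account exposes.

```python
# Sketch: calling an OpenAI-compatible /chat/completions endpoint using only
# the Python standard library. The model name below is a hypothetical
# identifier; consult the provider's model catalog for the real one.
import json
import urllib.request

API_BASE = "https://api.cerebras.ai/v1"

def build_request(api_key: str, model: str, prompt: str,
                  stream: bool = False) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def complete(api_key: str, model: str, prompt: str) -> str:
    """Send a non-streaming request and return the first choice's text."""
    req = build_request(api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Setting `"stream": True` instead returns server-sent events, one JSON delta per chunk, matching the OpenAI streaming wire format.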
Model Catalog (3)
| Model | Type | Input $/1M | Output $/1M | Context | Speed | Status |
|---|---|---|---|---|---|---|
| GLM 4.7 (Zhipu AI) | llm | $2.25 | $2.75 | — | — | — |
| GPT OSS 120B (OpenAI, 117B MoE, 5.1B active) | llm | $0.350 | $0.750 | — | — | — |
| Qwen 3 235B (Alibaba, 235B MoE) | llm | $0.600 | $1.20 | — | — | — |
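Since pricing is per token at the $/1M rates above, per-request cost is a straightforward weighted sum. A small helper, with rates copied from the table (the dictionary keys are hypothetical model identifiers, not confirmed API names):

```python
# Estimate a request's dollar cost from the catalog's $/1M-token rates.
# Keys are illustrative identifiers; rates are copied from the table above.
PRICES_PER_1M = {
    "glm-4.7": (2.25, 2.75),          # (input $/1M, output $/1M)
    "gpt-oss-120b": (0.350, 0.750),
    "qwen-3-235b": (0.600, 1.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-token rates."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, 10,000 prompt tokens plus 2,000 completion tokens on GPT OSS 120B comes to 10,000 × $0.35/1M + 2,000 × $0.75/1M = $0.0035 + $0.0015 = $0.005.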