
Cerebras

Serverless · Featured

Ultra-fast AI inference on custom Wafer-Scale Engine chips: up to 3,000 tokens/s output speed, claimed to be 20x faster than GPU-based providers.

Features

OpenAI Compatible
Streaming
Batching
Fine-tuning
Embeddings
Vision
Audio
Function Calling
JSON Mode

Compliance

SOC 2
HIPAA
GDPR

Pricing Model

Per token
Free tier: all models included, no credit card required

Details

Models: 3
API Base: https://api.cerebras.ai/v1
Category: LLM Inference
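Because the API is OpenAI-compatible, a chat completion is a plain JSON payload POSTed to the base URL above. A minimal sketch with no SDK dependency; the model ID and prompt are illustrative assumptions, and the request helper is not executed here since it needs a real API key:

```python
import json
import urllib.request

API_BASE = "https://api.cerebras.ai/v1"  # from the Details section

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style /chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def send_chat_request(payload: dict, api_key: str) -> dict:
    """POST the payload to the chat completions endpoint (needs a valid key)."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# "gpt-oss-120b" is an assumed model ID for illustration; use the IDs
# the provider actually exposes in its model catalog.
payload = build_chat_request("gpt-oss-120b", "Hello!", stream=True)
```

With `stream=True`, an OpenAI-compatible server returns server-sent events rather than a single JSON body, so the streaming case needs incremental reading instead of `json.load`.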

Model Catalog (3)

| Model | Type | Input $/1M | Output $/1M | Context | Speed | Status |
|---|---|---|---|---|---|---|
| GLM 4.7 (Zhipu AI) | llm | $2.25 | $2.75 | — | — | — |
| GPT OSS 120B (OpenAI · 117B MoE, 5.1B active) | llm | $0.35 | $0.75 | — | — | — |
| Qwen 3 235B (Alibaba · 235B MoE) | llm | $0.60 | $1.20 | — | — | — |
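The per-token rates above make request cost a simple weighted sum. A sketch of the arithmetic, using the catalog prices (context, speed, and status values were not available, so only cost is modeled):

```python
# USD per 1M tokens, (input, output), taken from the model catalog.
PRICES = {
    "GLM 4.7": (2.25, 2.75),
    "GPT OSS 120B": (0.35, 0.75),
    "Qwen 3 235B": (0.60, 1.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 100k input + 50k output tokens on GPT OSS 120B:
# 0.1 * $0.35 + 0.05 * $0.75 = $0.0725
print(round(estimate_cost("GPT OSS 120B", 100_000, 50_000), 4))  # → 0.0725
```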