Ultra-fast AI inference on custom Wafer-Scale Engine chips. Up to 3000 tok/s output speed, 20x faster than GPU-based providers.
Features
- OpenAI Compatible
- Streaming
- Batching
- Fine-tuning
- Embeddings
- Vision
- Audio
- Function Calling
- JSON Mode
Compliance
- SOC 2
- HIPAA
- GDPR
Pricing Model
- Per token; free tier includes all models, no credit card required
Details
- Models: 3
- API Base: https://api.cerebras.ai/v1 (LLM Inference)
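Because the API is OpenAI-compatible, any OpenAI-style client pointed at the base URL above should work. A minimal stdlib-only sketch of building and sending a chat completion request follows; the model identifier `"gpt-oss-120b"` is an assumption for illustration, so check the catalog for the exact name your account exposes.

```python
# Sketch: calling an OpenAI-compatible /chat/completions endpoint using only
# the Python standard library. The model name below is a hypothetical
# identifier; consult the provider's model catalog for the real one.
import json
import urllib.request

API_BASE = "https://api.cerebras.ai/v1"

def build_request(api_key: str, model: str, prompt: str,
                  stream: bool = False) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def complete(api_key: str, model: str, prompt: str) -> str:
    """Send a non-streaming request and return the first choice's text."""
    req = build_request(api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Setting `"stream": True` instead returns server-sent events, one JSON delta per chunk, matching the OpenAI streaming wire format.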
Model Catalog (3)
| Model | Type | Input $/1M | Output $/1M | Context | Speed | Status |
|---|---|---|---|---|---|---|
| GLM 4.7 (Zhipu AI) | llm | $2.25 | $2.75 | — | — | — |
| GPT OSS 120B (OpenAI, 117B MoE, 5.1B active) | llm | $0.350 | $0.750 | — | — | — |
| Qwen 3 235B (Alibaba, 235B MoE) | llm | $0.600 | $1.20 | — | — | — |
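Since pricing is per token at the $/1M rates above, per-request cost is a straightforward weighted sum. A small helper, with rates copied from the table (the dictionary keys are hypothetical model identifiers, not confirmed API names):

```python
# Estimate a request's dollar cost from the catalog's $/1M-token rates.
# Keys are illustrative identifiers; rates are copied from the table above.
PRICES_PER_1M = {
    "glm-4.7": (2.25, 2.75),          # (input $/1M, output $/1M)
    "gpt-oss-120b": (0.350, 0.750),
    "qwen-3-235b": (0.600, 1.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-token rates."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, 10,000 prompt tokens plus 2,000 completion tokens on GPT OSS 120B comes to 10,000 × $0.35/1M + 2,000 × $0.75/1M = $0.0035 + $0.0015 = $0.005.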