# Groq
Fast LLM inference, powered by custom LPU chips. OpenAI-compatible API with sub-second latency.
## Features
- OpenAI-compatible API
- Streaming
- Batching
- Fine-tuning
- Embeddings
- Vision
- Audio
- Function calling
- JSON mode
## Compliance
- SOC 2
- HIPAA
- GDPR
## Pricing
Per-token pricing; a free tier is available.
## Details
- Models: 4
- Categories: LLM Inference, Audio / Music
- API base: `https://api.groq.com/openai/v1`
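Because the API is OpenAI-compatible, any OpenAI-style client can target the base URL above. A minimal sketch of the request shape, using only the standard library (the model ID `llama-3.3-70b-versatile` and the `GROQ_API_KEY` environment-variable name are assumptions; check the live model list for exact IDs):

```python
import json
import os

BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completions payload for POST <BASE_URL>/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

# Hypothetical model ID for illustration.
payload = build_chat_request("llama-3.3-70b-versatile", "Say hello in one word.")

# The bearer token goes in the usual OpenAI-style Authorization header.
headers = {
    "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
    "Content-Type": "application/json",
}

print(json.dumps(payload, indent=2))
```

Setting `"stream": True` switches the endpoint to server-sent events, matching the Streaming feature listed above.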
## Model Catalog (4)
| Model | Provider | Params | Type | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| Llama 3.3 70B | Meta | 70B | LLM | $0.590 | $0.790 |
| Llama 4 Scout | Meta | 109B (17B active) | LLM | $0.110 | $0.340 |
| Qwen 3 32B | Alibaba | 32B | LLM | $0.290 | $0.590 |
| Whisper Large V3 | OpenAI | 1.5B | Speech-to-text | $0.0019/req | — |
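To estimate spend from the per-million-token prices in the table, a small helper can be used (a hypothetical sketch with the LLM prices hardcoded from this catalog, not an official billing formula; Whisper is excluded since it is billed per request):

```python
# ($/1M input tokens, $/1M output tokens), copied from the catalog above.
PRICES = {
    "Llama 3.3 70B": (0.590, 0.790),
    "Llama 4 Scout": (0.110, 0.340),
    "Qwen 3 32B": (0.290, 0.590),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD: tokens / 1e6 times the per-million price, input plus output."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: 1M input + 1M output tokens on Llama 3.3 70B.
print(round(cost_usd("Llama 3.3 70B", 1_000_000, 1_000_000), 3))  # 1.38
```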