Groq

Serverless · Featured

Fastest LLM inference powered by custom LPU chips. OpenAI-compatible API with sub-second latency.

Features

OpenAI Compatible
Streaming
Batching
Fine-tuning
Embeddings
Vision
Audio
Function Calling
JSON Mode
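Because the API is OpenAI-compatible, standard chat-completion payloads work unchanged; Streaming and JSON Mode map onto the usual `stream` and `response_format` fields. A minimal sketch of building such a payload (the model id below is hypothetical; check the catalog for the real identifier):

```python
# Sketch: build an OpenAI-style chat.completions payload for Groq's
# OpenAI-compatible endpoint. No network call is made here.

BASE_URL = "https://api.groq.com/openai/v1"  # API base from this listing


def build_chat_request(model, messages, stream=False, json_mode=False):
    """Assemble a chat-completions request body following the OpenAI schema."""
    payload = {"model": model, "messages": messages, "stream": stream}
    if json_mode:
        # JSON Mode: ask the server to constrain output to valid JSON
        payload["response_format"] = {"type": "json_object"}
    return payload


payload = build_chat_request(
    "llama-3.3-70b",  # hypothetical id; use the exact id from the model catalog
    [{"role": "user", "content": "Say hello"}],
    stream=True,
    json_mode=True,
)
```

The same payload can then be POSTed to `BASE_URL + "/chat/completions"` with a `Authorization: Bearer <key>` header, or passed to any OpenAI-compatible client configured with this base URL.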

Compliance

SOC 2
HIPAA
GDPR

Pricing Model

per token
Free tier: Available

Details

Models: 4
API Base: https://api.groq.com/openai/v1
Audio / Music · LLM Inference
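Since the base above follows the OpenAI REST layout, the usual routes hang directly off it. A small sketch of the common paths (an actual call additionally needs an `Authorization: Bearer` header with your Groq API key):

```shell
# Common OpenAI-compatible routes under the API base from this listing.
BASE="https://api.groq.com/openai/v1"

echo "$BASE/models"                # GET:  list available models
echo "$BASE/chat/completions"      # POST: chat completions ("stream": true for streaming)
echo "$BASE/audio/transcriptions"  # POST: Whisper-style speech-to-text
```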

Model Catalog (4)

Model             Maker · Size              Type            Input $/1M   Output $/1M
Llama 3.3 70B     Meta · 70B                llm             $0.590       $0.790
Llama 4 Scout     Meta · 109B (17B active)  llm             $0.110       $0.340
Qwen 3 32B        Alibaba · 32B             llm             $0.290       $0.590
Whisper Large V3  OpenAI · 1.5B             speech_to_text  $0.0019/req  -
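The per-1M-token prices in the catalog make cost estimates a one-line calculation. A sketch using the LLM rows above (the Whisper row is priced per request, so it is excluded):

```python
# Estimate request cost in USD from the catalog's per-1M-token prices.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens), copied from the table
    "Llama 3.3 70B": (0.590, 0.790),
    "Llama 4 Scout": (0.110, 0.340),
    "Qwen 3 32B":    (0.290, 0.590),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request with the given token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


# e.g. a request with 10k input tokens and 2k output tokens on Llama 4 Scout
cost = estimate_cost("Llama 4 Scout", 10_000, 2_000)  # roughly $0.00178
```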