# Groq
Fast LLM inference, powered by custom LPU chips. OpenAI-compatible API with sub-second latency.
## Features
- OpenAI-compatible API
- Streaming
- Batching
- Fine-tuning
- Embeddings
- Vision
- Audio
- Function calling
- JSON mode
## Compliance
- SOC 2
- HIPAA
- GDPR
## Pricing
Per-token pricing; a free tier is available.
## Details
- Models: 4
- Categories: LLM Inference, Audio / Music
- API base: `https://api.groq.com/openai/v1`
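Because the API is OpenAI-compatible, any OpenAI-style client can target the base URL above. A minimal sketch of the request shape, using only the standard library (the model ID `llama-3.3-70b-versatile` and the `GROQ_API_KEY` environment-variable name are assumptions; check the live model list for exact IDs):

```python
import json
import os

BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completions payload for POST <BASE_URL>/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

# Hypothetical model ID for illustration.
payload = build_chat_request("llama-3.3-70b-versatile", "Say hello in one word.")

# The bearer token goes in the usual OpenAI-style Authorization header.
headers = {
    "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
    "Content-Type": "application/json",
}

print(json.dumps(payload, indent=2))
```

Setting `"stream": True` switches the endpoint to server-sent events, matching the Streaming feature listed above.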
## Model Catalog (4)
| Model | Provider | Params | Type | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| Llama 3.3 70B | Meta | 70B | LLM | $0.590 | $0.790 |
| Llama 4 Scout | Meta | 109B (17B active) | LLM | $0.110 | $0.340 |
| Qwen 3 32B | Alibaba | 32B | LLM | $0.290 | $0.590 |
| Whisper Large V3 | OpenAI | 1.5B | Speech-to-text | $0.0019/req | — |
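To estimate spend from the per-million-token prices in the table, a small helper can be used (a hypothetical sketch with the LLM prices hardcoded from this catalog, not an official billing formula; Whisper is excluded since it is billed per request):

```python
# ($/1M input tokens, $/1M output tokens), copied from the catalog above.
PRICES = {
    "Llama 3.3 70B": (0.590, 0.790),
    "Llama 4 Scout": (0.110, 0.340),
    "Qwen 3 32B": (0.290, 0.590),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD: tokens / 1e6 times the per-million price, input plus output."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: 1M input + 1M output tokens on Llama 3.3 70B.
print(round(cost_usd("Llama 3.3 70B", 1_000_000, 1_000_000), 3))  # 1.38
```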