Models
Explore models and compare pricing across providers.
Aura 2
Deepgram: Low-latency TTS with a 90 ms time to first byte (TTFB), optimized for production voice agents.
Cloud TTS
Google: Google's text-to-speech service, supporting 75+ languages with WaveNet and Neural2 voices.
Fish Speech S2
Fish Audio: Multilingual TTS supporting 80+ languages, with voice cloning.
Flash V2.5
ElevenLabs: Ultra-low-latency TTS with a 75 ms time to first audio (TTFA). Best suited to real-time conversational voice agents.
Multilingual V2
ElevenLabs: Ultra-realistic narration in 70+ languages, with thousands of voice presets.
Qwen 3 TTS
Alibaba: Qwen 3 text-to-speech model with voice cloning support.
Sonic
Cartesia: Fastest production TTS at ~40 ms TTFA. Supports 15 languages and ~130 voices, at one-fifth the cost of ElevenLabs.
Speech 2.6
MiniMax: MiniMax text-to-speech with HD and turbo variants. Supports voice cloning.
TTS-1
OpenAI: OpenAI's fast text-to-speech model, optimized for real-time use.
TTS-1 HD
OpenAI: Higher-quality TTS with improved naturalness and pronunciation accuracy.
Voxtral TTS
Mistral AI: Open-weight 4B-parameter TTS model. Supports 9 languages, ~90 ms TTFA, and voice cloning from a 3 s reference clip. Licensed under CC BY-NC 4.0.
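As a rough illustration of how the figures above might be used to shortlist models, here is a minimal sketch, assuming you copy the vendor-stated numbers into a local table yourself. `TTSModel`, `CATALOG`, `within_latency`, and `covering_languages` are hypothetical names, not part of any provider's API. Only the latency and language figures stated in this list are included (no pricing). "N+" language counts are stored as lower bounds. Because TTFB (Aura 2) and TTFA (the others) measure different things, the latency filter compares only models that report the same metric.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTSModel:
    name: str
    provider: str
    latency_ms: Optional[float] = None   # vendor-advertised latency; None if not listed
    latency_metric: Optional[str] = None # "TTFB" or "TTFA"; the two are not directly comparable
    min_languages: Optional[int] = None  # lower bound taken from "N+ languages" claims


# Hypothetical local table holding only the figures stated in the catalog above.
CATALOG: List[TTSModel] = [
    TTSModel("Aura 2", "Deepgram", 90, "TTFB"),
    TTSModel("Cloud TTS", "Google", min_languages=75),
    TTSModel("Fish Speech S2", "Fish Audio", min_languages=80),
    TTSModel("Flash V2.5", "ElevenLabs", 75, "TTFA"),
    TTSModel("Multilingual V2", "ElevenLabs", min_languages=70),
    TTSModel("Sonic", "Cartesia", 40, "TTFA", 15),
    TTSModel("Voxtral TTS", "Mistral AI", 90, "TTFA", 9),
]


def within_latency(models: List[TTSModel], budget_ms: float, metric: str = "TTFA") -> List[TTSModel]:
    """Return models whose advertised latency for `metric` fits the budget, fastest first."""
    hits = [
        m for m in models
        if m.latency_metric == metric and m.latency_ms is not None and m.latency_ms <= budget_ms
    ]
    return sorted(hits, key=lambda m: m.latency_ms)


def covering_languages(models: List[TTSModel], needed: int) -> List[TTSModel]:
    """Return models whose stated language count is at least `needed`, in catalog order."""
    return [m for m in models if m.min_languages is not None and m.min_languages >= needed]


if __name__ == "__main__":
    print([m.name for m in within_latency(CATALOG, 80)])      # TTFA models under an 80 ms budget
    print([m.name for m in covering_languages(CATALOG, 75)])  # models claiming 75+ languages
```

With an 80 ms TTFA budget, this sketch shortlists Sonic and Flash V2.5. Aura 2 is excluded because it reports TTFB rather than TTFA. These are advertised figures, so measure latency in your own deployment before relying on them.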