Catalog

Models

Explore models and compare pricing across providers.

Aura 2

Deepgram

TTS

Low-latency TTS with an optimized 90ms time to first byte (TTFB), built for production voice agents.

No providers yet

Cloud TTS

Google

TTS

Google's text-to-speech service supporting 75+ languages with WaveNet and Neural2 voices.

1 provider

Fish Speech S2

Fish Audio

TTS

Multilingual TTS supporting 80+ languages with voice cloning capabilities.

No providers yet

Flash V2.5

ElevenLabs

TTS

Ultra-low latency TTS at 75ms TTFA. Best for real-time conversational voice agents.

4 providers

Multilingual V2

ElevenLabs

TTS

Ultra-realistic narration in 70+ languages with thousands of voice presets.

2 providers

Qwen 3 TTS

Alibaba

TTS

Qwen 3 text-to-speech model with voice cloning support.

Open

2 providers

Sonic

Cartesia

TTS

Fastest production TTS at ~40ms TTFA. 15 languages, ~130 voices. One fifth the cost of ElevenLabs.

No providers yet

Speech 2.6

MiniMax

TTS

MiniMax text-to-speech with HD and turbo variants. Supports voice cloning.

4 providers

TTS-1

OpenAI

TTS

OpenAI's fast text-to-speech model optimized for real-time use.

1 provider

TTS-1 HD

OpenAI

TTS

Higher quality TTS with improved naturalness and pronunciation accuracy.

1 provider

Voxtral TTS

Mistral AI

TTS

Open-weight 4B TTS model. 9 languages, ~90ms TTFA, voice cloning from a 3s reference. CC BY-NC 4.0.

4B Open

No providers yet