Alibaba's Qwen Model Family Explained: Every Model Line From LLMs to Video Generation
A complete guide to Alibaba's Qwen ecosystem — Qwen 3/3.5/3.6 LLMs, QwQ reasoning, Qwen-VL vision, Qwen-Omni multimodal, Qwen-Coder, Wan video generation, and more. What each model does and when to use it.
Alibaba’s Qwen has quietly become one of the largest AI model ecosystems in the world. What started as a single LLM in 2023 now spans text, vision, audio, code, reasoning, image generation, video generation, text-to-speech, and real-time multimodal interaction — with most models released as open-weight under Apache 2.0.
Here’s a breakdown of every model line and when you’d actually use each one.
The LLM core: Qwen 3, 3.5, and 3.6
Alibaba maintains three active generations of text models, each targeting different needs.
Qwen 3 (April 2025)
The foundation. Qwen 3 shipped as both dense and MoE (mixture-of-experts) models, trained on 36 trillion tokens across 119 languages.
| Model | Parameters | Context | Open Weight |
|---|---|---|---|
| Qwen 3 235B | 235B MoE (22B active) | 128K | Yes |
| Qwen 3 32B | 32B | 128K | Yes |
| Qwen 3 8B | 8B | 128K | Yes |
| Qwen 3 Max | Undisclosed | 262K | No |
Qwen 3 Max is the proprietary flagship, Alibaba’s strongest closed model in the Qwen 3 generation. Qwen 3 Max Thinking adds extended reasoning (similar to OpenAI’s o1) for complex multi-step problems.
When to use Qwen 3: If you need a battle-tested open-weight model with broad third-party provider availability. Qwen 3 235B and 32B are available on 10+ inference providers.
Qwen 3.5 (February 2026)
The multimodal upgrade. Qwen 3.5 models are billed as “native multimodal agents”, designed from the ground up to handle text, vision, and tool use together. Note that in the table below only the 72B is listed with vision support.
| Model | Parameters | Context | Open Weight | Vision |
|---|---|---|---|---|
| Qwen 3.5 397B | 397B MoE (17B active) | 128K | Yes | No |
| Qwen 3.5 122B | 122B MoE (10B active) | 128K | Yes | No |
| Qwen 3.5 72B | 72B | 128K | Yes | Yes |
| Qwen 3.5 35B | 35B MoE (3B active) | 128K | Yes | No |
| Qwen 3.5 9B | 9B | 128K | Yes | No |
When to use Qwen 3.5: For production workloads that need the latest open-weight performance. The 397B is competitive with frontier closed models on most benchmarks. The 72B is notable for having native vision support.
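To make the "MoE (N active)" figures concrete, here is a minimal sketch that computes the share of parameters active per token for the MoE rows in the tables above. The parameter counts are copied from those tables; the function name `active_fraction` is hypothetical. As a rough rule of thumb, per-token compute tracks the active count, while the memory needed to host the model tracks the total.

```python
# Total and active parameter counts (in billions), copied from the tables above.
MOE_MODELS = {
    "Qwen 3 235B": (235, 22),
    "Qwen 3.5 397B": (397, 17),
    "Qwen 3.5 122B": (122, 10),
    "Qwen 3.5 35B": (35, 3),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters used per token, between 0 and 1."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active <= total")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total, active) in MOE_MODELS.items():
        share = active_fraction(total, active)
        print(f"{name}: {share:.1%} of weights active per token")
```

By this measure the 397B activates a smaller share of its weights per token than the Qwen 3 235B does, even though it is the larger model overall.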
Qwen 3.6 Plus (April 2026)
The latest proprietary flagship. Qwen 3.6 Plus is Alibaba’s frontier model with 1M token context, multimodal reasoning, and native computer-use capabilities — purpose-built for agentic coding and complex multi-step tasks.
When to use Qwen 3.6 Plus: When you need Alibaba’s absolute best. It’s competitive with Claude, GPT, and Gemini on coding and agentic tasks. Availability is limited: primarily Alibaba Cloud and Fireworks AI.
Reasoning: QwQ
QwQ is Alibaba’s reasoning-focused model line, comparable to OpenAI’s o1/o3 series.
- QwQ-Plus: Extended thinking for math, logic, and complex reasoning. 131K context.
- QwQ-32B: The open-weight version (32B parameters). Apache 2.0 licensed.
When to use QwQ: Math-heavy workloads, formal logic, multi-step problem solving, or any task where “thinking longer” produces better results.
Vision: Qwen-VL
Qwen-VL models process images and documents alongside text — visual question answering, chart analysis, OCR, and more.
| Model | Context | Use Case |
|---|---|---|
| Qwen3-VL-Plus | 262K | Best visual reasoning |
| Qwen3-VL-Flash | 262K | Fast visual tasks |
| Qwen-VL-Max | 131K | Advanced visual understanding |
| Qwen-VL-Plus | 131K | General vision tasks |
| Qwen-VL-OCR | 38K | Document text extraction |
When to use Qwen-VL: Document processing, image understanding, chart/graph analysis. The OCR variant is particularly cost-effective for pure text extraction at $0.04-0.07/1M tokens.
Multimodal: Qwen-Omni
The Omni line handles everything — text, images, audio, and video as input, with text and speech as output. This is Alibaba’s answer to GPT-4o’s multimodal capabilities.
| Model | Capabilities | Notes |
|---|---|---|
| Qwen3.5-Omni-Plus | Text, image, audio, video in; text + speech out | Latest, currently in free preview |
| Qwen3-Omni-Flash | Text, image, audio, video in; text + speech out; adds thinking mode | Production-ready |
| Qwen-Omni-Turbo | Lightweight multimodal | Budget option |
| Qwen3-Omni-Flash-Realtime | Streaming audio with VAD | For voice assistants |
| Qwen-Omni-Turbo-Realtime | Lightweight realtime | Low-latency voice apps |
When to use Qwen-Omni: Voice assistants, video understanding, any application that mixes modalities. The Realtime variants support streaming audio input with voice activity detection — ideal for conversational AI.
Code: Qwen-Coder
Dedicated coding models optimized for code generation, debugging, and agentic development workflows.
| Model | Context | Use Case |
|---|---|---|
| Qwen3-Coder-Plus | 1M | Complex coding agents with tool calling |
| Qwen3-Coder-Flash | 1M | Fast code generation |
Both models support 1M token context — enough to ingest entire codebases. They’re designed for agentic workflows where the model plans, writes, tests, and iterates on code autonomously.
When to use Qwen-Coder: IDE integrations, code agents, automated refactoring. With 1M tokens of context, many large repos can be passed in without chunking.
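Whether a given repo actually fits in a 1M-token window depends on the tokenizer, so it is worth estimating before sending anything. The sketch below is a rough estimate only: it assumes about 4 characters per token (a common rule of thumb, not a Qwen-specific figure), reads "1M" as 1,000,000 tokens, and reserves some room for the prompt and the model's reply. All function names and the `reserve_tokens` default are hypothetical.

```python
import math
from pathlib import Path

CONTEXT_TOKENS = 1_000_000  # "1M" from the table, read here as 1,000,000 tokens
CHARS_PER_TOKEN = 4.0       # assumption: rough average; real tokenizers vary


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count, rounding up."""
    return math.ceil(num_chars / chars_per_token)


def repo_chars(root: Path, suffixes=(".py", ".md")) -> int:
    """Sum the character counts of source files under root with the given suffixes."""
    total = 0
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(num_chars: int,
                    context_tokens: int = CONTEXT_TOKENS,
                    reserve_tokens: int = 50_000) -> bool:
    """True if the estimated tokens fit, leaving reserve_tokens for prompt and reply."""
    return estimate_tokens(num_chars) <= context_tokens - reserve_tokens


if __name__ == "__main__":
    chars = repo_chars(Path("."))
    print(f"~{estimate_tokens(chars):,} tokens; fits: {fits_in_context(chars)}")
```

If the estimate lands near the limit, counting tokens with the model's real tokenizer is the safer check.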
Image generation: Qwen Image and Wan
Alibaba has two image generation families:
Qwen Image — The proprietary line with strong Chinese and English text rendering:
- Qwen Image 2.0 Pro ($0.075/image) — highest quality
- Qwen Image 2.0 ($0.03/image) — standard quality
- Z-Image ($0.004/image) — ultra-cheap, bilingual text rendering
Wan — The open-source line from Alibaba’s Wan team:
- Wan 2.6 Text-to-Image ($0.03/image) — open-weight image generation
Z-Image deserves special mention — at $0.004/image it’s one of the cheapest image generation APIs available anywhere, and it handles CJK text rendering better than most Western models.
When to use these: Z-Image for high-volume, cost-sensitive image generation. Qwen Image 2.0 Pro for quality. Wan for self-hosting.
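The price gap between tiers is easier to judge at volume. Here is a small sketch that multiplies out the per-image prices listed above for a batch of a given size. The prices are copied from this section and may change; `batch_cost` is a hypothetical helper, not part of any Alibaba SDK.

```python
# Per-image prices in USD, copied from the lists above.
PRICE_PER_IMAGE_USD = {
    "Qwen Image 2.0 Pro": 0.075,
    "Qwen Image 2.0": 0.03,
    "Wan 2.6 Text-to-Image": 0.03,
    "Z-Image": 0.004,
}


def batch_cost(model: str, num_images: int) -> float:
    """Return the estimated cost in USD of generating num_images with model."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE_USD[model] * num_images


if __name__ == "__main__":
    for model in PRICE_PER_IMAGE_USD:
        print(f"{model}: ${batch_cost(model, 10_000):,.2f} per 10,000 images")
```

At the listed prices, 10,000 images come to about $40 on Z-Image versus about $750 on Qwen Image 2.0 Pro.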
Video generation: Wan
Wan is Alibaba’s video generation family, and one of the strongest open-source options for video available.
| Model | Type | Notes |
|---|---|---|
| Wan 2.6 Text-to-Video | Text to video | 720p/1080p output |
| First-Frame-to-Video | Image to video | Single image as starting frame |
| First-and-Last-Frame-to-Video | Image to video | Two reference frames for guided generation |
| Multi-Image-to-Video | Images to video | Multiple input reference images |
| Reference-to-Video | Performance synthesis | Character animation |
| AnimateAnyone | Dance generation | Motion transfer |
| Wan Digital Human | Talking head | Image + audio to lip-synced video |
Alibaba also offers specialized video tools — VideoRetalk for lip-sync replacement, EMO for facial expression synthesis, and LivePortrait for voice announcement videos.
When to use Wan: Open-source video generation where you want control. The API pricing ($0.10-0.15/sec) is competitive with Runway and Luma. The variety of input modes (text, image, multi-image, first+last frame) gives more creative control than most competitors.
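Because video is billed per second of output, clip length drives cost directly. As a rough sketch, the helper below turns the $0.10-0.15/sec range quoted above into a low and high estimate for a clip. It assumes billing is linear in output seconds, and `clip_cost_range` is a hypothetical name.

```python
def clip_cost_range(seconds: float,
                    low_per_sec: float = 0.10,
                    high_per_sec: float = 0.15) -> tuple:
    """Return (low, high) cost estimates in USD for a clip of the given length.

    Defaults are the per-second range quoted in the text; linear per-second
    billing is an assumption.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return seconds * low_per_sec, seconds * high_per_sec


if __name__ == "__main__":
    low, high = clip_cost_range(10)
    print(f"10-second clip: ${low:.2f} to ${high:.2f}")
```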
Speech: Qwen TTS and ASR
| Model | Type | Use Case |
|---|---|---|
| Qwen Speech Synthesis | TTS | Standard text-to-speech |
| Qwen Real-Time Speech Synthesis | TTS | Low-latency streaming |
| Qwen Real-Time Speech Recognition | ASR | Streaming transcription |
| Qwen Audio File Recognition | ASR | Batch file transcription |
| Qwen3-LiveTranslate-Flash-Realtime | Translation | Real-time speech translation |
When to use these: Voice interfaces, transcription pipelines, real-time translation. The LiveTranslate model stands out: real-time speech-to-speech translation is still rare in the API market.
Embeddings
| Model | Parameters | Highlights |
|---|---|---|
| Qwen3 Embedding 8B | 8B | #1 on MTEB multilingual leaderboard |
| Qwen3 Embedding 4B | 4B | Balanced performance |
| Qwen3 Embedding 0.6B | 0.6B | Lightweight for edge/mobile |
All three are open-weight (Apache 2.0) and priced at $0.007/1M tokens on Alibaba Cloud — significantly cheaper than OpenAI’s embedding models.
When to use these: RAG pipelines, semantic search, multilingual applications. The 8B model is best-in-class for multilingual embeddings.
How it all fits together
Here’s a quick decision guide:
- General text tasks → Qwen 3.6 Plus (best) or Qwen 3.5 397B (best open-weight)
- Budget text tasks → Qwen-Flash ($0.05/1M input)
- Complex reasoning → QwQ-Plus or Qwen 3 Max Thinking
- Code generation → Qwen3-Coder-Plus (1M context)
- Image understanding → Qwen3-VL-Plus or VL-OCR for documents
- Image generation → Z-Image (cheap) or Qwen Image 2.0 Pro (quality)
- Video generation → Wan 2.6
- Voice/audio → Qwen-Omni for understanding, Qwen TTS for synthesis
- Real-time voice → Qwen3-Omni-Flash-Realtime
- Embeddings → Qwen3 Embedding 8B
- Everything at once → Qwen3.5-Omni-Plus
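If you route requests programmatically, the list above maps naturally onto a lookup table. The sketch below is one way to encode it, assuming you choose the task labels yourself. The keys are hypothetical labels, the values are the model names from the list (preferred choice first), and none of this is an official routing API.

```python
# Task label -> recommended models, preferred choice first (names from the list above).
RECOMMENDATIONS = {
    "general text": ["Qwen 3.6 Plus", "Qwen 3.5 397B"],
    "budget text": ["Qwen-Flash"],
    "complex reasoning": ["QwQ-Plus", "Qwen 3 Max Thinking"],
    "code generation": ["Qwen3-Coder-Plus"],
    "image understanding": ["Qwen3-VL-Plus", "Qwen-VL-OCR"],
    "image generation": ["Z-Image", "Qwen Image 2.0 Pro"],
    "video generation": ["Wan 2.6"],
    "real-time voice": ["Qwen3-Omni-Flash-Realtime"],
    "embeddings": ["Qwen3 Embedding 8B"],
    "everything at once": ["Qwen3.5-Omni-Plus"],
}


def recommend(task: str) -> list:
    """Return the recommended models for a task label (case-insensitive)."""
    key = task.strip().lower()
    if key not in RECOMMENDATIONS:
        raise ValueError(f"unknown task: {task!r}")
    return RECOMMENDATIONS[key]


if __name__ == "__main__":
    print(recommend("Code generation"))
```

Keeping the table in one place makes it easy to update when a new model generation ships.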
Open weight vs. proprietary
Most of Alibaba’s lineup is Apache 2.0 open-weight, so you can self-host, fine-tune, and commercially deploy it. The proprietary exceptions include:
- Qwen 3 Max / Max Thinking
- Qwen 3.6 Plus
- QwQ-Plus (the open-weight QwQ-32B is available though)
- Qwen-VL-Max
- Qwen Image Pro models
For the open-weight releases, including the 397B-parameter Qwen 3.5, you can download weights from Hugging Face and run them yourself.
For pricing details and provider comparisons, see our Alibaba Cloud Qwen API Pricing breakdown or browse all Qwen models on Inference Hub.