GLM-5.1 Released: Z.ai's Coding-First Frontier Model Now Available via API
Z.ai (formerly Zhipu AI) releases GLM-5.1, a frontier coding model that reaches 94% of Claude Opus 4.6's score on coding benchmarks. Here's what's new and where to access it.
Z.ai (formerly Zhipu AI) has released GLM-5.1 — a major upgrade to their GLM-5 model family, with a heavy focus on coding and long-horizon autonomous tasks. The model launched on April 7, 2026 and is already available through API providers.
What’s new in GLM-5.1
GLM-5.1 builds on the GLM-5 architecture (744 billion total parameters, 40 billion active per inference) with significant improvements in coding performance:
- Coding benchmark score of 45.3 using Claude Code as the testing harness — just 2.6 points behind Claude Opus 4.6’s 47.9 (94.6% of Opus performance)
- 28% improvement over GLM-5’s score of 35.4
- #1 on SWE-Bench Pro among open-source models with a 58.4 score
- 8-hour autonomous task execution — the model can plan, execute, and self-correct on a single task for extended periods
- 200K context window with up to 202,752 max output tokens
- Extended reasoning support via configurable `<think>` tags
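The extended-reasoning toggle boils down to one extra field on an ordinary chat-completions request. A minimal sketch of such a payload, assuming OpenRouter's `reasoning` request object (the exact field name and shape are an assumption here; check your provider's docs for how GLM-5.1's `<think>` output is actually enabled):

```python
import json

# Hedged sketch: the field for toggling GLM-5.1's extended reasoning may
# differ per provider; "reasoning" below follows OpenRouter's request
# format and should be treated as an assumption, not a guarantee.
payload = {
    "model": "z-ai/glm-5.1",
    "messages": [
        {"role": "user", "content": "Refactor this function to be iterative."}
    ],
    # Ask the model to emit its chain of thought inside <think> tags.
    "reasoning": {"enabled": True},
    # GLM-5.1's advertised maximum output length.
    "max_tokens": 202_752,
}

print(json.dumps(payload, indent=2))
```

Leaving `reasoning` out should fall back to the provider's default behavior, so the toggle costs nothing to wire in up front.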
The entire GLM-5 family was trained exclusively on Huawei Ascend 910B accelerators — one of the most notable demonstrations of training frontier models without Nvidia hardware.
Arena.ai leaderboard results
GLM-5.1 is already making waves on the Arena.ai leaderboard, which ranks models based on blind human preference votes:
- #10 in Coding with an Arena score of 1520 — sitting alongside Claude Sonnet 4.6 (1522) and GPT-5.2 (1520), and ahead of Gemini 3 Pro (1519). This puts it in the top tier of coding models, and it’s the highest-ranked open-weight model on the coding leaderboard.
- #14 Overall with an Arena score of 1467, placing it above GLM-5 (#23, score 1456) and GLM-4.7 (#39, score 1443).
For context, the coding leaderboard is led by Claude Opus 4.6 Thinking (1555), Claude Opus 4.6 (1546), and GPT-5.4 High (1532). That GLM-5.1 is closing in on that range at $1.40/$4.40 per million tokens, versus $5/$25 for Opus, is notable.
These are community-driven results from nearly 4,000 votes, not self-reported benchmarks — which adds significant credibility to Z.ai’s performance claims.
Design Arena results
GLM-5.1 also performs well on the Design Arena leaderboard, which evaluates AI models on frontend code generation quality through blind human votes:
- #4 in Code Categories with an Elo score of 1348 — behind only Claude Opus 4.6 (1359), Claude Opus 4.6 Thinking (1355), and GLM 5 Turbo (1355). It beats Claude Sonnet 4.6 (1339) in design-oriented code generation.
This is a strong showing for a coding-focused model — it suggests GLM-5.1 doesn’t just write functional code, but produces quality frontend output that humans prefer over most competitors.
Running GLM-5.1 locally
If you’d rather self-host, Unsloth AI has released GGUF quantizations that make it possible to run GLM-5.1 on consumer hardware. Their Dynamic 2-bit quantization shrinks the full 744B model from 1.65TB down to 220GB — an 86% reduction. It runs on a 256GB Mac or equivalent RAM/VRAM setups.
API pricing and providers
GLM-5.1 is already available through 9 providers on OpenRouter, all competitively priced compared to other frontier coding models:
| Provider | Input/1M | Output/1M |
|---|---|---|
| AtlasCloud | $1.00 | $3.20 |
| NovitaAI | $1.40 | $4.40 |
| DeepInfra | $1.40 | $4.40 |
| Parasail | $1.40 | $4.40 |
| Fireworks | $1.40 | $4.40 |
| io.net | $1.40 | $4.40 |
| Z.ai (direct) | $1.40 | $4.40 |
| Friendli | $1.40 | $4.40 |
| Venice | $1.75 | $5.50 |
AtlasCloud is the cheapest at $1.00/$3.20. Most other providers cluster at $1.40/$4.40. For comparison, Claude Opus 4.6 costs $5.00/$25.00 at Anthropic's direct pricing, making GLM-5.1 roughly 72–87% cheaper per token depending on provider, while reaching about 94% of Opus coding performance.
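The price gap is easiest to see on a concrete workload. A back-of-the-envelope comparison for a hypothetical job of 10M input and 2M output tokens, using the per-million prices from the table above:

```python
# Prices per 1M tokens (input, output), taken from the provider table above.
providers = {
    "AtlasCloud (GLM-5.1)": (1.00, 3.20),
    "Z.ai direct (GLM-5.1)": (1.40, 4.40),
    "Anthropic (Claude Opus 4.6)": (5.00, 25.00),
}

# Sample workload: 10M input tokens, 2M output tokens (illustrative numbers).
INPUT_M, OUTPUT_M = 10, 2

costs = {}
for name, (inp, out) in providers.items():
    costs[name] = INPUT_M * inp + OUTPUT_M * out
    print(f"{name}: ${costs[name]:.2f}")

opus = costs["Anthropic (Claude Opus 4.6)"]   # $100.00
glm = costs["Z.ai direct (GLM-5.1)"]          # $22.80
print(f"GLM-5.1 at $1.40/$4.40 is {1 - glm / opus:.0%} cheaper")  # 77% cheaper
```

The ratio shifts with the input/output mix: output-heavy workloads benefit more, since the output-price gap ($4.40 vs $25.00) is wider than the input gap.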
Z.ai also offers a direct GLM Coding Plan starting at $3/month (promotional) or $10/month (standard) for access through their platform.
To use GLM-5.1 via OpenRouter, the model ID is `z-ai/glm-5.1`, compatible with any OpenAI-format SDK.
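Since the endpoint is OpenAI-compatible, a call needs nothing beyond the standard library. A minimal sketch against OpenRouter's chat-completions endpoint (the prompt and the `OPENROUTER_API_KEY` variable name are illustrative; only the model ID comes from this article):

```python
import json
import os
import urllib.request

# Read the API key from the environment; the variable name is a convention
# used here, not something the article prescribes.
API_KEY = os.environ.get("OPENROUTER_API_KEY", "sk-or-...")

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps({
        "model": "z-ai/glm-5.1",
        "messages": [{"role": "user", "content": "Write a binary search in Go."}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# The request is only sent when a real key is configured (network call).
if "OPENROUTER_API_KEY" in os.environ:
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```

Swapping in the official OpenAI SDK works the same way: point `base_url` at OpenRouter and pass `z-ai/glm-5.1` as the model.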
Should you try it?
The numbers speak for themselves. GLM-5.1 ranks #10 on Arena.ai’s coding leaderboard and #4 on the Design Arena code leaderboard — both based on blind human preference votes, not self-reported benchmarks. It’s competing head-to-head with Claude Sonnet 4.6 and GPT-5.2 at a fraction of the cost.
If you’re running coding or frontend workloads, it’s worth a serious look. At $1.00–$1.40 per million input tokens, you can run extensive evals without breaking the bank. The 200K context window and 8-hour autonomous task capability also make it a strong candidate for agentic workflows.
We’d still recommend testing on your own tasks before going to production — but the gap between GLM-5.1 and the top proprietary models is narrower than the price difference suggests.
For the latest pricing and provider availability, check the GLM-5.1 model page on Inference Hub.