by Inference Hub

GLM-5.1 Released: Z.ai's Coding-First Frontier Model Now Available via API

Z.ai (formerly Zhipu AI) releases GLM-5.1, a frontier coding model that reaches 94% of Claude Opus 4.6's performance on coding benchmarks. Here's what's new and where to access it.

Tags: glm, z-ai, coding, new-release

Z.ai (formerly Zhipu AI) has released GLM-5.1 — a major upgrade to their GLM-5 model family, with a heavy focus on coding and long-horizon autonomous tasks. The model launched on April 7, 2026 and is already available through API providers.

What’s new in GLM-5.1

GLM-5.1 builds on the GLM-5 architecture (744 billion total parameters, 40 billion active per inference) with significant improvements in coding performance:

  • Coding benchmark score of 45.3 using Claude Code as the testing harness — just 2.6 points behind Claude Opus 4.6’s 47.9 (94.6% of Opus performance)
  • 28% improvement over GLM-5’s score of 35.4
  • #1 on SWE-Bench Pro among open-source models with a 58.4 score
  • 8-hour autonomous task execution — the model can plan, execute, and self-correct on a single task for extended periods
  • 200K context window with up to 202,752 max output tokens
  • Extended reasoning support via configurable <think> tags
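
Because the reasoning trace arrives inline as <think> blocks, client code typically strips it before showing output to users. Here's a minimal sketch in Python; the exact tag format, and whether the trace appears in the final message at all, are assumptions worth verifying against the actual API response:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from model output."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Hypothetical raw completion containing an inline reasoning trace.
raw = "<think>The user wants brevity, so skip the preamble.</think>Use bisect.insort()."
print(strip_think(raw))  # -> "Use bisect.insort()."
```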

The entire GLM-5 family was trained exclusively on Huawei Ascend 910B accelerators — one of the most notable demonstrations of training frontier models without Nvidia hardware.

Arena.ai leaderboard results

GLM-5.1 is already making waves on the Arena.ai leaderboard, which ranks models based on blind human preference votes:

  • #10 in Coding with an Arena score of 1520 — sitting alongside Claude Sonnet 4.6 (1522) and GPT-5.2 (1520), and ahead of Gemini 3 Pro (1519). This puts it in the top tier of coding models, and it’s the highest-ranked open-weight model on the coding leaderboard.
  • #14 Overall with an Arena score of 1467, placing it above GLM-5 (#23, score 1456) and GLM-4.7 (#39, score 1443).

For context, the coding leaderboard is led by Claude Opus 4.6 Thinking (1555), Claude Opus 4.6 (1546), and GPT-5.4 High (1532). That GLM-5.1 is closing in on that range while priced at $1.40/$4.40 per million tokens, versus $5/$25 for Opus, is notable.

These are community-driven results from nearly 4,000 votes, not self-reported benchmarks — which adds significant credibility to Z.ai’s performance claims.

Design Arena results

GLM-5.1 also performs well on the Design Arena leaderboard, which evaluates AI models on frontend code generation quality through blind human votes:

  • #4 in Code Categories with an Elo score of 1348, behind only Claude Opus 4.6 (1359), Claude Opus 4.6 Thinking (1355), and GLM-5 Turbo (1355). It beats Claude Sonnet 4.6 (1339) in design-oriented code generation.

This is a strong showing for a coding-focused model — it suggests GLM-5.1 doesn’t just write functional code, but produces quality frontend output that humans prefer over most competitors.

Running GLM-5.1 locally

If you’d rather self-host, Unsloth AI has released GGUF quantizations that make it possible to run GLM-5.1 on consumer hardware. Their Dynamic 2-bit quantization shrinks the full 744B model from 1.65TB down to 220GB — an 86% reduction. It runs on a 256GB Mac or equivalent RAM/VRAM setups.
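
If you want to try one of those quantizations, here's a minimal sketch using llama-cpp-python. The GGUF filename below is a placeholder, so check Unsloth's model card for the real file names and recommended settings; memory needs match the figures above.

```python
# Minimal local-inference sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder: point it at whichever Unsloth GGUF
# quantization you actually downloaded. Expect roughly 220GB of RAM/VRAM
# for the 2-bit variant, per the figures above.
from llama_cpp import Llama

llm = Llama(
    model_path="./GLM-5.1-Q2_K.gguf",  # hypothetical filename
    n_ctx=32768,      # context to allocate; raise toward 200K if memory allows
    n_gpu_layers=-1,  # offload all layers to GPU/Metal where available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
print(out["choices"][0]["message"]["content"])
```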

API pricing and providers

GLM-5.1 is already available through 9 providers on OpenRouter, all competitively priced compared to other frontier coding models:

Provider        Input/1M   Output/1M
AtlasCloud      $1.00      $3.20
NovitaAI        $1.40      $4.40
DeepInfra       $1.40      $4.40
Parasail        $1.40      $4.40
Fireworks       $1.40      $4.40
io.net          $1.40      $4.40
Z.ai (direct)   $1.40      $4.40
Friendli        $1.40      $4.40
Venice          $1.75      $5.50

AtlasCloud is the cheapest at $1.00/$3.20. Most other providers cluster at $1.40/$4.40. For comparison, Claude Opus 4.6 costs $5.00/$25.00 at Anthropic's direct pricing, so GLM-5.1 runs roughly 72–87% cheaper depending on provider ($1.40 vs. $5.00 input is a 72% saving; $3.20 vs. $25.00 output is 87%) while reaching 94% of Opus coding performance.

Z.ai also offers a direct GLM Coding Plan starting at $3/month (promotional) or $10/month (standard) for access through their platform.

To use GLM-5.1 via OpenRouter, the model ID is z-ai/glm-5.1 — compatible with any OpenAI-format SDK.
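
For example, with the official OpenAI Python SDK pointed at OpenRouter (the prompt and environment variable name are illustrative):

```python
# Calling GLM-5.1 through OpenRouter with the OpenAI Python SDK
# (pip install openai). Assumes OPENROUTER_API_KEY is set in the environment.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="z-ai/glm-5.1",
    messages=[{"role": "user", "content": "Explain Python's GIL in two sentences."}],
)
print(response.choices[0].message.content)
```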

Should you try it?

The numbers speak for themselves. GLM-5.1 ranks #10 on Arena.ai’s coding leaderboard and #4 on the Design Arena code leaderboard — both based on blind human preference votes, not self-reported benchmarks. It’s competing head-to-head with Claude Sonnet 4.6 and GPT-5.2 at a fraction of the cost.

If you’re running coding or frontend workloads, it’s worth a serious look. At $1.00–$1.40 per million input tokens, you can run extensive evals without breaking the bank. The 200K context window and 8-hour autonomous task capability also make it a strong candidate for agentic workflows.

We’d still recommend testing on your own tasks before going to production — but the gap between GLM-5.1 and the top proprietary models is narrower than the price difference suggests.

For the latest pricing and provider availability, check the GLM-5.1 model page on Inference Hub.