GLM-5.1

Name: GLM-5.1
Author: Zhipu AI

MIT

Zhipu AI · 754B · transformer-moe

2026-04-07 · 203K context · 754B params

Use Cases

chat · code · reasoning · multilingual · tools · math

Quantization Options

Quant	Bits	VRAM	Quality
Q2_K (recommended)	2	305.0 GB	Moderate
Q4_K_M	4	450.0 GB	Good
Q8_0	8	820.0 GB	Excellent
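
As a rough sanity check on the table, the weight-only memory floor for a quantized model is parameters × bits ÷ 8. The sketch below applies that formula to the nominal bit widths and also back-computes the effective bits per weight implied by each listed VRAM figure. It assumes 1 GB = 10^9 bytes and ignores KV cache and runtime overhead, so it is a lower bound and not a reproduction of how the listed figures were derived. The function names are illustrative.

```python
# Weight-only lower bound for quantized model memory.
# Assumptions: 1 GB = 1e9 bytes; KV cache and runtime overhead are ignored.

PARAMS = 754e9  # total parameters, from the model card

# name -> (nominal bits, listed VRAM in GB), copied from the table above
QUANTS = {
    "Q2_K": (2, 305.0),
    "Q4_K_M": (4, 450.0),
    "Q8_0": (8, 820.0),
}


def weight_floor_gb(params: float, bits: float) -> float:
    """Memory needed for the weights alone at a given bit width, in GB."""
    return params * bits / 8 / 1e9


def implied_bits_per_weight(params: float, listed_gb: float) -> float:
    """Effective bits per weight if the listed VRAM were all weights."""
    return listed_gb * 1e9 * 8 / params


for name, (bits, listed_gb) in QUANTS.items():
    floor = weight_floor_gb(PARAMS, bits)
    implied = implied_bits_per_weight(PARAMS, listed_gb)
    print(f"{name}: floor {floor:.1f} GB, listed {listed_gb:.1f} GB, "
          f"implied {implied:.2f} bits/weight")
```

The listed figures sit above the nominal floor in every row, which is expected: GGUF quantization formats store scales and other metadata alongside the weights, and a running model also needs room for its context.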

About this model

GLM-5.1 is Zhipu AI's next-generation flagship model for agentic engineering, succeeding GLM-5 with significantly stronger coding and long-horizon task capabilities. A 754B-parameter Mixture-of-Experts model with 40B active parameters per token, it achieves state-of-the-art performance on SWE-Bench Pro (58.4), outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Designed for sustained autonomous work, GLM-5.1 can operate on a single task for up to 8 hours, planning, executing, testing, and iterating across hundreds of rounds and thousands of tool calls. Its MoE architecture keeps per-token compute low relative to the total parameter count, but all expert weights must still be held in memory: the listed quantizations need between 305 GB and 820 GB of VRAM (450 GB at Q4_K_M), which is well beyond any single consumer GPU and within reach only of multi-GPU servers or very large unified-memory machines.
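
To make the compute-versus-memory distinction concrete, the sketch below computes the fraction of parameters active per token and a naive count of devices needed to hold the Q4_K_M weights. The 24 GB and 80 GB device sizes are hypothetical examples, not figures from this page, and the count ignores KV cache, activation memory, and sharding overhead.

```python
import math

TOTAL_PARAMS_B = 754     # total parameters in billions, from the model card
ACTIVE_PARAMS_B = 40     # active parameters per token in billions, from the model card
Q4_K_M_VRAM_GB = 450.0   # listed VRAM for Q4_K_M, from the quantization table


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def devices_needed(required_gb: float, per_device_gb: float) -> int:
    """Naive device count: memory only, no overhead for sharding or context."""
    return math.ceil(required_gb / per_device_gb)


print(f"Active per token: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")

# Hypothetical device sizes chosen for illustration only.
for per_device_gb in (24, 80):
    count = devices_needed(Q4_K_M_VRAM_GB, per_device_gb)
    print(f"{per_device_gb} GB devices for Q4_K_M: at least {count}")
```

Only about one parameter in nineteen participates in each token, which helps throughput, but the device count is driven by the full 754B weights.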

Benchmarks

SWE-Bench Pro: 58.4

Install

Ollama

ollama run glm-5.1:latest
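
Once the model is running under Ollama, it can also be queried over Ollama's local HTTP API. The following is a minimal sketch, assuming a default Ollama install listening on localhost:11434 and the `glm-5.1:latest` tag shown above; the helper names `build_request` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "glm-5.1:latest"  # tag taken from the install command above


def build_request(prompt: str, model: str = MODEL_TAG,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a live server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call; requires a running Ollama server with the model pulled:
# print(generate("Summarize what a Mixture-of-Experts model is."))
```

Setting `stream` to false returns a single JSON object instead of a stream of partial responses, which keeps the example short.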

llama.cpp / GGUF

Download GGUF from HuggingFace

Specs

Parameters: 754B
Architecture: transformer-moe
Context: 203K tokens
Min VRAM: 305.0 GB (Q2_K)
Recommended: 305.0 GB (Q2_K)
Family: GLM
Released: 2026-04-07
License: MIT