GLM-4.5: Specifications and GPU VRAM Requirements

GLM-4.5

Open Source

Open Weights

Total Parameters

355B

Context Length

128K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT License

Release Date

28 Jul 2025

Training Data Cutoff

Technical Specifications

Active Parameters (per forward pass)

32.0B

Number of Experts

Active Experts

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes

GLM-4.5

GLM-4.5, developed by Z.ai (formerly Zhipu AI), is the company's flagship hybrid reasoning model, designed to unify reasoning, coding, and agentic capabilities in a single model. It is optimized for agent-oriented applications that involve complex, multi-step problem-solving. A lighter variant, GLM-4.5-Air, trades some capacity for efficiency while retaining the core capabilities.

GLM-4.5 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, of which 32 billion are active during a forward pass, so only about 9% of the weights are used for any given token. The model supports two reasoning modes: a "Thinking Mode" for intricate reasoning, multi-step planning, and tool use, and a "Non-Thinking Mode" for immediate responses. This lets a single deployment serve both deep analytical tasks and low-latency interactive scenarios.
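To make the total-versus-active distinction concrete, the sketch below computes the active share from the two parameter counts quoted above and shows a generic top-k softmax gate choosing experts for one token. The gate is a textbook illustration only: the function names, the toy logits, and the choice of softmax gating are assumptions, not a description of Z.ai's router.

```python
import math

TOTAL_PARAMS = 355e9   # total parameters, as quoted in the text
ACTIVE_PARAMS = 32e9   # parameters active per forward pass, as quoted in the text


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that participate in one forward pass."""
    return active / total


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k softmax gating (illustrative; not GLM-4.5's actual router).

    Returns a mapping from the index of each selected expert to its
    renormalized mixing weight.
    """
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Keep the k experts with the largest gate logits.
    chosen = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the chosen logits only, shifted by the max for stability.
    peak = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    # Hypothetical gate logits for a toy layer with 6 experts, routing to 2.
    print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count while memory requirements track the total.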

GLM-4.5 targets tool invocation, web browsing, and software engineering, including frontend and backend development. It supports native function calling and can be integrated into code-centric agents. Training began with pretraining on 15 trillion tokens of general-domain data, followed by training on an additional 7 trillion tokens focused on code and reasoning. Reinforcement learning, run on Z.ai's custom-built 'slime' engine, was then applied to strengthen reasoning, coding, and agentic capabilities. The model supports a context length of 128,000 tokens and a maximum output of 96,000 tokens.
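The two limits quoted above interact when sizing a request. The helper below is a minimal sketch that assumes the prompt and the generated output share the 128,000-token window, which is common for decoder-only models but is not stated on this page. The function name and its behavior are hypothetical and do not correspond to any Z.ai API.

```python
CONTEXT_LIMIT = 128_000   # context length in tokens, as quoted in the text
MAX_OUTPUT = 96_000       # maximum output tokens, as quoted in the text


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return the number of output tokens that fit for a given prompt.

    Assumption: prompt and output share one context window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= CONTEXT_LIMIT:
        raise ValueError("prompt alone fills the context window")
    remaining = CONTEXT_LIMIT - prompt_tokens
    # The grant is capped by the request, the output limit, and the space left.
    return min(requested_output, MAX_OUTPUT, remaining)


if __name__ == "__main__":
    print(output_budget(10_000, 200_000))   # capped by the output limit
    print(output_budget(100_000, 50_000))   # capped by the remaining window
```

Under this assumption, a prompt longer than 32,000 tokens reduces the usable output below the 96,000-token maximum.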

About the GLM Family

General Language Models from Z.ai

Other GLM Family Models

Evaluation Benchmarks

Rankings apply to local LLMs.

Rankings

Category	Benchmark	Score	Rank
Mathematics	LiveBench Mathematics	0.92	🥇 1
LiveBench Average	LiveBench Average	0.62	🥇 1
Web Development	WebDev Arena	1378.1	🥉 3
Reasoning	LiveBench Reasoning	0.70	⭐ 4
Agentic Coding	LiveBench Agentic	0.23	5
Data Analysis	LiveBench Data Analysis	0.66	8
Professional Knowledge	MMLU Pro	0.85	8
Coding	LiveBench Coding	0.60	11

Rankings

Coding Rank

#15

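GPU Requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size, from 1,024 tokens up to about 125k tokens.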
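As a rough guide to how quantization and context size drive VRAM, the sketch below estimates weight memory from the 355B total parameter count and adds a standard key-value cache formula. The bytes-per-weight figures are idealized, all weights are assumed to stay in GPU memory, and the layer count, key-value head count, and head dimension in the example are hypothetical placeholders because this page does not list them. Real deployments also need room for activations and runtime overhead.

```python
PARAMS = 355e9  # total parameters; every expert is stored, not only the active ones

# Idealized bytes per weight. Real quantization formats add scales and
# metadata, so treat the results as lower bounds.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

GB = 1e9  # decimal gigabytes


def weight_memory_gb(params: float, fmt: str) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return params * BYTES_PER_WEIGHT[fmt] / GB


def kv_cache_gb(context_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_value: float = 2.0) -> float:
    """Key-value cache size in GB for a single sequence.

    Each layer stores one key and one value vector per KV head per token.
    """
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / GB


if __name__ == "__main__":
    for fmt in BYTES_PER_WEIGHT:
        print(f"{fmt}: {weight_memory_gb(PARAMS, fmt):.1f} GB for weights")
    # Hypothetical architecture values, for illustration only.
    for ctx in (1_024, 63_000, 125_000):
        cache = kv_cache_gb(ctx, layers=90, kv_heads=8, head_dim=128)
        print(f"context {ctx}: {cache:.2f} GB KV cache")
```

Weight memory is fixed by the quantization format, while the cache term grows linearly with context size, which is why both settings appear as inputs.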

Resources

Official Documentation, Download Weights, Source Code
