GLM-4.5-Air: Specifications and GPU VRAM Requirements

GLM-4.5-Air

Open source

Open weights

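Total parameters

106B

Context length

128K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

MIT License

Release date

28 Jul 2025

Training data cutoff

Mar 2025

Technical Specifications

Active parameters (per forward pass)

12.0B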

Number of experts

129

Active experts

9

Attention structure

Grouped-Query Attention (GQA)

Hidden dimension size

4096

Number of layers

Attention heads

96

Key-value heads

8

Activation function

Swish

Normalization

RMS Normalization

Position embedding

Rotary Positional Embeddings (RoPE)

GLM-4.5-Air

GLM-4.5-Air is a high-efficiency large language model developed by Z.ai as part of the GLM-4.5 series. It is designed to bridge the gap between massive-scale foundation models and the practical constraints of on-device or mid-range cloud deployments. Optimized primarily for agent-oriented workflows, the model prioritizes reasoning, complex instruction following, and code generation. It can serve as the engine for autonomous agents that perform multi-step planning and tool invocation, which makes it a practical choice for developers building digital assistants and automated software engineering pipelines.

Architecturally, the model utilizes a sparse Mixture-of-Experts (MoE) framework, featuring 106 billion total parameters with only 12 billion active per forward pass. This design incorporates 128 routed experts and a specialized shared expert layer, activating 9 experts per token to maintain representational capacity while significantly reducing computational overhead. The transformer block is further enhanced by a Multi-Token Prediction (MTP) layer, which allows the model to predict several future tokens simultaneously. This implementation facilitates speculative decoding, which increases inference throughput and provides a responsive experience for real-time interactive applications.
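
The sparse routing described above can be illustrated with a toy example. This is a minimal sketch, not the model's actual router: it assumes the 9 active experts split into 8 routed experts plus the 1 shared expert, and it uses softmax gating with renormalized top-k weights, which may differ from the real gating function. The names `route_token` and `moe_forward` are hypothetical, and the "experts" are simple scalar functions.

```python
import math

NUM_ROUTED_EXPERTS = 128  # routed experts, per the description above
ROUTED_PER_TOKEN = 8      # assumption: 9 active = 8 routed + 1 shared


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=ROUTED_PER_TOKEN):
    """Return (expert_index, weight) pairs for the top-k routed experts.

    Weights are renormalized so the selected experts sum to 1
    (an assumption made for this sketch).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, routed_experts, shared_expert):
    """Shared expert always runs; only the top-k routed experts are evaluated."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](x)
    return out


if __name__ == "__main__":
    # Toy experts: each routed expert scales its input by a different factor.
    experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_ROUTED_EXPERTS)]
    logits = [float(i % 17) for i in range(NUM_ROUTED_EXPERTS)]
    y = moe_forward(1.0, logits, experts, shared_expert=lambda x: x)
    print(f"evaluated {ROUTED_PER_TOKEN + 1} of {NUM_ROUTED_EXPERTS + 1} experts, output={y:.3f}")
```

Only the selected experts are evaluated for a given token, which is why the per-token compute tracks the 12 billion active parameters rather than the 106 billion total.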

GLM-4.5-Air adopts Grouped-Query Attention (GQA) with 96 attention heads and 8 key-value groups, so 12 query heads share each key-value head; this shrinks the key-value cache and reduces memory bandwidth requirements during long-context processing. The model supports a 128,000-token context window using Rotary Positional Embeddings (RoPE) and features a hybrid reasoning system. This system allows for a deliberate thinking mode, which executes a latent chain-of-thought process for analytical problem-solving, and a standard mode for immediate output. Native support for function calling, web browsing, and code execution lets the model interact with external environments.
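
To make the effect of GQA on long-context memory concrete, the following sketch compares the key-value cache size for 8 key-value heads against a hypothetical full multi-head layout with 96. The head counts and context length come from the text above; the layer count and per-head dimension are not stated on this page, so `NUM_LAYERS` and `HEAD_DIM` are placeholder assumptions, as is the 2-byte (FP16/BF16) cache precision.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


ATTENTION_HEADS = 96      # from the description above
KV_HEADS = 8              # from the description above
CONTEXT_TOKENS = 128_000  # from the description above
NUM_LAYERS = 46           # placeholder assumption; not stated on this page
HEAD_DIM = 128            # placeholder assumption; not stated on this page

gqa = kv_cache_bytes(CONTEXT_TOKENS, NUM_LAYERS, KV_HEADS, HEAD_DIM)
mha = kv_cache_bytes(CONTEXT_TOKENS, NUM_LAYERS, ATTENTION_HEADS, HEAD_DIM)

print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"Reduction: {mha // gqa}x")
```

The absolute sizes depend on the assumed layer count and head dimension, but the 12x ratio follows directly from 96 / 8 and holds for any values of those two parameters.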

About the GLM Family

General Language Models from Z.ai

Evaluation Benchmarks

Rank

#22

Category	Benchmark	Score	Rank
Professional Knowledge	MMLU Pro	0.81	11
Web Development	WebDev Arena	1371	25

Coding rank

#35

GPU Requirements
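
As a rough guide to how quantization affects VRAM, the sketch below estimates the memory needed for the model weights alone at a few common precisions. It uses the 106B total parameter count rather than the 12B active count, because all experts of an MoE model must be resident in memory even though only some run per token. The precision list and the function name `weight_memory_gib` are illustrative assumptions, and the estimate excludes the key-value cache, activations, and framework overhead.

```python
TOTAL_PARAMS = 106e9  # total parameters, from the specifications above

# Illustrative precisions; real quantization formats add small per-block overheads.
BITS_PER_WEIGHT = {"FP16/BF16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(total_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.0f} GiB for weights")
```

Actual requirements are higher once context length is taken into account, since the key-value cache grows linearly with the number of tokens.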

Resources

Official documentation

Download weights

Source code
