ApX 标志

趋近智

GLM-4.5-Air

活跃参数

106B

上下文长度

128K

模态

Multimodal

架构

Mixture of Experts (MoE)

许可证

MIT License

发布日期

28 Jul 2025

知识截止

-

技术规格

专家参数总数

12.0B

专家数量

-

活跃专家

-

注意力结构

Multi-Head Attention

隐藏维度大小

-

层数

-

注意力头

96

键值头

-

激活函数

-

归一化

-

位置嵌入

Absolute Position Embedding

系统要求

不同量化方法和上下文大小的显存要求

GLM-4.5-Air

The GLM-4.5-Air model, developed by Z.ai, is a member of the GLM-4.5 series, designed as a lightweight and efficient large language model. This variant is specifically optimized for on-device and smaller-scale cloud inference, aiming to deliver robust capabilities while minimizing hardware and computational requirements. It integrates core functionalities such as reasoning, coding, and agentic behaviors, making it suitable for a range of advanced AI applications.

Architecturally, GLM-4.5-Air leverages a Mixture-of-Experts (MoE) design. This allows the model to selectively activate a subset of its parameters during inference, enhancing computational efficiency compared to dense architectures. While the full GLM-4.5 model employs 355 billion total parameters with 32 billion active, GLM-4.5-Air features 106 billion total parameters with 12 billion active parameters. The model also incorporates a Multi-Token Prediction (MTP) layer to facilitate speculative decoding, which significantly boosts inference speed, potentially achieving generation rates of over 100 tokens per second.

GLM-4.5-Air supports a hybrid reasoning approach, offering both a 'thinking mode' for intricate, multi-step problem-solving and a 'non-thinking mode' for immediate, rapid responses. This dual-mode operation allows for dynamic adaptation to query complexity, optimizing resource utilization. The model is also engineered for advanced agentic applications, including native function calling, tool use, web browsing, and comprehensive software development tasks, such as full-stack web application creation.

关于 GLM Family

General Language Models from Z.ai


其他 GLM Family 模型

评估基准

排名适用于本地LLM。

排名

#7

基准分数排名

Web Development

WebDev Arena

1353.76

🥉

3

0.78

4

Agentic Coding

LiveBench Agentic

0.15

5

0.79

5

0.66

7

0.58

13

排名

排名

#7

编程排名

#13

GPU 要求

完整计算器

选择模型权重的量化方法

上下文大小:1024 个令牌

1k
63k
125k

所需显存:

推荐 GPU