
GLM-4-9B

Parameters: 9B

Context Length: 128K

Modality: Text

Architecture: Dense

License: MIT License

Release Date: 30 Jun 2024

Training Data Cutoff: Apr 2024

Technical Specifications

Attention Structure: Grouped Query Attention

Hidden Dimension Size: 4096

Number of Layers: 40

Attention Heads: 32

Key-Value Heads: 2

Activation Function: SwiGLU

Normalization: RMS Normalization

Position Embedding: Rotary Position Embedding (RoPE)

GLM-4-9B

The GLM-4-9B represents a significant iteration in the General Language Model (GLM) series developed by Zhipu AI and the THUDM Laboratory at Tsinghua University. This 9-billion parameter model is engineered to provide a sophisticated balance between computational efficiency and high-level linguistic performance, supporting a multilingual corpus across 26 languages. It is designed for diverse applications, including high-throughput translation, automated content synthesis, and complex question-answering systems. The model is released with open weights under the MIT License, facilitating broad community adoption and research in the field of large-scale pre-training.

Architecturally, GLM-4-9B is built upon a dense transformer framework that incorporates several structural optimizations. It utilizes Grouped Query Attention (GQA) with 32 attention heads and 2 key-value heads to reduce memory overhead during inference while maintaining robust semantic representation. The model implements an autoregressive blank-infilling objective during its pre-training on 10 trillion tokens, which enhances its ability to handle both prefix-based generation and bidirectional understanding. To support long-context processing, it employs Rotary Position Embeddings (RoPE) and is capable of extending its context window up to 128,000 tokens through YaRN (Yet another RoPE extensioN) scaling techniques.
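The memory benefit of GQA can be illustrated with the specification table's own numbers (40 layers, hidden size 4096, 32 query heads, 2 key-value heads). A minimal sketch, assuming an fp16 KV cache:

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # Keys and values are each cached once per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

HEAD_DIM = 4096 // 32  # hidden size / query heads = 128

mha = kv_cache_bytes_per_token(40, 32, HEAD_DIM)  # full multi-head cache
gqa = kv_cache_bytes_per_token(40, 2, HEAD_DIM)   # GLM-4-9B's 2 KV heads

print(mha, gqa, mha // gqa)  # 655360 40960 16

# At the full 128K (131072-token) context, the GQA cache is 5 GiB
# instead of 80 GiB for an equivalent multi-head cache.
gqa_128k_gib = gqa * 131072 / 1024**3
print(gqa_128k_gib)  # 5.0
```

The 16x reduction (32 query heads sharing 2 KV heads) is exactly what makes the 128K context window practical on a single accelerator.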

Technical refinements in the GLM-4-9B architecture include the use of RMSNorm for stable layer normalization and the SiLU (Sigmoid Linear Unit) activation function, often implemented within a SwiGLU-style feed-forward network. The design specifically omits bias terms in most linear layers, except for those within the Query, Key, and Value components, a choice intended to improve the model's length extrapolation capabilities. This model serves as the foundation for specialized variants, such as the GLM-4-9B-Chat for human-aligned dialogue and the GLM-4V-9B for multimodal vision-language tasks, demonstrating its versatility as a base architecture for production-grade AI systems.
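The SiLU-gated feed-forward design mentioned above is simple to state directly. A minimal pure-Python sketch of a SwiGLU-style FFN with toy dimensions (the weights here are illustrative, not the model's):

```python
import math

def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x) = x / (1 + e^-x)
    return x / (1.0 + math.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU FFN: down( silu(gate @ x) * (up @ x) ),
    # written as plain matrix-vector products for clarity.
    gate = [silu(sum(w * xi for w, xi in zip(row, x))) for row in w_gate]
    up = [sum(w * xi for w, xi in zip(row, x)) for row in w_up]
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_down]

# Toy sizes: 2-dim input, 3-dim FFN hidden state, 2-dim output.
x = [1.0, -0.5]
w_gate = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
w_up = [[0.2, 0.0], [0.1, 0.1], [-0.2, 0.3]]
w_down = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
print(len(swiglu_ffn(x, w_gate, w_up, w_down)))  # 2
```

Note the gate path and the up path are two separate projections of the same input; only the gate path passes through SiLU before the elementwise product.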

About the GLM Family

General Language Models from Z.ai



Evaluation Benchmarks

No evaluation benchmarks are available for GLM-4-9B, and it is currently unranked (overall and coding).

GPU Requirements

Required VRAM depends on the weight quantization method and the context size (1K to 125K tokens), which in turn determines the recommended GPU.
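Such a VRAM estimate reduces to simple arithmetic. A rough sketch using GLM-4-9B's dimensions, assuming an fp16 KV cache and ignoring activation and runtime overhead (real requirements are somewhat higher):

```python
def estimate_vram_gib(n_params, bits_per_weight, context_tokens=0,
                      n_layers=40, n_kv_heads=2, head_dim=128, cache_bytes=2):
    # Weight memory: one value per parameter at the chosen quantization width.
    weights = n_params * bits_per_weight / 8
    # KV cache: keys + values, per layer, per KV head, per cached token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * cache_bytes * context_tokens
    return (weights + kv_cache) / 1024**3

# GLM-4-9B weights alone:
print(round(estimate_vram_gib(9e9, 16), 1))  # fp16: ~16.8 GiB
print(round(estimate_vram_gib(9e9, 4), 1))   # int4: ~4.2 GiB

# fp16 weights plus a full 128K-token fp16 KV cache:
print(round(estimate_vram_gib(9e9, 16, context_tokens=131072), 1))  # ~21.8 GiB
```

The KV-cache term stays small here only because of the 2 key-value heads; with 32 full heads the 128K cache alone would add roughly 80 GiB.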