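Parameters: 9B
Context length: 128K
Modality: Text
Architecture: Dense
License: MIT License
Release date: 30 Jun 2024
Training data cutoff: Apr 2024
Attention structure: Grouped Query Attention
Hidden size: 4096
Layers: 40
Attention heads: 32
Key-value heads: 2
Activation function: SwiGLU
Normalization: RMS Normalization
Position embedding: Rotary Position Embedding (RoPE)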
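As a rough worked example, the specification figures (40 layers, hidden size 4096, 32 attention heads, 2 key-value heads) can be turned into an estimate of key-value cache size. The sketch below assumes a head dimension of hidden size divided by attention heads (128) and a 16-bit cache; both are assumptions for illustration, not values stated on this page, and the function name is hypothetical.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache stored per token, summed over all layers."""
    # Factor of 2: each KV head stores one key vector and one value vector.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


HIDDEN_SIZE = 4096
NUM_LAYERS = 40
NUM_HEADS = 32
NUM_KV_HEADS = 2
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # assumption: 4096 / 32 = 128

# Cache cost with 2 shared key-value heads versus one KV head per query head.
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_HEADS, HEAD_DIM)

CONTEXT_TOKENS = 128_000
GIB = 2 ** 30

print(f"2 KV heads:  {gqa_bytes} bytes/token, "
      f"{gqa_bytes * CONTEXT_TOKENS / GIB:.2f} GiB at {CONTEXT_TOKENS} tokens")
print(f"32 KV heads: {mha_bytes} bytes/token, "
      f"{mha_bytes * CONTEXT_TOKENS / GIB:.2f} GiB at {CONTEXT_TOKENS} tokens")
print(f"reduction factor: {mha_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of attention heads to key-value heads (32 / 2 = 16), which is what makes a long context window practical on a single accelerator.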
GLM-4-9B is a 9-billion-parameter model in the General Language Model (GLM) series developed by Zhipu AI and the THUDM Laboratory at Tsinghua University. It is designed to balance computational efficiency with language performance and supports 26 languages. Intended applications include high-throughput translation, automated content generation, and complex question answering. The model is released with open weights under the MIT License, which permits broad community adoption and research use.
Architecturally, GLM-4-9B is a dense transformer with several structural optimizations. It uses Grouped Query Attention (GQA) with 32 attention heads and 2 key-value heads, which shrinks the key-value cache during inference because each group of query heads shares one key-value head. The model is pre-trained on 10 trillion tokens with an autoregressive blank-infilling objective, which is intended to support both prefix-based generation and bidirectional understanding. For long-context processing it uses Rotary Position Embeddings (RoPE), and its context window can be extended up to 128,000 tokens through YaRN (Yet another RoPE extensioN) scaling.
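To make the rotary position embedding idea concrete, here is a minimal, generic RoPE sketch in plain Python. It is not taken from the GLM-4-9B code base: the helper names `rope` and `dot`, the base of 10000, and the choice to rotate every pair of dimensions are illustrative assumptions, and the released implementation may differ in details. YaRN scaling is not shown.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE rotates dimensions in pairs"
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 rotates with frequency base^(-i/dim): early pairs spin
        # quickly with position, later pairs slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Both pairs of positions are 2 tokens apart, so the scores should match.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 103), rope(k, 101))
print(abs(score_near - score_far) < 1e-9)
```

The attention score depends only on the distance between the query and key positions, not on their absolute values. This relative-position property is what context-extension methods that rescale the rotation frequencies build on.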
Other refinements in the GLM-4-9B architecture include RMSNorm for layer normalization and a SwiGLU feed-forward network, which uses the SiLU (Sigmoid Linear Unit) activation as its gate. The design omits bias terms from its linear layers except for the Query, Key, and Value projections, a choice intended to improve the model's length extrapolation. The model also serves as the foundation for specialized variants, such as GLM-4-9B-Chat for human-aligned dialogue and GLM-4V-9B for multimodal vision-language tasks.
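The two components named above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model's implementation: the tiny weight matrices, the `eps` value, and the function names are all hypothetical, and the linear maps carry no bias terms, in line with the description.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, vec):
    """Bias-free linear map: one output per matrix row."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy sizes: model width 4, feed-forward width 2 (hypothetical values).
x = [1.0, -2.0, 3.0, 0.5]
normed = rms_norm(x, weight=[1.0, 1.0, 1.0, 1.0])

w_gate = [[0.1, 0.2, -0.1, 0.0], [0.3, -0.2, 0.1, 0.4]]
w_up = [[-0.2, 0.1, 0.3, 0.2], [0.0, 0.5, -0.3, 0.1]]
w_down = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.2], [0.1, 0.1]]

out = swiglu_ffn(normed, w_gate, w_up, w_down)
print(len(out))
```

Unlike standard layer normalization, RMSNorm does not subtract the mean, so it needs only one reduction over the vector. The SwiGLU block uses two input projections (gate and up) in place of the single projection in a classic feed-forward layer.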
General Language Models from Z.ai
No evaluation benchmarks are available for GLM-4-9B.