Parameters: 6B
Context Length: 32K (32,768 tokens)
Modality: Text
Architecture: Dense
License: ChatGLM3-6B Model License
Release Date: 27 Oct 2023
Training Data Cutoff: -
Attention Structure: Multi-Query Attention
Hidden Dimension Size: 4096
Layers: 28
Attention Heads: 32
Key-Value Heads: 2
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)
ChatGLM3-6B-32K is an advanced large language model optimized for long-context understanding and generation. Developed through a collaboration between Zhipu AI and Tsinghua University's KEG Lab, this model serves as a specialized variant of the ChatGLM3-6B architecture, specifically engineered to extend the effective context window to 32,768 tokens. This expansion allows for the processing of comprehensive documents, long-form dialogues, and complex technical texts that exceed the limits of standard transformer-based models.
The model's architecture is built upon a 28-layer dense transformer framework. It incorporates several technical refinements to maintain stability and performance across its extended context, including RMSNorm for normalization and Multi-Query Attention (MQA) to optimize inference efficiency. A significant innovation in this variant is the updated Rotary Position Embedding (RoPE) mechanism, which scales the base frequency by a rope_ratio factor to preserve positional resolution across 32K tokens. Furthermore, its conversational training stage uses a methodology that emphasizes coherence across long texts.
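To make the rope_ratio idea concrete, here is an illustrative sketch of how scaling the RoPE base frequency slows each dimension's rotation so that 32K positions remain distinguishable. The rotary dimension of 64 and the ratio of 50 are assumptions chosen for illustration, not values read from the released code.

```python
import torch

def rotary_inv_freq(dim: int, base: float = 10000.0, rope_ratio: float = 1.0) -> torch.Tensor:
    """Inverse rotation frequencies for rotary position embeddings.

    Multiplying the base by rope_ratio lowers every frequency, so each
    dimension rotates more slowly and a longer span of positions fits
    before the angles wrap around -- the core of RoPE context extension.
    """
    effective_base = base * rope_ratio
    return 1.0 / (effective_base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

# Illustrative numbers: 64 rotary dimensions per head, hypothetical ratio of 50.
inv_freq_orig = rotary_inv_freq(64)
inv_freq_long = rotary_inv_freq(64, rope_ratio=50.0)

positions = torch.arange(32768, dtype=torch.float32)
angles = torch.outer(positions, inv_freq_long)  # [32768, 32] rotation angles
print(angles.shape)

# The slowest-rotating dimension sweeps a much smaller total angle over
# 32K positions with the scaled base than with the original one:
print(positions[-1] * inv_freq_orig[-1], positions[-1] * inv_freq_long[-1])
```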
Designed for technical versatility, ChatGLM3-6B-32K natively supports tool invocation through function calling, code execution via an integrated code interpreter, and complex agent-based tasks. These features make it highly suitable for building sophisticated AI agents capable of deep text analysis and multi-step reasoning. The model's weights are open for academic research and available for free commercial use following a formal registration process, reflecting a commitment to accessible high-performance natural language processing.
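As a sketch of the tool-invocation workflow, the example below registers a hypothetical get_weather tool in the system message and returns the tool's output as an observation turn, following the conversational format documented for ChatGLM3. The tool name and schema are invented for illustration, and the repo id and chat() API are taken from the model's remote code as published.

```python
from transformers import AutoModel, AutoTokenizer

# Assumes the public checkpoint "THUDM/chatglm3-6b-32k" and a CUDA device.
repo = "THUDM/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True).half().cuda().eval()

tools = [{
    "name": "get_weather",  # hypothetical tool for illustration
    "description": "Query the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]
system_info = {
    "role": "system",
    "content": "Answer the following questions as best as you can. "
               "You have access to the following tools:",
    "tools": tools,
}

# When the model decides to call a tool, the response is a structured
# dict with "name" and "parameters" rather than plain text.
response, history = model.chat(
    tokenizer, "What is the weather like in Beijing?", history=[system_info])
print(response)

# Execute the tool yourself, then feed the result back as an observation turn.
tool_result = '{"city": "Beijing", "weather": "sunny", "temperature": "25C"}'
response, history = model.chat(
    tokenizer, tool_result, history=history, role="observation")
print(response)
```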
ChatGLM3-6B-32K is part of the ChatGLM series of models from Z.ai, based on the GLM architecture.
No evaluation benchmarks are available for ChatGLM3-6B-32K.