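| Specification | Value |
|---|---|
| Parameters | 6B |
| Context length | 8,192 tokens |
| Modality | Text |
| Architecture | Dense |
| License | Apache 2.0 |
| Release date | 27 Oct 2023 |
| Training data cutoff | Jul 2023 |
| Attention structure | Multi-Query Attention (2 key-value groups) |
| Hidden size | 4096 |
| Layers | 28 |
| Attention heads | 32 |
| Key-value heads | 2 |
| Activation function | SwiGLU |
| Normalization | RMS Normalization |
| Position embedding | Rotary Position Embedding (RoPE) |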
ChatGLM3-6B is a bilingual (Chinese-English) large language model developed jointly by Zhipu AI and the Knowledge Engineering Group (KEG) at Tsinghua University. It is the third generation of the ChatGLM series and builds on the General Language Model (GLM) architecture, which was designed to combine autoencoding and autoregressive objectives. The model was pre-trained on a diverse corpus of approximately one trillion tokens and optimized for conversational coherence and instruction following across domains including mathematics, programming, and logical reasoning.
The model is a dense Transformer that uses multi-query attention, in which the 32 query heads share 2 key-value heads, and Rotary Positional Embeddings (RoPE) to encode token positions. A notable addition in ChatGLM3 is native support for agent-style workflows, including function calling and code execution through an integrated interpreter. A redesigned prompt format supports these structured interactions and multi-turn dialogue, which makes the model suitable for scenarios that require autonomous task execution.
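The effect of sharing key-value heads across query heads can be made concrete with a little arithmetic. The sketch below estimates the key-value cache size for one full-length sequence from the figures listed for this model (28 layers, 32 attention heads, 2 key-value heads, hidden size 4096, 8,192-token context). It assumes a per-head dimension equal to the hidden size divided by the number of attention heads, and 2 bytes per element (fp16/bf16). The function name `kv_cache_bytes` is illustrative and not part of any library.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


LAYERS = 28
ATTN_HEADS = 32
KV_HEADS = 2
HIDDEN = 4096
HEAD_DIM = HIDDEN // ATTN_HEADS  # assumption: 4096 / 32 = 128
CONTEXT = 8192

# Cache size with the listed 2 shared key-value heads.
shared = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical cache size if every attention head kept its own keys and values.
full = kv_cache_bytes(LAYERS, ATTN_HEADS, HEAD_DIM, CONTEXT)

MIB = 1024 ** 2
print(f"2 KV heads:  {shared / MIB:.0f} MiB")   # 224 MiB
print(f"32 KV heads: {full / MIB:.0f} MiB")     # 3584 MiB
print(f"reduction:   {full // shared}x")        # 16x
```

Under these assumptions the cache for a full 8,192-token sequence is about 224 MiB, a 16-fold reduction over a design in which all 32 heads store their own keys and values.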
Designed for local and edge deployment, ChatGLM3-6B keeps a low computational footprint while improving on its predecessors. It uses SwiGLU activation functions and RMSNorm for stable training, and an expanded vocabulary for efficient bilingual tokenization. The model handles downstream applications ranging from standard question answering to agentic behaviors, all within an 8,192-token context window.
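For local deployment, the dominant memory cost is usually the weights. The following back-of-the-envelope sketch converts the listed parameter count into weight storage at a few common precisions. It takes the "6B" label literally as 6 billion parameters and ignores activations, the key-value cache, and quantization overhead such as scales and zero points, so real memory use will be higher. `weight_gib` is a hypothetical helper, not an API of any library.

```python
def weight_gib(num_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_param / 8 / 1024 ** 3


NOMINAL_PARAMS = 6_000_000_000  # assumption: the "6B" label taken literally

# Map each precision name to its estimated weight footprint in GiB.
estimates = {
    name: weight_gib(NOMINAL_PARAMS, bits)
    for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]
}

for name, gib in estimates.items():
    print(f"{name:>9}: {gib:.1f} GiB")
```

Halving the bits per parameter halves the estimate, which is why 4-bit quantization is a common choice for running a model of this size on consumer hardware.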
The ChatGLM series of models from Z.ai is based on the GLM architecture.
Rank: #102
| Benchmark | Score | Rank |
|---|---|---|
| Web Development (WebDev Arena) | 1056 | 63 |