Parameters
6B
Context Length
8K (8,192 tokens)
Modality
Text
Architecture
Dense
License
Apache 2.0
Release Date
27 Oct 2023
Knowledge Cutoff
Jul 2023
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
4096
Number of Layers
28
Attention Heads
32
Key-Value Heads
2
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
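The spec table above lists 32 attention heads but only 2 key-value heads, which shrinks the KV cache substantially at inference time. A back-of-envelope sketch of that saving, assuming fp16 cache entries and a head dimension of hidden_size / num_heads = 128 (standard convention; not stated in the table):

```python
# Back-of-envelope KV-cache sizing from the spec table above.
# fp16 storage and head_dim = hidden // heads are assumptions.
hidden_size = 4096
num_layers = 28
num_heads = 32
num_kv_heads = 2
head_dim = hidden_size // num_heads  # 128
context_len = 8192
bytes_per_value = 2  # fp16

def kv_cache_bytes(kv_heads: int) -> int:
    # Two tensors (K and V) per layer, each [context_len, kv_heads, head_dim].
    return 2 * num_layers * context_len * kv_heads * head_dim * bytes_per_value

full_mha = kv_cache_bytes(num_heads)    # if every head kept its own K/V
grouped = kv_cache_bytes(num_kv_heads)  # with the 2 KV heads listed above

print(f"full-MHA cache at 8K context: {full_mha / 2**30:.1f} GiB")
print(f"2-KV-head cache at 8K context: {grouped / 2**30:.2f} GiB")
```

With only 2 shared KV heads the cache is 16x smaller than a full per-head cache, which is a large part of why a 6B model with an 8K window stays practical on a single consumer GPU.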
ChatGLM3-6B is an advanced bilingual (Chinese-English) large language model developed through a collaboration between Zhipu AI and the Knowledge Engineering Group at Tsinghua University. As the third generation in the ChatGLM series, this model implements a refined General Language Model architecture that bridges the functional divide between autoencoding and autoregressive objectives. The pre-training phase utilizes a diverse corpus comprising approximately one trillion tokens, optimized for conversational coherence and instruction following across multiple domains including mathematics, programming, and logical reasoning.
Technically, the model is built on a dense Transformer architecture with grouped-query attention (32 query heads sharing 2 key-value heads) and RoPE (Rotary Position Embeddings) for efficient sequence handling. A significant advancement in the ChatGLM3 iteration is its native support for complex agent-centric workflows, including function calling and code execution via an integrated interpreter. This functionality is supported by a redesigned prompt format that facilitates structured interactions and multi-turn dialogue management, making it suitable for deployment in scenarios requiring autonomous task execution.
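RoPE encodes position by rotating pairs of query/key dimensions by position-dependent angles, so attention scores depend on relative offsets. A minimal, dependency-free sketch of the core rotation (the pairing convention and base of 10000 are common defaults, not necessarily ChatGLM3's exact implementation, which applies RoPE in a partial, 2D variant):

```python
import math

def rope_rotate(x, position, base=10000.0):
    """Rotate one head vector x (even length) by RoPE angles for `position`.

    Illustrative only: pairs adjacent dimensions (x[0], x[1]), (x[2], x[3]), ...
    and rotates each pair by theta_i = position * base**(-i / d).
    """
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# Position 0 applies zero rotation, leaving the vector unchanged;
# rotation is norm-preserving, so attention magnitudes are unaffected.
print(rope_rotate([1.0, 0.0, 1.0, 0.0], position=0))
```

Because the rotation is applied to both queries and keys, their dot product depends only on the difference of their positions, which is what gives RoPE its relative-position behavior.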
Designed for local and edge deployment, ChatGLM3-6B maintains a low computational footprint while delivering enhanced performance relative to its predecessors. It utilizes SwiGLU activation functions and RMSNorm for stable training, with a vocabulary expanded to support efficient bilingual tokenization. The model's versatility is demonstrated through its ability to handle a variety of downstream applications, from standard question-answering to sophisticated agentic behaviors, all while operating within a context window optimized for standard conversational tasks.
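The "low computational footprint" claim is easy to sanity-check with rough weight-memory arithmetic. This sketch estimates storage for roughly 6B parameters at common quantization levels; it ignores activations, the KV cache, and per-group quantization overhead, so real usage runs somewhat higher:

```python
# Rough weight-memory footprint for a ~6B-parameter model at
# common precisions. Quantization overhead and runtime buffers are
# deliberately ignored; figures are lower bounds.
params = 6_000_000_000

def weight_gib(bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB of weights")
```

At int4 the weights fit in roughly 3 GiB, which is what makes consumer-GPU and edge deployment of a 6B model feasible.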
ChatGLM-series models from Z.ai (Zhipu AI), based on the GLM architecture.
| Benchmark | Score | Rank |
|---|---|---|
| WebDev Arena (Web Development) | 1056 | 63 |
Overall Rank
#102
Coding Rank
#93