Parameters: 6B
Context Length: 2,048 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 14 Mar 2023
Knowledge Cutoff: -
Attention
Attention Structure: Multi-Head Attention
Attention Heads: 32
Key-Value Heads: 32
Attention Head Dimension: -
Position Embedding: Absolute Position Embedding
RoPE Theta: -
Sliding Window Attention: No
Sliding Window Size: -
Normalization: Layer Normalization
Activation Function: GELU
Dimensions
Hidden Dimension Size: 4,096
Number of Layers: 28
FFN Intermediate Size (Dense): 16,384
Multi-Token Prediction Heads: -
Tokenizer
Vocabulary Size: 130,528
ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model developed by Tsinghua University's KEG Lab and Zhipu AI. It is built on the General Language Model (GLM) architecture and targets conversational tasks, with specific optimization for Chinese question answering and dialogue. A key design goal was local deployment on consumer-grade hardware: with INT4 quantization, the model runs in as little as 6GB of GPU memory.
The model is a Transformer derived from the GLM framework. Pre-training used a hybrid objective that combines autoencoding and autoregressive training, over a corpus of approximately 1 trillion Chinese and English tokens. The model was then aligned with human preferences through supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback. The underlying GLM architecture supports a 2D positional encoding scheme.
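The 2D positional encoding mentioned above can be illustrated with a small sketch. It follows the scheme as described in the GLM paper (Du et al., 2022): every token carries two position ids, one locating it in the input and one counting its offset inside the generated span. The function name, the list-based layout, and the 0-based indexing are illustrative assumptions, not ChatGLM-6B's actual implementation.

```python
def glm_2d_positions(context_len, mask_pos, gen_len):
    """Build (position_ids, block_position_ids) for a prompt plus a generated span.

    Hypothetical helper for illustration only.
    context_len: number of prompt (context) tokens
    mask_pos:    index of the mask token that the generated span fills
    gen_len:     number of generated tokens
    """
    # Context tokens: ordinary sequential positions, block position 0.
    position_ids = list(range(context_len))
    block_position_ids = [0] * context_len

    # Generated tokens: all share the mask position in the first dimension;
    # the second dimension counts the offset within the span, starting at 1.
    position_ids += [mask_pos] * gen_len
    block_position_ids += list(range(1, gen_len + 1))
    return position_ids, block_position_ids


pos, block = glm_2d_positions(context_len=4, mask_pos=3, gen_len=3)
print(pos)    # [0, 1, 2, 3, 3, 3, 3]
print(block)  # [0, 0, 0, 0, 1, 2, 3]
```

The first id row tells the model where the span sits relative to the prompt, while the second row orders the tokens inside the span.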
Despite its relatively compact size of 6.2 billion parameters, ChatGLM-6B generates coherent and contextually relevant responses. Its emphasis on computational efficiency allows deployment and inference on common GPU configurations, which broadens its reach among researchers and developers. The model suits a range of natural language processing tasks, including machine translation, general question answering, and interactive chatbot applications, particularly in bilingual Chinese and English contexts.
The ChatGLM series of models from Z.ai is based on the GLM architecture.
Rank: #157
| Benchmark | Score | Rank |
|---|---|---|
| Web Development WebDev Arena | 995 | 92 |
Overall Rank: #157
Coding Rank: #127
Total Score: 64 / 100
ChatGLM-6B exhibits strong transparency in its architectural foundations and hardware requirements, providing clear guidance for local deployment on consumer devices. However, it suffers from significant opacity regarding its training data composition and the specific compute resources utilized during development. While the model's identity and code licensing are clear, the restrictive weight license and lack of detailed dataset breakdowns limit its overall transparency profile.
Architectural Provenance
ChatGLM-6B is explicitly built on the General Language Model (GLM) framework, which is well-documented in peer-reviewed research (Du et al., 2022). The architecture is a dense Transformer that uniquely combines autoencoding and autoregressive objectives. While the base model and its 2D positional encoding scheme are clearly defined, specific internal layer configurations and hyperparameters for the 6B variant are primarily found in the model's configuration files rather than a dedicated technical report for this specific version.
Dataset Composition
The model was trained on approximately 1 trillion tokens of a bilingual (Chinese and English) corpus. While the general categories of data are mentioned (webpages, Wikipedia, books, code, and research papers), there is no specific percentage breakdown or disclosure of the exact datasets used. The filtering and cleaning methodology is described at a high level (deduplication, quality filtering), but the lack of source-specific proportions or access to sample data limits transparency.
Tokenizer Integrity
The tokenizer is publicly available via the 'icetk' library and the official GitHub repository. It uses a byte-level Byte-Pair Encoding (BPE) algorithm with a stated vocabulary size of 130,528 for the original 6B release (figures of roughly 150k are sometimes cited for later iterations). The implementation is open-source, allowing full inspection of the tokenization logic and of how well the vocabulary matches the claimed bilingual support.
Parameter Density
The model is consistently identified as having 6.2 billion parameters. As a dense model, all parameters are active during inference. The architectural breakdown is verifiable through the provided source code (e.g., 28 layers, hidden size of 4096). However, detailed documentation on the specific parameter distribution between attention and feed-forward networks is not explicitly summarized in a model card, requiring manual code inspection.
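The split between attention and feed-forward parameters can be estimated from the listed dimensions. The following is a rough sketch, not an official breakdown: it counts weight matrices only (biases and LayerNorm parameters are ignored), and it assumes four hidden-by-hidden attention projections, a two-matrix feed-forward block, and a single token-embedding matrix. The function name is hypothetical.

```python
def dense_transformer_weight_params(layers, hidden, ffn, vocab):
    """Count weight-matrix parameters only (no biases, no LayerNorm)."""
    attention = 4 * hidden * hidden   # Q, K, V and output projections (assumed)
    feed_forward = 2 * hidden * ffn   # up- and down-projection (assumed)
    embedding = vocab * hidden        # one token-embedding matrix (assumed)
    return layers * (attention + feed_forward) + embedding


total = dense_transformer_weight_params(layers=28, hidden=4096, ffn=16384, vocab=130528)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# 6,171,787,264 parameters (~6.17B)
```

Under these assumptions the estimate lands close to the quoted 6.2 billion, with attention accounting for one third of each layer's weights and the feed-forward block for two thirds.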
Training Compute
Information regarding the training compute is minimal. While some third-party sources mention a cluster of 1,000 GPUs, official documentation does not disclose the specific GPU/TPU hours, hardware specifications (e.g., A100 vs. V100), or the total training duration. Environmental impact data and carbon footprint calculations are entirely absent.
Benchmark Reproducibility
The model provides results on standard benchmarks like MMLU and C-Eval. While the repository includes some evaluation scripts and the model is integrated into the 'InstructEval' framework, exact prompts and few-shot examples used for the original 6B release are not comprehensively documented in a centralized location. Third-party verification is possible but often shows variance due to the lack of standardized evaluation parameters in the initial release.
Identity Consistency
ChatGLM-6B demonstrates high identity consistency, correctly identifying itself as an AI assistant developed by Tsinghua University and Zhipu AI in its default system prompts. It maintains a clear versioning distinction from its successors (ChatGLM2/3) and does not exhibit significant identity confusion with competitor models in standard deployments.
License Clarity
The code is released under the Apache 2.0 license, which is highly transparent. However, the model weights are governed by a separate 'Model License' that requires users to fill out a questionnaire for commercial use. This dual-licensing approach creates some ambiguity for commercial developers, as the 'open-source' claim applies to the code but not fully to the weights without additional registration.
Hardware Footprint
Hardware requirements are exceptionally well-documented. The developers explicitly state VRAM needs for various quantization levels (e.g., 13GB for FP16, 6GB for INT4). They provide clear guidance on local deployment on consumer-grade hardware and document the performance-efficiency trade-offs associated with quantization, making it one of the most transparent models in this category.
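As a sanity check on these figures, the memory needed for the weights alone follows from the parameter count and the bits stored per parameter. This is a minimal sketch under the assumption that every parameter is stored at the same precision; the function name is hypothetical, and the result is a lower bound on the VRAM actually required rather than a full footprint.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 6.2e9  # parameter count quoted for ChatGLM-6B

for label, bits in [("FP16", 16), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB for weights")
# FP16: 12.4 GB for weights
# INT4: 3.1 GB for weights
```

The documented requirements (13GB and 6GB) are higher than these weight-only estimates; the gap presumably covers activations, the attention cache, and other runtime overhead.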
Versioning Drift
The project uses a basic versioning system (e.g., v1.1.0) and maintains a Change Log on GitHub. However, the transition between the original ChatGLM-6B and subsequent versions was marked by significant architectural shifts that were not always clearly mapped for backward compatibility. Silent updates to weights on Hugging Face have been noted by the community, and a formal semantic versioning policy is not strictly followed.