ChatGLM-6B

Parameters

6B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

14 Mar 2023

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

32

Key-Value Heads

32

Attention Head Dimension

128

Position Embedding

Absolute Position Embedding

RoPE Theta

-

Sliding Window Attention

No

Sliding Window Size

-

Normalization

Layer Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

28

FFN Intermediate Size (Dense)

16,384

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

130,528

Architecture Diagram

Diagram flow: Input Tokens → Token Embedding (Position: Absolute; Hidden: 4.1k; Context: 2k; Vocab: 130.5k) → 28 layers of [Pre-Attention LayerNorm → Multi-Head Attention (32 Q / 32 KV heads, head dim: 128) → residual add → Pre-FFN LayerNorm → Feed-Forward Network (GELU, intermediate: 16.4k) → residual add] → Final LayerNorm → Output Logits
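The attention figures listed above are enough for a rough estimate of the per-head dimension and the key-value cache size at full context. The sketch below assumes a conventional multi-head attention cache (one key and one value vector per head, per layer, per token) stored in FP16; the function name `kv_cache_bytes` is illustrative and is not taken from the ChatGLM code base.

```python
# Spec-sheet values from this page.
HIDDEN_SIZE = 4096
NUM_HEADS = 32
NUM_KV_HEADS = 32
NUM_LAYERS = 28
CONTEXT_LENGTH = 2048


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV cache size: keys and values for every layer and token.

    Assumes FP16 storage (2 bytes per value) unless told otherwise.
    """
    keys_and_values = 2
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# With multi-head attention, each head gets an equal slice of the hidden size.
head_dim = HIDDEN_SIZE // NUM_HEADS
cache = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, head_dim, CONTEXT_LENGTH)

print(f"head dim: {head_dim}")
print(f"KV cache at {CONTEXT_LENGTH} tokens: {cache / 2**20:.0f} MiB")
```

Under these assumptions the cache for a single 2,048-token sequence is a little under 1 GiB, which is small next to the weights but not negligible on a 6GB card.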

ChatGLM-6B

ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model developed by Tsinghua University's KEG Lab and Zhipu AI, built on the General Language Model (GLM) architecture. It targets conversational tasks and is specifically optimized for Chinese question answering and dialogue. The model was designed to be deployable locally on consumer-grade hardware: with INT4 quantization it can run in as little as 6GB of GPU memory.

The model is a Transformer that takes its foundational design from the GLM framework, whose hybrid pre-training objective combines autoencoding and autoregressive modeling. It was trained on a corpus of approximately 1 trillion tokens of Chinese and English text. Development also included supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback to align the model's outputs with human preferences. The underlying GLM architecture supports a 2D positional encoding scheme.
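The 2D positional encoding mentioned above can be illustrated with a small sketch. It follows the description in the GLM paper (Du et al., 2022): every token carries two position ids, where context tokens use their ordinary position and a second id of 0, while tokens generated for a masked span reuse the position of the mask token and count upward in the second id. This is a simplified, single-span illustration with 0-based positions; the function name and arguments are hypothetical, and ChatGLM-6B's own implementation may differ in detail.

```python
def glm_2d_positions(context_len, mask_index, span_len):
    """Return (position_ids, block_position_ids) for one masked span.

    context_len: number of tokens in the visible context (Part A)
    mask_index:  0-based position of the mask token inside the context
    span_len:    number of tokens generated for the span (Part B)
    """
    if not 0 <= mask_index < context_len:
        raise ValueError("mask_index must point inside the context")

    # First dimension: context tokens count up; span tokens all share
    # the position of the mask they fill in.
    position_ids = list(range(context_len)) + [mask_index] * span_len

    # Second dimension: 0 for context; 1..span_len inside the span.
    block_position_ids = [0] * context_len + list(range(1, span_len + 1))

    return position_ids, block_position_ids


pos, block = glm_2d_positions(context_len=5, mask_index=2, span_len=3)
print(pos)    # first position id per token
print(block)  # second (intra-span) position id per token
```

The two id sequences are typically embedded separately and combined, so the model can tell both where a span belongs in the context and how far into the span it currently is.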

Despite its relatively compact size of 6.2 billion parameters, ChatGLM-6B generates coherent, contextually relevant responses. Its emphasis on computational efficiency allows deployment and inference on common GPU configurations, which makes it practical for researchers and developers. The model suits a range of natural language processing tasks, including machine translation, general question answering, and interactive chatbot applications, particularly in bilingual Chinese-English settings.

About ChatGLM

The ChatGLM series of models from Z.ai, based on the GLM architecture.


Evaluation Benchmarks

Rank

#157

Benchmark

WebDev Arena (Web Development)

Score

995

Rank

92

Rankings

Overall Rank

#157

Coding Rank

#127

Model Integrity

Total Score

B

64 / 100

ChatGLM-6B Model Integrity Report

Audit Note

ChatGLM-6B exhibits strong transparency in its architectural foundations and hardware requirements, providing clear guidance for local deployment on consumer devices. However, it suffers from significant opacity regarding its training data composition and the specific compute resources utilized during development. While the model's identity and code licensing are clear, the restrictive weight license and lack of detailed dataset breakdowns limit its overall transparency profile.

Upstream

20.0 / 30

Architectural Provenance

7.5 / 10

ChatGLM-6B is explicitly built on the General Language Model (GLM) framework, which is well-documented in peer-reviewed research (Du et al., 2022). The architecture is a dense Transformer that uniquely combines autoencoding and autoregressive objectives. While the base model and its 2D positional encoding scheme are clearly defined, specific internal layer configurations and hyperparameters for the 6B variant are primarily found in the model's configuration files rather than a dedicated technical report for this specific version.

Dataset Composition

4.0 / 10

The model was trained on approximately 1 trillion tokens of a bilingual (Chinese and English) corpus. While the general categories of data are mentioned (webpages, Wikipedia, books, code, and research papers), there is no specific percentage breakdown or disclosure of the exact datasets used. The filtering and cleaning methodology is described at a high level (deduplication, quality filtering), but the lack of source-specific proportions or access to sample data limits transparency.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly available via the 'icetk' library and the official GitHub repository. It uses a byte-level Byte-Pair Encoding (BPE) algorithm with a clearly stated vocabulary size of 130,528 (a figure of roughly 150k is often cited for later iterations, but the original 6B uses roughly 130k). The implementation is open-source, allowing full inspection of the tokenization logic and of how well the vocabulary matches the claimed bilingual support.

Model

24.0 / 40

Parameter Density

7.0 / 10

The model is consistently identified as having 6.2 billion parameters. As a dense model, all parameters are active during inference. The architectural breakdown is verifiable through the provided source code (e.g., 28 layers, hidden size of 4096). However, detailed documentation on the specific parameter distribution between attention and feed-forward networks is not explicitly summarized in a model card, requiring manual code inspection.
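The manual inspection mentioned above can be approximated with back-of-the-envelope arithmetic from the spec-sheet values on this page. The sketch below is an estimate under stated assumptions, not an exact count: it assumes a standard Transformer layer (four hidden-by-hidden attention projections and a two-matrix feed-forward block), ignores biases and LayerNorm weights, and counts the token embedding matrix once.

```python
# Spec-sheet values from this page.
HIDDEN = 4096
LAYERS = 28
FFN_INTERMEDIATE = 16384
VOCAB = 130528

# Attention: query, key, value, and output projections (assumed hidden x hidden).
attention_per_layer = 4 * HIDDEN * HIDDEN

# Feed-forward: one up-projection and one down-projection.
ffn_per_layer = 2 * HIDDEN * FFN_INTERMEDIATE

per_layer = attention_per_layer + ffn_per_layer
transformer_total = LAYERS * per_layer

# Token embedding matrix, counted once (assumption).
embedding = VOCAB * HIDDEN

total = transformer_total + embedding

print(f"attention per layer: {attention_per_layer:,}")
print(f"FFN per layer:       {ffn_per_layer:,}")
print(f"all layers:          {transformer_total:,}")
print(f"embedding:           {embedding:,}")
print(f"estimated total:     {total:,} (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate lands at roughly 6.2 billion, in line with the stated parameter count, with the feed-forward blocks holding about twice as many weights as the attention projections.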

Training Compute

3.0 / 10

Information regarding the training compute is minimal. While some third-party sources mention a cluster of 1,000 GPUs, official documentation does not disclose the specific GPU/TPU hours, hardware specifications (e.g., A100 vs. V100), or the total training duration. Environmental impact data and carbon footprint calculations are entirely absent.

Benchmark Reproducibility

5.0 / 10

The model provides results on standard benchmarks like MMLU and C-Eval. While the repository includes some evaluation scripts and the model is integrated into the 'InstructEval' framework, exact prompts and few-shot examples used for the original 6B release are not comprehensively documented in a centralized location. Third-party verification is possible but often shows variance due to the lack of standardized evaluation parameters in the initial release.

Identity Consistency

9.0 / 10

ChatGLM-6B demonstrates high identity consistency, correctly identifying itself as an AI assistant developed by Tsinghua University and Zhipu AI in its default system prompts. It maintains a clear versioning distinction from its successors (ChatGLM2/3) and does not exhibit significant identity confusion with competitor models in standard deployments.

Downstream

20.0 / 30

License Clarity

6.5 / 10

The code is released under the Apache 2.0 license, which is highly transparent. However, the model weights are governed by a separate 'Model License' that requires users to fill out a questionnaire for commercial use. This dual-licensing approach creates some ambiguity for commercial developers, as the 'open-source' claim applies to the code but not fully to the weights without additional registration.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. The developers explicitly state VRAM needs for various quantization levels (e.g., 13GB for FP16, 6GB for INT4). They provide clear guidance on local deployment on consumer-grade hardware and document the performance-efficiency trade-offs associated with quantization, making it one of the most transparent models in this category.
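The relationship between quantization level and memory can be sketched with simple arithmetic. The snippet below computes a weights-only lower bound from the 6.2 billion parameter count stated on this page; the helper name `weight_memory_gb` is illustrative. It does not model activations, the key-value cache, or framework overhead, which is presumably why the documented figures (13GB for FP16, 6GB for INT4) sit above these numbers.

```python
PARAMS = 6.2e9  # parameter count stated on this page


def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.1f} GB for weights alone")
```

Treat the result as a floor when sizing a GPU: actual usage grows with context length and batch size.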

Versioning Drift

5.0 / 10

The project uses a basic versioning system (e.g., v1.1.0) and maintains a Change Log on GitHub. However, the transition between the original ChatGLM-6B and subsequent versions was marked by significant architectural shifts that were not always clearly mapped for backward compatibility. Silent updates to weights on Hugging Face have been noted by the community, and a formal semantic versioning policy is not strictly followed.
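The scores in this report can be cross-checked by summing the individual criteria within each section and then summing the sections. The sketch below does that with the values listed in the report; the dictionary layout is only an illustration, and the simple-sum scoring rule is an assumption inferred from the numbers.

```python
# Criterion scores as listed in the report (each out of 10).
scores = {
    "Upstream": {
        "Architectural Provenance": 7.5,
        "Dataset Composition": 4.0,
        "Tokenizer Integrity": 8.5,
    },
    "Model": {
        "Parameter Density": 7.0,
        "Training Compute": 3.0,
        "Benchmark Reproducibility": 5.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 6.5,
        "Hardware Footprint": 8.5,
        "Versioning Drift": 5.0,
    },
}

# Sum criteria within each section, then sum the sections.
section_totals = {name: sum(items.values()) for name, items in scores.items()}
total = sum(section_totals.values())

for name, value in section_totals.items():
    print(f"{name}: {value} / {10 * len(scores[name])}")
print(f"Total: {total} / 100")
```

The section sums come out to 20.0, 24.0, and 20.0, matching the reported section scores and the overall 64 / 100.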

ChatGLM-6B: Specifications and GPU VRAM Requirements