ChatGLM3-6B

Parameters

6B

Context Length

8K

Modality

Text

Architecture

Dense

License

Apache 2.0 (code); custom ChatGLM3-6B License (weights)

Release Date

27 Oct 2023

Knowledge Cutoff

Jul 2023

Technical Specifications

Attention

Attention Structure

Multi-Query Attention (grouped: 32 query heads share 2 key-value heads)

Attention Heads

32

Key-Value Heads

2

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

-

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

28

FFN Intermediate Size (Dense)

13,696

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

65,024

Architecture Diagram

Figure: ChatGLM3-6B architecture (hidden size 4,096; context 8K; vocabulary 65,024). Input Tokens → Token Embedding (position: RoPE) → 28 × [RMSNorm (pre-attention) → Attention (32 query / 2 KV heads, head dim 128) → residual add → RMSNorm (pre-FFN) → Feed-Forward Network (SwiGLU, intermediate 13,696) → residual add] → Final RMSNorm → Output Logits

ChatGLM3-6B

ChatGLM3-6B is a bilingual (Chinese-English) large language model developed jointly by Zhipu AI and the Knowledge Engineering Group at Tsinghua University. It is the third generation of the ChatGLM series and builds on the General Language Model (GLM) architecture, which was designed to combine autoencoding and autoregressive training objectives. Pre-training used a corpus of approximately one trillion tokens, and the model is tuned for conversational coherence and instruction following across domains including mathematics, programming, and logical reasoning.

The model is a dense Transformer that pairs Multi-Query Attention (32 query heads sharing 2 key-value heads) with Rotary Positional Embeddings (RoPE) for position encoding. The main addition in ChatGLM3 is native support for agent-style workflows, including function calling and code execution through an integrated interpreter. A redesigned prompt format supports these structured interactions and multi-turn dialogue, which makes the model suitable for deployments that require autonomous task execution.
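As a rough illustration of the rotary position embedding mentioned above, the following minimal sketch rotates consecutive pairs of a vector by position-dependent angles and shows that the query-key dot product then depends only on the relative offset between the two positions. The function names and the base of 10000 are assumptions for illustration (the specification table does not list a RoPE theta), and this is generic RoPE rather than the model's exact implementation.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`.

    `base` is an assumed value; the page does not state the model's RoPE theta.
    """
    dim = len(vec)
    assert dim % 2 == 0, "RoPE works on pairs, so the dimension must be even"
    out = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base^(-2k / dim) = base^(-i / dim).
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Same relative offset (2) at different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 12), rope_rotate(k, 10))
```

Because each pair is only rotated, vector norms are preserved, and the two scores agree up to floating-point error.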

ChatGLM3-6B targets local and edge deployment: it keeps a low computational footprint while improving on its predecessors. It uses SwiGLU activation functions and RMSNorm for stable training, and a vocabulary of 65,024 tokens for efficient bilingual tokenization. It handles downstream applications ranging from standard question answering to agentic tasks within an 8K-token context window.
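The two building blocks named above, RMSNorm and SwiGLU, can be sketched in a few lines of plain Python. This is a minimal sketch on lists of floats; the function names, the epsilon value, and the omission of the learned projection matrices are simplifications for illustration, not the model's actual code.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Scale `x` by the reciprocal of its root-mean-square, then by `weight`."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def sigmoid(v):
    # Branch on sign so math.exp never receives a large positive argument.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v * sigmoid(v)


def swiglu(gate, up):
    """Gated activation: SiLU(gate) multiplied elementwise by `up`.

    In a full feed-forward layer, `gate` and `up` would be two linear
    projections of the same input; those projections are omitted here.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
gated = swiglu([0.0, 2.0], [5.0, 1.0])
```

After normalization with unit weights, the mean of the squared outputs is 1, which is the property that keeps activations at a stable scale across layers.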

About ChatGLM

The ChatGLM series of models from Z.ai (Zhipu AI), based on the GLM architecture.


Evaluation Benchmarks

Rank

#161

Benchmark | Category | Score | Rank
WebDev Arena | Web Development | 1056 | 108
Text Arena | General Text | 1055 | 110

Rankings

Overall Rank

#161

Coding Rank

#134

Model Integrity

Total Score

B

64 / 100

ChatGLM3-6B Model Integrity Report

Audit Note

ChatGLM3-6B demonstrates strong transparency in its architectural specifications and hardware requirements, providing clear deployment paths for users. However, it remains significantly opaque regarding its training data composition and compute resources, which hinders independent auditing of its data provenance and environmental impact. The custom licensing for weights and the lack of a comprehensive benchmark reproduction suite further limit its overall transparency profile.

Upstream

20.0 / 30

Architectural Provenance

7.5 / 10

ChatGLM3-6B is explicitly documented as a dense Transformer-based model utilizing Multi-Query Attention (MQA), SwiGLU activation functions, and Rotary Positional Embeddings (RoPE). The architecture is a refinement of the General Language Model (GLM) framework, which is well-documented in the 2024 technical report 'ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools'. While the high-level architecture is clear, the specific layer-by-layer configuration and pre-training hyperparameter details are less granular than top-tier open-source models.
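To make the head layout concrete, the sketch below shows how 32 query heads can share 2 key-value heads, using the values from the specification table. The contiguous grouping (heads 0 to 15 on the first key-value head, 16 to 31 on the second) and the helper names are assumptions for illustration; the actual assignment is defined by the model's own code.

```python
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 2
HEAD_DIM = 128


def kv_head_for(query_head):
    """Return the key-value head a query head reads from (assumed contiguous groups)."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 16 query heads per KV head
    return query_head // group_size


def fused_qkv_width():
    """Output width of a single projection producing Q, K, and V together."""
    q_width = NUM_QUERY_HEADS * HEAD_DIM    # 4096
    kv_width = 2 * NUM_KV_HEADS * HEAD_DIM  # 512 (keys plus values)
    return q_width + kv_width


mapping = [kv_head_for(h) for h in range(NUM_QUERY_HEADS)]
```

Sharing key-value heads this way shrinks the key-value projection from 8,192 to 512 output columns relative to full multi-head attention at the same head dimension, which is also what keeps the inference-time cache small.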

Dataset Composition

4.0 / 10

The model is stated to be trained on approximately 1 trillion tokens of a diverse bilingual (Chinese-English) corpus. However, the exact composition breakdown (e.g., specific percentages of web, code, and books) is not publicly disclosed. Documentation vaguely refers to 'more diverse training datasets' and 'high-quality and educational sources' without providing a verifiable data provenance map or sample data for inspection.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via the official GitHub repository and Hugging Face. It uses a unified vocabulary of 65,024 tokens (matching the vocabulary size in the specification table) based on the SentencePiece implementation, specifically optimized for Chinese and English. The vocabulary size and tokenization approach (byte-level BPE) are clearly stated and verifiable through the provided 'tokenization_chatglm.py' and 'tokenizer.model' files.

Model

24.5 / 40

Parameter Density

9.0 / 10

The model's parameter count is consistently stated as 6.2 billion (often rounded to 6B). As a dense architecture, all parameters are active during inference. The architectural breakdown is verifiable through the 'config.json' and 'modeling_chatglm.py' files in the official repository, which specify 28 layers and a hidden size of 4096.
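The stated figure can be sanity-checked from the published dimensions. The following back-of-the-envelope sketch counts weights under several assumptions that this page does not confirm: untied input and output embeddings, a fused QKV projection with a bias, a gated feed-forward layer with two up-projections and one down-projection, and no other biases.

```python
HIDDEN = 4096
LAYERS = 28
FFN = 13696
VOCAB = 65024
QUERY_HEADS = 32
KV_HEADS = 2
HEAD_DIM = 128


def layer_params():
    """Weights in one Transformer layer under the assumptions stated above."""
    qkv_width = QUERY_HEADS * HEAD_DIM + 2 * KV_HEADS * HEAD_DIM
    attention = HIDDEN * qkv_width + qkv_width    # fused QKV weight + assumed bias
    attention += (QUERY_HEADS * HEAD_DIM) * HIDDEN  # attention output projection
    feed_forward = HIDDEN * (2 * FFN)             # gate and up projections (SwiGLU)
    feed_forward += FFN * HIDDEN                  # down projection
    norms = 2 * HIDDEN                            # two RMSNorm weight vectors
    return attention + feed_forward + norms


def total_params():
    embeddings = 2 * VOCAB * HIDDEN               # assumed untied input/output
    final_norm = HIDDEN
    return LAYERS * layer_params() + embeddings + final_norm


total = total_params()
billions = round(total / 1e9, 2)
```

Under these assumptions the estimate lands at roughly 6.24 billion, in line with the 6.2 billion figure quoted above; a different bias or embedding-tying choice would shift the result only slightly.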

Training Compute

2.0 / 10

There is virtually no public information regarding the specific training compute resources used for ChatGLM3-6B. While the technical report mentions 'sufficient training steps' and 'optimized training strategies,' it fails to disclose GPU/TPU hours, hardware specifications, training duration, or the associated carbon footprint. This lack of transparency makes environmental and cost impact assessments impossible.

Benchmark Reproducibility

5.0 / 10

The model provides scores for standard benchmarks like MMLU, GSM8K, and MATH. While the GitHub repository includes some evaluation scripts and mentions few-shot/zero-shot settings, it lacks a comprehensive, one-click reproduction suite with exact prompt templates for all 40+ claimed benchmarks. Third-party verification is available through leaderboards like OpenCompass, but the internal evaluation methodology remains partially opaque.

Identity Consistency

8.5 / 10

ChatGLM3-6B generally identifies itself correctly as an AI assistant developed by Zhipu AI and Tsinghua University. It maintains a consistent versioning identity in its system prompts. However, some users have reported minor identity confusion in specific edge cases where it might default to generic 'AI assistant' responses without branding, though it does not falsely claim to be a competitor's model.

Downstream

19.5 / 30

License Clarity

7.0 / 10

The model weights are released under a custom 'ChatGLM3-6B License' which allows for free academic research and free commercial use after a mandatory registration process. While the terms are relatively clear, the requirement for a questionnaire-based registration for commercial use adds a layer of friction and potential restriction that deviates from standard open-source licenses like Apache 2.0 (which is used for the code but not the weights).

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented. The official documentation provides specific VRAM estimates for FP16 (approx. 13GB) and INT4 quantization (approx. 5GB). It also details support for various acceleration backends including TensorRT-LLM, OpenVINO, and MPS for Mac deployment, providing users with clear guidance on deployment feasibility.
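The VRAM figures above can be roughly cross-checked from the parameter count and attention layout. This is a minimal estimate assuming 6.2 billion parameters, a 16-bit key-value cache, and the layer and head counts from the specification table; it ignores activations, quantization metadata, and framework overhead, which is one plausible reason the documented numbers are higher than the raw weight sizes.

```python
PARAMS = 6.2e9  # parameter count as stated on this page


def weight_bytes(bits_per_param):
    """Raw storage for the weights at a given precision."""
    return PARAMS * bits_per_param / 8


def kv_cache_bytes(tokens, layers=28, kv_heads=2, head_dim=128, bytes_per_value=2):
    """Key-value cache size for one sequence: keys and values for every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


fp16_gb = weight_bytes(16) / 1e9              # raw FP16 weights, in GB
int4_gb = weight_bytes(4) / 1e9               # raw INT4 weights, in GB
cache_mib = kv_cache_bytes(8192) / 2**20      # cache at the full 8K context, in MiB
```

With only 2 key-value heads, the cache at the full 8K context comes to a few hundred MiB per sequence, so weight storage dominates the memory budget.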

Versioning Drift

4.5 / 10

The model uses basic versioning (e.g., v1.0.0), but a detailed, public changelog tracking behavioral drift or specific weight updates over time is lacking. While the GitHub repository shows commit history, it does not provide a high-level semantic versioning map that explains the impact of updates on model performance or safety alignment.
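The category and total scores in this report are plain sums of the sub-scores listed under each heading. A small sketch that recomputes them from the values on this page (the dictionary layout is only an illustrative choice):

```python
scores = {
    "Upstream": {
        "Architectural Provenance": 7.5,
        "Dataset Composition": 4.0,
        "Tokenizer Integrity": 8.5,
    },
    "Model": {
        "Parameter Density": 9.0,
        "Training Compute": 2.0,
        "Benchmark Reproducibility": 5.0,
        "Identity Consistency": 8.5,
    },
    "Downstream": {
        "License Clarity": 7.0,
        "Hardware Footprint": 8.0,
        "Versioning Drift": 4.5,
    },
}

# Sum sub-scores per category, then sum the categories for the total.
category_totals = {name: sum(parts.values()) for name, parts in scores.items()}
total_score = sum(category_totals.values())
```

The recomputed values match the reported 20.0, 24.5, and 19.5 category scores and the total of 64 out of 100.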
