Parameters
6B
Context Length
8K
Modality
Text
Architecture
Dense
License
Apache 2.0 (code); custom ChatGLM3-6B License (weights)
Release Date
27 Oct 2023
Knowledge Cutoff
Jul 2023
Attention
Attention Structure
Multi-Query Attention
Attention Heads
32
Key-Value Heads
2
Attention Head Dimension
128
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
-
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
28
FFN Intermediate Size (Dense)
13,696
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
65,024
ChatGLM3-6B is an advanced bilingual (Chinese-English) large language model developed through a collaboration between Zhipu AI and the Knowledge Engineering Group at Tsinghua University. As the third generation in the ChatGLM series, this model implements a refined General Language Model architecture that bridges the functional divide between autoencoding and autoregressive objectives. The pre-training phase utilizes a diverse corpus comprising approximately one trillion tokens, optimized for conversational coherence and instruction following across multiple domains including mathematics, programming, and logical reasoning.
Technically, the model is built on a dense Transformer architecture that uses Multi-Query Attention (32 query heads sharing 2 key-value heads) and RoPE (Rotary Positional Embeddings) for efficient sequence handling. A significant advancement in the ChatGLM3 iteration is its native support for agent-centric workflows, including function calling and code execution via an integrated interpreter. This functionality is supported by a redesigned prompt format that facilitates structured interactions and multi-turn dialogue management, making the model suitable for deployment in scenarios requiring autonomous task execution.
Designed for local and edge deployment, ChatGLM3-6B maintains a low computational footprint while delivering enhanced performance relative to its predecessors. It uses SwiGLU activation functions and RMSNorm for stable training, with a 65,024-token vocabulary designed for efficient bilingual tokenization. The model handles a range of downstream applications, from standard question answering to agentic behaviors, within an 8K-token context window.
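To make the two building blocks named above concrete, here is a minimal pure-Python sketch of RMSNorm and the SwiGLU gating step. It is an illustration of the general techniques, not the model's actual code: the function names, the `eps` value, and the toy input are assumptions chosen for the example.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """Divide x by its root-mean-square, then apply a per-element learned weight.

    eps is an assumed stabilizer value, not taken from the source.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """SwiGLU gating: elementwise SiLU(gate) * up.

    In a full FFN, gate and up are two linear projections of the same input,
    and the result is fed to a down projection; those matrices are omitted here.
    """
    return [silu(g) * u for g, u in zip(gate, up)]

# Toy vector standing in for one token's hidden state.
x = [1.0, -2.0, 3.0, -4.0]
normed = rms_norm(x, weight=[1.0] * len(x))
gated = swiglu(gate=normed, up=x)
print(normed)
print(gated)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it needs only one reduction per vector.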
ChatGLM series models from Z.ai, based on GLM architecture.
Rank
#161
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Web Development | WebDev Arena | 1056 | 108 |
| General Text | Text Arena | 1055 | 110 |
Overall Rank
#161
Coding Rank
#134
Total Score
64 / 100
ChatGLM3-6B demonstrates strong transparency in its architectural specifications and hardware requirements, providing clear deployment paths for users. However, it remains significantly opaque regarding its training data composition and compute resources, which hinders independent auditing of its data provenance and environmental impact. The custom licensing for weights and the lack of a comprehensive benchmark reproduction suite further limit its overall transparency profile.
Architectural Provenance
ChatGLM3-6B is explicitly documented as a dense Transformer-based model utilizing Multi-Query Attention (MQA), SwiGLU activation functions, and Rotary Positional Embeddings (RoPE). The architecture is a refinement of the General Language Model (GLM) framework, which is well-documented in the 2024 technical report 'ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools'. While the high-level architecture is clear, the specific layer-by-layer configuration and pre-training hyperparameter details are less granular than top-tier open-source models.
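The practical effect of sharing key-value heads can be sketched with a KV-cache size estimate. The snippet below uses the figures from the specification list (28 layers, head dimension 128, 2 key-value heads, 32 attention heads) and assumes FP16 cache entries and an 8K context of 8,192 tokens; the function name is hypothetical and the result ignores framework overhead.

```python
def kv_cache_bytes(num_layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 covers the separate key and value tensors.
    bytes_per_value=2 assumes FP16 storage.
    """
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_value

LAYERS, HEAD_DIM, SEQ_LEN = 28, 128, 8192  # SEQ_LEN: assumed reading of "8K"

shared = kv_cache_bytes(LAYERS, kv_heads=2, head_dim=HEAD_DIM, seq_len=SEQ_LEN)
full = kv_cache_bytes(LAYERS, kv_heads=32, head_dim=HEAD_DIM, seq_len=SEQ_LEN)

print(f"2 KV heads:  {shared / 2**20:.0f} MiB")
print(f"32 KV heads: {full / 2**20:.0f} MiB")
print(f"reduction:   {full // shared}x")
```

Under these assumptions the cache shrinks by the ratio of attention heads to key-value heads, which is where most of the memory saving at long context comes from.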
Dataset Composition
The model is stated to be trained on approximately 1 trillion tokens of a diverse bilingual (Chinese-English) corpus. However, the exact composition breakdown (e.g., specific percentages of web, code, and books) is not publicly disclosed. Documentation vaguely refers to 'more diverse training datasets' and 'high-quality and educational sources' without providing a verifiable data provenance map or sample data for inspection.
Tokenizer Integrity
The tokenizer is publicly accessible via the official GitHub repository and Hugging Face. It uses a unified vocabulary of 65,024 tokens (the size listed in the specifications above) based on the SentencePiece implementation, specifically optimized for Chinese and English. The vocabulary size and tokenization approach (byte-level BPE) are clearly stated and verifiable through the provided 'tokenization_chatglm.py' and 'tokenizer.model' files.
Parameter Density
The model's parameter count is consistently stated as 6.2 billion (often rounded to 6B). As a dense architecture, all parameters are active during inference. The architectural breakdown is verifiable through the 'config.json' and 'modeling_chatglm.py' files in the official repository, which specify 28 layers and a hidden size of 4096.
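As a rough cross-check of the stated parameter count, the dimensions in the specification list can be plugged into a back-of-the-envelope estimate. This is a sketch under assumptions not stated in the source: a fused QKV projection with bias, a bias-free attention output projection, a SwiGLU FFN with gate and up projections, two RMSNorm weights per layer, and separate (untied) input and output embedding matrices. The function name is hypothetical.

```python
def estimate_params(hidden=4096, layers=28, ffn=13696, vocab=65024,
                    heads=32, kv_heads=2, head_dim=128):
    """Approximate parameter count for a dense decoder with shared KV heads."""
    # Fused QKV projection: full query heads plus the smaller key and value heads.
    qkv_out = (heads + 2 * kv_heads) * head_dim
    attn = hidden * qkv_out + qkv_out          # QKV weights + assumed bias
    attn += heads * head_dim * hidden          # attention output projection
    # SwiGLU FFN: gate and up projections, then a down projection.
    mlp = hidden * 2 * ffn + ffn * hidden
    norms = 2 * hidden                         # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab * hidden            # assumed untied input/output
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm

total = estimate_params()
print(f"{total / 1e9:.2f}B parameters")  # about 6.24B under these assumptions
```

The estimate lands close to the 6.2 billion figure, which suggests the listed dimensions are mutually consistent.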
Training Compute
There is virtually no public information regarding the specific training compute resources used for ChatGLM3-6B. While the technical report mentions 'sufficient training steps' and 'optimized training strategies,' it fails to disclose GPU/TPU hours, hardware specifications, training duration, or the associated carbon footprint. This lack of transparency makes environmental and cost impact assessments impossible.
Benchmark Reproducibility
The model provides scores for standard benchmarks like MMLU, GSM8K, and MATH. While the GitHub repository includes some evaluation scripts and mentions few-shot/zero-shot settings, it lacks a comprehensive, one-click reproduction suite with exact prompt templates for all 40+ claimed benchmarks. Third-party verification is available through leaderboards like OpenCompass, but the internal evaluation methodology remains partially opaque.
Identity Consistency
ChatGLM3-6B generally identifies itself correctly as an AI assistant developed by Zhipu AI and Tsinghua University. It maintains a consistent versioning identity in its system prompts. However, some users have reported minor identity confusion in specific edge cases where it might default to generic 'AI assistant' responses without branding, though it does not falsely claim to be a competitor's model.
License Clarity
The model weights are released under a custom 'ChatGLM3-6B License' which allows for free academic research and free commercial use after a mandatory registration process. While the terms are relatively clear, the requirement for a questionnaire-based registration for commercial use adds a layer of friction and potential restriction that deviates from standard open-source licenses like Apache 2.0 (which is used for the code but not the weights).
Hardware Footprint
Hardware requirements are well-documented. The official documentation provides specific VRAM estimates for FP16 (approx. 13GB) and INT4 quantization (approx. 5GB). It also details support for various acceleration backends including TensorRT-LLM, OpenVINO, and MPS for Mac deployment, providing users with clear guidance on deployment feasibility.
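The VRAM figures above can be sanity-checked with a weights-only estimate. The sketch below assumes 6.2 billion parameters and counts only the bytes needed to store weights at each precision; the names are hypothetical, and activations, KV cache, and quantization metadata are deliberately left out.

```python
PARAMS = 6.2e9  # parameter count as stated for the model

# Bytes per parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params, precision):
    """Weights-only memory in decimal gigabytes for the given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(PARAMS, precision):.1f} GB")
```

The weights-only numbers come out below the documented estimates, which is expected: runtime totals also include activations, the KV cache, and any layers or scaling factors kept at higher precision.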
Versioning Drift
The model uses basic versioning (e.g., v1.0.0), but a detailed, public changelog tracking behavioral drift or specific weight updates over time is lacking. While the GitHub repository shows commit history, it does not provide a high-level semantic versioning map that explains the impact of updates on model performance or safety alignment.