ChatGLM-6B: Specifications and GPU VRAM Requirements

ChatGLM-6B

Open Source

Open Weights

Parameters

Context Length

2.048K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

14 Mar 2023

Knowledge Cutoff

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

4096

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

GELU

Normalization

Layer Normalization

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

ChatGLM-6B

ChatGLM-6B is an open-source, bilingual (Chinese and English) dialogue language model developed by Tsinghua University's KEG Lab and Zhipu AI. It is built upon the General Language Model (GLM) architecture. The model's primary objective is to facilitate conversational AI tasks, with a specific optimization for Chinese question answering and dialogue. A key design consideration for ChatGLM-6B was its accessibility for local deployment on consumer-grade hardware, enabling operation with as little as 6GB of GPU memory when utilizing INT4 quantization.

The model employs a Transformer-based architecture, deriving its foundational design from the GLM framework. During its pre-training phase, ChatGLM-6B incorporated a hybrid objective function. The training regimen involved a substantial corpus of approximately 1 trillion tokens, comprising both Chinese and English languages. Furthermore, the development process integrated advanced techniques such as supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback to align the model's outputs with human preferences. The underlying GLM architecture supports a 2D positional encoding scheme.

Despite its relatively compact size of 6.2 billion parameters, ChatGLM-6B demonstrates capabilities in generating coherent and contextually relevant responses. Its architecture emphasizes computational efficiency, allowing for deployment and inference on common GPU configurations, which broadens its applicability for researchers and developers. The model is suitable for a range of natural language processing tasks, including but not limited to machine translation, general question answering systems, and the construction of interactive chatbot applications, particularly in bilingual contexts involving Chinese and English.

About ChatGLM

ChatGLM series models from Z.ai, based on GLM architecture.

Other ChatGLM Models

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for ChatGLM-6B available.

Rankings

Overall Rank

Coding Rank

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Read the Paper Download Weights Source Code