
ChatGLM2-6B

Parameters

6B

Context Length

32K (32,768 tokens)

Modality

Text

Architecture

Dense

License

Custom License (ChatGLM2-6B License)

Release Date

25 Jun 2023

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

4096

Number of Layers

28

Attention Heads

32

Key-Value Heads

2

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

ChatGLM2-6B

ChatGLM2-6B is a bilingual large language model designed for conversational interaction in both Chinese and English. As the second iteration in the ChatGLM series developed by THUDM, it is built upon the General Language Model (GLM) framework and serves as a versatile tool for dialogue generation and cross-lingual text processing. Efficient architectural choices allow it to run on consumer-grade hardware, making it accessible to developers and researchers working in hardware-constrained environments.

The architecture utilizes a dense transformer structure that incorporates several technical advancements over its predecessor. A key change is the adoption of Multi-Query Attention (MQA), which streamlines inference by sharing key and value heads across multiple query heads, significantly reducing the memory footprint of the KV cache. The model also integrates Rotary Position Embeddings (RoPE) to encode token positions and uses RMSNorm for improved training stability. The use of FlashAttention during pre-training allows the architecture to support a 32K-token context window, enabling the processing of extended dialogue histories.
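
To make the KV-cache saving concrete, the sketch below compares the per-sequence cache size of full multi-head attention (32 key-value heads) against multi-query attention with the 2 shared key-value heads listed in the specifications. The arithmetic assumes FP16 cache values and is a back-of-the-envelope estimate, not a measurement of any particular runtime:

# Rough KV-cache size comparison: standard multi-head attention vs.
# multi-query attention, using the layer and head counts listed above.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes held by the key/value cache for one sequence (FP16 by default)."""
    # Two tensors (K and V) per layer, each of shape [seq_len, num_kv_heads, head_dim].
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

LAYERS, HEADS, KV_HEADS = 28, 32, 2
HEAD_DIM = 4096 // HEADS  # hidden dimension 4096 split across 32 query heads

for ctx in (1_024, 8_192, 32_768):
    mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, ctx)     # full per-head K/V cache
    mqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)  # 2 shared K/V heads
    print(f"{ctx:>6} tokens: MHA {mha / 2**20:8.1f} MiB  vs  MQA {mqa / 2**20:7.1f} MiB")

With 2 key-value heads instead of 32, the cache shrinks by a factor of 16, which is what makes long dialogue histories practical on a single consumer GPU.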

Operating with 6 billion parameters, ChatGLM2-6B provides a balanced profile of performance and efficiency. It was pre-trained on a diverse dataset comprising 1.4 trillion tokens and refined through human preference alignment to enhance its conversational quality. The model is particularly suited for applications such as intelligent virtual assistants and localized chatbots, where low-latency inference and bilingual proficiency are primary requirements. Its open-weights nature and support for INT4 quantization further expand its utility for local deployment and integration into specialized NLP pipelines.
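
For local deployment, the snippet below follows the usage pattern published in the THUDM/chatglm2-6b repository on Hugging Face. It is a minimal sketch, assuming a single CUDA GPU with roughly 6 GB of free VRAM for the INT4 variant; the quantize() and chat() helpers are supplied by the model's custom remote code rather than by transformers itself, so exact calls may vary between repository revisions:

from transformers import AutoModel, AutoTokenizer

MODEL_ID = "THUDM/chatglm2-6b"  # public Hugging Face repository

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# quantize(4) converts the weights to INT4 so the model fits on consumer GPUs;
# both quantize() and chat() come from the model's bundled remote code.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).quantize(4).cuda().eval()

# chat() wraps tokenization, generation, and dialogue-history management.
response, history = model.chat(tokenizer, "Briefly explain multi-query attention.", history=[])
print(response)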

About ChatGLM

The ChatGLM series of models from Z.ai, built on the GLM architecture.


Evaluation Benchmarks

Overall Rank

#103

Benchmark | Score | Rank
WebDev Arena (Web Development) | 1024 | #64

Rankings

Overall Rank

#103

Coding Rank

#95

GPU Requirements

VRAM requirements depend on the weight quantization method chosen for the model and on the context size (selectable from 1K up to the 32K-token maximum); the page's full calculator estimates the VRAM required for a given configuration and lists recommended GPUs.
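
As a rough rule of thumb (not a reproduction of the calculator above), the sketch below adds the quantized weight footprint to the FP16 KV cache for a chosen context length; activation memory and framework overhead are ignored, so real usage will be somewhat higher:

# Back-of-the-envelope VRAM estimate for ChatGLM2-6B: quantized weights plus
# the FP16 KV cache for a chosen context length.
PARAMS = 6.2e9                                   # ~6.2B parameters
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gib(quant: str, context_tokens: int) -> float:
    weights = PARAMS * BYTES_PER_WEIGHT[quant]
    # KV cache: 2 tensors x 28 layers x 2 KV heads x 128 head dim, FP16 values.
    kv_cache = 2 * 28 * 2 * 128 * context_tokens * 2
    return (weights + kv_cache) / 2**30

for quant in ("fp16", "int8", "int4"):
    for ctx in (1_024, 16_384, 32_768):
        print(f"{quant:>4} @ {ctx:>6} tokens ~ {estimate_vram_gib(quant, ctx):5.1f} GiB")

By this estimate the INT4 weights occupy about 3 GiB, which is consistent with the model's suitability for consumer-grade GPUs noted above.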