ChatGLM3-6B-32K: Specifications and GPU VRAM Requirements

ChatGLM3-6B-32K

Open Source

Open Weights

Parameters

6B

Context Length

32,768 tokens

Modality

Text

Architecture

Dense

License

ChatGLM3-6B Model License

Release Date

27 Oct 2023

Knowledge Cutoff

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
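As a rough guide to how such figures are usually derived, the sketch below adds the memory for the weights at a given quantization to the memory for the key-value cache at a given context size. It is a back-of-the-envelope estimate, not the calculator behind this page. The names `ModelShape` and `estimate_vram_gib` are hypothetical, and the layer count, key-value head count, and head dimension in `ASSUMED_SHAPE` are assumptions that do not come from this page; check them against the model's published configuration before relying on the numbers.

```python
from dataclasses import dataclass

# Nominal bytes per weight. Real quantized formats add a small overhead
# for scales and zero points, which this sketch ignores.
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical container for the numbers the estimate needs."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key-value heads per layer
    head_dim: int     # dimension of each attention head


def weights_gib(shape: ModelShape, quant: str) -> float:
    """Memory for the model weights at the chosen quantization."""
    return shape.n_params * BYTES_PER_WEIGHT[quant] / GIB


def kv_cache_gib(shape: ModelShape, context_tokens: int,
                 cache_bytes: float = 2.0) -> float:
    """Memory for the KV cache; the factor 2 covers keys and values."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * cache_bytes
    return per_token * context_tokens / GIB


def estimate_vram_gib(shape: ModelShape, quant: str, context_tokens: int,
                      overhead: float = 0.10) -> float:
    """Weights plus KV cache, padded by an assumed fractional overhead
    for activations and framework buffers."""
    base = weights_gib(shape, quant) + kv_cache_gib(shape, context_tokens)
    return base * (1.0 + overhead)


# ASSUMPTION: illustrative values only, not taken from this page.
ASSUMED_SHAPE = ModelShape(n_params=6.0e9, n_layers=28, n_kv_heads=2, head_dim=128)

if __name__ == "__main__":
    for quant in ("fp16", "int8", "int4"):
        for ctx in (1024, 16384, 32768):
            gib = estimate_vram_gib(ASSUMED_SHAPE, quant, ctx)
            print(f"{quant:>5} @ {ctx:>6} tokens: ~{gib:.1f} GiB")
```

Under these assumed values the weights dominate: the fp16 KV cache at 32,768 tokens is 0.875 GiB, small next to the weights. A model with more key-value heads would see the cache grow proportionally, which is why the shape parameters matter for long-context estimates.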

ChatGLM3-6B-32K

ChatGLM3-6B-32K is a large language model developed jointly by Zhipu AI and Tsinghua University's KEG Lab. It builds on ChatGLM3-6B and is tuned for long-text understanding: it handles input sequences of up to 32,768 tokens, four times the 8K-token context of the base model.

The architecture is a transformer, like most large language models. Compared with ChatGLM3-6B, this variant changes two things: the position encoding is updated, and training uses a method designed specifically for long text. In particular, the conversation stage of training uses the full 32K context length, which helps the model stay coherent and accurate over long dialogues and documents.

ChatGLM3-6B-32K targets applications that need to read and generate text over long inputs. It natively supports tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks. Typical use cases include long-form conversations, analysis of full articles or documents, and generation of detailed content from a prompt.
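When feeding long documents to the model, it helps to check the token budget before sending a request. The sketch below is a minimal illustration that assumes the prompt and the generated tokens share the 32,768-token window, which is the usual behavior for decoder-only models but is not stated on this page. The helper names `fits_in_context` and `split_into_windows` are hypothetical, and token counting is left to whatever tokenizer is in use.

```python
from typing import List, Sequence

MAX_CONTEXT_TOKENS = 32_768  # context length listed for ChatGLM3-6B-32K


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """True if the prompt plus the reserved generation budget fits the window."""
    return prompt_tokens + max_new_tokens <= limit


def split_into_windows(token_ids: Sequence[int], max_new_tokens: int,
                       limit: int = MAX_CONTEXT_TOKENS) -> List[Sequence[int]]:
    """Split a token sequence into consecutive, non-overlapping chunks,
    each leaving room for max_new_tokens of output."""
    window = limit - max_new_tokens
    if window <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


if __name__ == "__main__":
    doc = list(range(70_000))  # stand-in for real token ids
    chunks = split_into_windows(doc, max_new_tokens=768)
    print(len(chunks), [len(c) for c in chunks])
```

Non-overlapping chunks are the simplest choice; a real pipeline might overlap adjacent windows so that sentences cut at a boundary appear whole in at least one chunk.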

About ChatGLM

The ChatGLM series of models from Z.ai, based on the GLM architecture.

Evaluation Benchmarks

No evaluation benchmarks for ChatGLM3-6B-32K available.
