
ChatGLM2-6B

Parameters: 6B
Context Length: 32K (32,768 tokens)
Modality: Text
Architecture: Dense
License: Custom License (ChatGLM2-6B License)
Release Date: 25 Jun 2023
Training Data Cutoff: -

Technical Specifications

Attention Structure: Multi-Query Attention
Hidden Dimension Size: 4096
Number of Layers: 28
Attention Heads: 32
Key-Value Heads: 2
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: Rotary Position Embedding (RoPE)

ChatGLM2-6B

ChatGLM2-6B is a bilingual large language model designed to facilitate conversational interactions in both Chinese and English. As the second iteration in the ChatGLM series developed by THUDM, it is built upon the General Language Model (GLM) framework and serves as a versatile tool for dialogue generation and cross-lingual text processing. The model is optimized for execution on consumer-grade hardware through efficient architectural choices, enabling a high degree of accessibility for developers and researchers working within hardware-constrained environments.
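As a minimal sketch of local use, the snippet below loads the model with Hugging Face Transformers. It assumes the THUDM/chatglm2-6b checkpoint on the Hugging Face Hub and the chat() helper bundled with its custom modeling code (hence trust_remote_code=True), plus a CUDA GPU with roughly 13 GB of free VRAM for FP16 weights.

# Sketch: bilingual chat with ChatGLM2-6B (assumes the THUDM/chatglm2-6b
# checkpoint and its bundled chat() helper; CUDA GPU with ~13 GB VRAM).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = model.half().cuda().eval()  # FP16 weights on GPU

# chat() threads the dialogue history through successive turns.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "What is multi-query attention?", history=history)
print(response)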

The architecture utilizes a dense transformer structure that incorporates several technical advancements over its predecessor. A key innovation is the adoption of Multi-Query Attention (MQA), which streamlines inference by sharing key and value heads across multiple query heads, significantly reducing the memory footprint of the KV cache. Furthermore, the model integrates Rotary Position Embeddings (RoPE) to encode relative positional information and utilizes RMSNorm for improved training stability. The inclusion of FlashAttention during the pre-training phase allows the architecture to support a 32K-token context window, facilitating the processing of extended dialogue histories.
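To make the KV-cache saving concrete, the arithmetic below uses the figures from the specification table above (28 layers, 32 query heads, 2 key-value heads, head dimension 4096 / 32 = 128) and assumes FP16 (2-byte) cache entries at the full 32,768-token context; the function name is illustrative, not part of any library.

# Illustrative KV-cache size comparison for ChatGLM2-6B's multi-query
# attention, using the spec-table figures; FP16 (2 bytes/value) assumed.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    # Factor of 2 accounts for separate key and value tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

layers, heads, kv_heads = 28, 32, 2
head_dim = 4096 // heads  # 128
seq = 32768

mha = kv_cache_bytes(layers, heads, head_dim, seq)     # full multi-head cache
mqa = kv_cache_bytes(layers, kv_heads, head_dim, seq)  # shared-KV cache

print(f"MHA cache: {mha / 2**30:.1f} GiB")   # ~14.0 GiB
print(f"MQA cache: {mqa / 2**30:.2f} GiB")   # ~0.88 GiB, a 16x reduction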

Operating with 6 billion parameters, ChatGLM2-6B provides a balanced profile of performance and efficiency. It was pre-trained on a diverse dataset comprising 1.4 trillion tokens and refined through human preference alignment to enhance its conversational quality. The model is particularly suited for applications such as intelligent virtual assistants and localized chatbots, where low-latency inference and bilingual proficiency are primary requirements. Its open-weights nature and support for INT4 quantization further expand its utility for local deployment and integration into specialized NLP pipelines.
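For tighter memory budgets, a hedged sketch of INT4 deployment follows. It assumes the quantize() helper shipped with the THUDM/chatglm2-6b custom modeling code (as documented in that repository's README), which reduces weight memory to roughly 6 GB of VRAM.

# Sketch: INT4 quantization for low-VRAM local deployment (assumes the
# quantize() helper bundled with the THUDM/chatglm2-6b custom code).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = model.quantize(4).cuda().eval()  # INT4 weights, ~6 GB VRAM

response, _ = model.chat(tokenizer, "Summarize multi-query attention in one sentence.", history=[])
print(response)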

About ChatGLM

The ChatGLM series of models from Z.ai, based on the GLM architecture.


Other ChatGLM Models

Evaluation Benchmarks

Overall Rank: #103
Coding Rank: #95

Benchmark Scores:
WebDev Arena (Web Development): score 1024, rank #64
