GLM-4.6

Total Parameters

357B

Context Length

200K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

30 Sept 2025

Knowledge Cutoff

-

Technical Specifications

Active Parameters

32.0B

Number of Experts

-

Active Experts

-

Attention Structure

Grouped-Query Attention (GQA)

Hidden Dimension Size

5120

Number of Layers

-

Attention Heads

96

Key-Value Heads

-

Activation Function

-

Normalization

-

Position Embedding

Rotary Position Embedding (partial RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
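
As a rough guide, the weight memory needed at a given quantization can be estimated from the parameter count alone. The sketch below is a back-of-envelope calculation, assuming the full 357B parameters must be resident (with MoE, every expert's weights are loaded even though only ~32B are active per token) and an illustrative 10% runtime overhead; KV-cache memory, which grows with context length, is not included:

```python
# Rough weight-memory estimate for GLM-4.6 (357B total parameters).
# Note: with MoE, ALL expert weights stay in memory even though only
# ~32B parameters are active per forward pass.

TOTAL_PARAMS = 357e9

def weight_vram_gb(bits_per_weight: float, overhead: float = 1.1) -> float:
    """Approximate GiB needed just for model weights at a given quantization.

    `overhead` is an assumed multiplier for runtime buffers; real usage
    also depends on KV-cache size, which scales with context length.
    """
    bytes_total = TOTAL_PARAMS * bits_per_weight / 8
    return bytes_total * overhead / 2**30

for name, bits in [("FP16", 16), ("INT8", 8), ("Q4", 4)]:
    print(f"{name}: ~{weight_vram_gb(bits):.0f} GiB")
```

At FP16 this lands above 700 GiB, which is why serving a model of this size typically requires multi-GPU setups or aggressive quantization.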

GLM-4.6

GLM-4.6 is a large language model developed by Z.ai for advanced AI applications. It is engineered to handle a spectrum of complex tasks, including sophisticated coding, long-context processing, and agentic operations. Bilingual support for English and Chinese extends its applicability across linguistic contexts, and the model is intended as a robust foundation for intelligent systems capable of nuanced reasoning and autonomous interaction.

Architecturally, GLM-4.6 implements a Mixture-of-Experts (MoE) configuration, incorporating 357 billion total parameters, with 32 billion parameters actively utilized during a given forward pass. The model's design features a context window expanded to 200,000 tokens, enabling it to process and maintain coherence over substantial input sequences. Innovations within its attention mechanism include Grouped-Query Attention (GQA) with 96 attention heads, and the integration of a partial Rotary Position Embedding (RoPE) for positional encoding. Normalization is managed through QK-Norm, contributing to stabilized attention logits. These architectural choices aim to balance computational efficiency with enhanced performance in complex cognitive operations.
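
The attention layout described above can be sketched in a few lines. The following is a minimal NumPy illustration of grouped-query attention with QK-Norm, not GLM-4.6's actual implementation: the card leaves the KV-head count blank, so the 8 KV heads here are a hypothetical choice, and the sequence length and head dimension are toy-sized.

```python
import numpy as np

def qk_norm(x, eps=1e-6):
    # QK-Norm: RMS-normalize query/key vectors along the head dimension,
    # bounding the magnitude of attention logits for stability.
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).

    Each group of n_q_heads // n_kv_heads query heads shares one KV head,
    shrinking the KV cache relative to full multi-head attention.
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    q, k = qk_norm(q), qk_norm(k)
    out = np.empty_like(q)
    causal_mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    for h in range(n_q_heads):
        kv = h // group                      # shared KV head for this query head
        logits = q[h] @ k[kv].T / np.sqrt(d)
        logits[causal_mask] = -np.inf        # no attention to future positions
        w = np.exp(logits - logits.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out

# Toy shapes: 96 query heads as in the card; 8 KV heads is a placeholder.
rng = np.random.default_rng(0)
q = rng.standard_normal((96, 4, 16))
k = rng.standard_normal((8, 4, 16))
v = rng.standard_normal((8, 4, 16))
y = grouped_query_attention(q, k, v, n_kv_heads=8)
print(y.shape)  # (96, 4, 16)
```

The KV-cache saving is the point of the grouping: only `n_kv_heads` key/value tensors are cached per layer instead of one per query head.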

GLM-4.6 is optimized for real-world development workflows. It delivers stronger coding performance than its predecessor, producing more visually polished front-end output and better results on real-world application tasks. Its reasoning capabilities are enhanced and further augmented by integrated tool use during inference, enabling more capable agents for search-based tasks and role-playing scenarios. GLM-4.6 also completes tasks with approximately 15% fewer tokens than GLM-4.5, giving it a more cost-effective inference profile.
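
The token-efficiency gain translates directly into inference cost. A back-of-envelope illustration, using a hypothetical $2 per million output tokens (not Z.ai's actual pricing):

```python
# Illustrative cost impact of ~15% fewer tokens per task vs GLM-4.5.
# The price below is a hypothetical placeholder, not real Z.ai pricing.
PRICE_PER_MTOK = 2.00                        # USD per 1M output tokens (assumed)
baseline_tokens = 1_000_000                  # tokens GLM-4.5 might spend on a workload
saved_tokens = baseline_tokens * 15 // 100   # ~15% fewer tokens with GLM-4.6
saved_usd = saved_tokens / 1_000_000 * PRICE_PER_MTOK
print(saved_tokens, f"${saved_usd:.2f}")     # 150000 $0.30
```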

About GLM-4

GLM-4 is a series of bilingual (English and Chinese) language models developed by Zhipu AI. The models feature extended context windows, superior coding performance, advanced reasoning capabilities, and strong agent functionalities. GLM-4.6 offers improvements in tool use and search-based agents.

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for GLM-4.6 available.

Rankings

Overall Rank

-

Coding Rank

-
