Total Parameters: 357B
Context Length: 200K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: MIT
Release Date: 30 Sept 2025
Knowledge Cutoff: -
Active Parameters: 32.0B
Number of Experts: -
Active Experts: -
Attention Structure: Grouped-Query Attention (GQA)
Hidden Dimension Size: 5120
Number of Layers: -
Attention Heads: 96
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Rotary Position Embedding (partial RoPE)
VRAM requirements for different quantization methods and context sizes
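Since the interactive calculator itself is not reproduced here, the back-of-envelope sketch below shows how such an estimate is typically computed: weight memory scales with the total parameter count and the bits per weight, and the KV cache grows with context length. The 357B parameter count comes from the spec above; the layer count, KV-head count, and head dimension used for the KV-cache line are assumed placeholders, not figures confirmed on this page.

```python
# Back-of-envelope VRAM estimate for a quantized model: weight storage plus KV cache.
# The 357B total parameter count comes from the spec table above; the layer count,
# KV-head count, and head dimension below are assumed placeholders, not confirmed values.

def weight_vram_gb(total_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, ignoring runtime overhead."""
    return total_params * bits_per_weight / 8 / 1e9

def kv_cache_vram_gb(context_tokens: int, num_layers: int, num_kv_heads: int,
                     head_dim: int, bytes_per_value: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, per token, per KV head."""
    return 2 * context_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value / 1e9

TOTAL_PARAMS = 357e9

for label, bits in [("FP16/BF16", 16), ("INT8 (Q8)", 8), ("4-bit (Q4, incl. overhead)", 4.5)]:
    print(f"{label:28s} weights: ~{weight_vram_gb(TOTAL_PARAMS, bits):5.0f} GB")

# KV cache at the full 200K context, using *assumed* architecture numbers:
print(f"KV cache @ 200K tokens: ~{kv_cache_vram_gb(200_000, num_layers=92, num_kv_heads=8, head_dim=128):.0f} GB")
```

The 4.5 bits-per-weight figure approximates typical 4-bit formats once quantization metadata is included; actual serving footprints also add activation and framework overhead on top of these numbers.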
GLM-4.6 is a large language model developed by Z.ai, designed to facilitate advanced applications in artificial intelligence. This model is engineered to operate efficiently across a spectrum of complex tasks, including sophisticated coding, extended context processing, and agentic operations. Its bilingual capabilities, supporting both English and Chinese, extend its applicability across diverse linguistic contexts. The model’s purpose is to serve as a robust foundation for building intelligent systems capable of nuanced reasoning and autonomous interaction.
Architecturally, GLM-4.6 implements a Mixture-of-Experts (MoE) configuration, incorporating 357 billion total parameters, with 32 billion parameters actively utilized during a given forward pass. The model's design features a context window expanded to 200,000 tokens, enabling it to process and maintain coherence over substantial input sequences. Innovations within its attention mechanism include Grouped-Query Attention (GQA) with 96 attention heads, and the integration of a partial Rotary Position Embedding (RoPE) for positional encoding. Normalization is managed through QK-Norm, contributing to stabilized attention logits. These architectural choices aim to balance computational efficiency with enhanced performance in complex cognitive operations.
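To make the attention ingredients named above concrete, here is a minimal, self-contained PyTorch sketch of grouped-query attention combined with partial RoPE and QK-Norm. The 96 query heads and 5120 hidden size come from the spec; the KV-head count, head dimension, and rotary fraction are illustrative assumptions, and this is not the actual GLM-4.6 implementation.

```python
# Illustrative-only sketch of grouped-query attention with partial RoPE and QK-Norm,
# not the actual GLM-4.6 implementation. Head dim, KV-head count, and rotary fraction
# are assumptions; 96 query heads and hidden size 5120 come from the spec above.
import torch
import torch.nn.functional as F

def apply_partial_rope(x, positions, rotary_dim):
    """Rotate only the first `rotary_dim` channels of each head; pass the rest through."""
    rot, rest = x[..., :rotary_dim], x[..., rotary_dim:]
    half = rotary_dim // 2
    inv_freq = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = positions.float()[:, None] * inv_freq[None, :]        # (seq, half)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]  # broadcast over heads
    x1, x2 = rot[..., :half], rot[..., half:]
    rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return torch.cat([rotated, rest], dim=-1)

def gqa_attention(hidden, wq, wk, wv, n_q_heads=96, n_kv_heads=8, rotary_frac=0.5):
    seq, _ = hidden.shape
    head_dim = wq.shape[1] // n_q_heads
    positions = torch.arange(seq)

    q = (hidden @ wq).view(seq, n_q_heads, head_dim)
    k = (hidden @ wk).view(seq, n_kv_heads, head_dim)
    v = (hidden @ wv).view(seq, n_kv_heads, head_dim)

    # QK-Norm: RMS-normalize queries and keys per head to keep attention logits bounded.
    q = q / q.norm(dim=-1, keepdim=True).clamp_min(1e-6) * head_dim ** 0.5
    k = k / k.norm(dim=-1, keepdim=True).clamp_min(1e-6) * head_dim ** 0.5

    # Partial RoPE: only a fraction of each head's channels carry positional rotation.
    rotary_dim = int(head_dim * rotary_frac)
    q = apply_partial_rope(q, positions, rotary_dim)
    k = apply_partial_rope(k, positions, rotary_dim)

    # Grouped-query attention: each KV head is shared by n_q_heads // n_kv_heads query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    q, k, v = (t.transpose(0, 1) for t in (q, k, v))               # (heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(0, 1).reshape(seq, -1)

# Tiny smoke test with random weights (head dim 128 assumed).
hidden = torch.randn(16, 5120)
wq, wk, wv = (torch.randn(5120, h * 128) * 0.02 for h in (96, 8, 8))
print(gqa_attention(hidden, wq, wk, wv).shape)                     # torch.Size([16, 12288])
```

Sharing each key-value head across a group of query heads is what keeps the KV cache small enough for the 200K-token window, while restricting the rotation to part of each head is the "partial" aspect of the RoPE scheme described above.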
The operational characteristics of GLM-4.6 are optimized for real-world development workflows. It demonstrates superior coding performance, leading to more visually polished front-end generation and improved real-world application results. The model exhibits enhanced reasoning capabilities, which are further augmented by its integrated tool-use functionality during inference. This facilitates the creation of more capable agents proficient in search-based tasks and role-playing scenarios. Furthermore, GLM-4.6 achieves improved token efficiency, completing tasks with approximately 15% fewer tokens compared to its predecessor, GLM-4.5, thereby offering a more cost-effective inference profile.
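Tool use during inference is typically exercised through an OpenAI-compatible chat API. The sketch below is a hedged illustration of that pattern; the endpoint URL, served model name, and the `web_search` tool are assumptions made for demonstration, not details confirmed by this page.

```python
# Hedged sketch: exercising GLM-4.6 tool calling through an OpenAI-compatible endpoint.
# The base_url, API key handling, served model name, and the `web_search` tool are
# assumptions for illustration; substitute whatever actually serves the model for you.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool, defined only for this example
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="GLM-4.6",  # assumed served model name
    messages=[{"role": "user", "content": "Find the latest GLM-4.6 release notes."}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as structured JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```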
GLM-4 is a series of bilingual (English and Chinese) language models developed by Zhipu AI. The models feature extended context windows, superior coding performance, advanced reasoning capabilities, and strong agent functionalities. GLM-4.6 offers improvements in tool use and search-based agents.
Rankings on this page cover local LLMs only. No evaluation benchmarks are available for GLM-4.6 yet, so its overall and coding ranks are not assigned.