Total Parameters
358B
Context Length
200K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
MIT
Release Date
8 Jan 2026
Knowledge Cutoff
Sep 2024
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
96
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
1,000,000
Sliding Window Attention
No
Sliding Window Size
-
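As a rough illustration of what the head counts above imply for memory, the sketch below estimates key-value cache size from the listed figures. It assumes that every one of the 92 layers carries a full attention block, that cached values take 2 bytes each (FP16/BF16), and that the 200K context length means 200,000 tokens. None of these assumptions is confirmed on this page, and the function name is illustrative.

```python
# Figures taken from the spec sheet above.
NUM_LAYERS = 92
NUM_QUERY_HEADS = 96
NUM_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_TOKENS = 200_000  # assumption: "200K" read as 200,000 tokens


def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key-value cache for one token (assumes 2-byte values by default)."""
    # Each layer stores one key vector and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS
shared_kv = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
one_kv_per_query = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)
full_context_gb = shared_kv * CONTEXT_TOKENS / 1e9

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache per token: {shared_kv:,} bytes")
print(f"KV cache per token with one KV head per query head: {one_kv_per_query:,} bytes")
print(f"KV cache at {CONTEXT_TOKENS:,} tokens: {full_context_gb:.1f} GB")
```

Under these assumptions, sharing 8 key-value heads across 96 query heads cuts the cache by a factor of 12, to roughly 0.38 MB per token and about 75 GB for a single full-length sequence.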
Normalization
RMS Normalization
Activation Function
Swish
Dimensions
Hidden Dimension Size
5,120
Number of Layers
92
FFN Intermediate Size (Dense)
1,536
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
151,552
Mixture of Experts
Total Expert Parameters
32.0B
Number of Experts
160
Active Experts
8
Shared Experts
1
FFN Intermediate Size (per Expert)
1,536
Dense Layers Before MoE
3
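The expert figures above can be turned into a back-of-envelope parameter estimate. The sketch below assumes each expert is a gated feed-forward block with three bias-free projection matrices (gate, up, down), and that all layers after the first 3 dense layers are MoE layers. Both are assumptions, not details confirmed on this page, and the result ignores attention, embedding, router, and dense-layer weights.

```python
# Figures taken from the spec sheet above.
HIDDEN = 5120
EXPERT_FFN = 1536
NUM_LAYERS = 92
DENSE_LAYERS = 3
NUM_EXPERTS = 160
ACTIVE_EXPERTS = 8
SHARED_EXPERTS = 1

# Assumption: every layer after the dense prefix is an MoE layer.
moe_layers = NUM_LAYERS - DENSE_LAYERS

# Assumption: gated FFN with gate, up, and down projections, no biases.
params_per_expert = 3 * HIDDEN * EXPERT_FFN

# All routed experts across all MoE layers.
routed_total = moe_layers * NUM_EXPERTS * params_per_expert

# Experts actually evaluated for one token: routed top-8 plus the shared expert.
active_per_token = moe_layers * (ACTIVE_EXPERTS + SHARED_EXPERTS) * params_per_expert

routed_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

print(f"MoE layers: {moe_layers}")
print(f"parameters per expert: {params_per_expert:,}")
print(f"routed expert parameters: {routed_total / 1e9:.1f}B")
print(f"expert parameters used per token: {active_per_token / 1e9:.1f}B")
print(f"fraction of routed experts active per token: {routed_fraction:.1%}")
```

These are estimates under the stated assumptions rather than figures from Z.ai's documentation; a different expert layout or counting convention would change them.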
GLM-4.7 is a large-scale Mixture of Experts (MoE) model developed by Z.ai for agentic coding, complex reasoning, and multi-step tool orchestration. Building on the GLM-4 series, it integrates a reasoning system that prioritizes logical consistency and task completion across extended interactions. It is designed to serve as the primary engine for coding agents and terminal-based automation, with optimizations for multi-language programming and autonomous execution in complex software environments.
The model's reasoning system has three thinking modes designed to maintain coherence. Interleaved Thinking lets the model reason internally before every response and tool invocation, so that generated instructions stay consistent with logical constraints. Preserved Thinking retains these reasoning blocks across multi-turn conversations, reducing the context decay typically seen in long-horizon tasks. Turn-level Thinking lets developers adjust reasoning depth for each interaction, trading depth against computational overhead and latency.
Beyond general programming, GLM-4.7 targets frontend and user interface development, often referred to as vibe coding. This capability focuses on generating aesthetically consistent and structurally sound UI code, including modern web pages and professional presentation layouts. The model also emphasizes tool integration: it can navigate terminal environments, execute shell commands, and interact with external APIs while maintaining stability and instruction adherence across diverse automation scenarios.
GLM-4 is a series of bilingual (English and Chinese) language models developed by Zhipu AI. The models feature extended context windows, superior coding performance, advanced reasoning capabilities, and strong agent functionalities. GLM-4.6 offers improvements in tool use and search-based agents.
Rank
#36
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Graduate-Level QA | GPQA | 0.857 | 8 |
| Web Development | WebDev Arena | 1440 | ⭐ 11 |
| Coding | LiveBench Coding | 0.73 | 27 |
| Data Analysis | LiveBench Data Analysis | 0.55 | 28 |
| Professional Knowledge | MMLU Pro | 0.83 | 28 |
| Agentic Coding | LiveBench Agentic | 0.42 | 29 |
| Mathematics | LiveBench Mathematics | 0.76 | 29 |
| Reasoning | LiveBench Reasoning | 0.60 | 36 |
Overall Rank
#36
Coding Rank
#26
Total Score
54 / 100
GLM-4.7 demonstrates a commitment to the open-weights ecosystem through permissive licensing and accessible model weights. However, the model suffers from significant transparency gaps regarding its training data composition and the specific compute resources utilized. While its performance is well-documented, the inability to fully replicate agent-based benchmarks due to proprietary evaluation frameworks remains a critical weakness.
Architectural Provenance
The model is explicitly identified as a Mixture of Experts (MoE) architecture with 358B total parameters, building upon the GLM-4 series. Documentation describes high-level features like 'Interleaved Thinking' and 'Preserved Thinking' for agentic workflows. However, although it is described as a 'General Language Model' (GLM) architecture, specific technical details on internal layer configurations, routing mechanisms, and the exact pre-training methodology are not provided in a formal technical paper; the available documentation consists of blog posts and model cards.
Dataset Composition
There is no public disclosure of the specific training data sources, dataset proportions, or filtering methodologies. Official documentation mentions 'large-scale' training but gives no breakdown (e.g., code vs. web vs. books). The data collection process remains opaque, and no sample data or detailed composition statistics are publicly available, so the model falls into the category of 'proprietary dataset' claims.
Tokenizer Integrity
The tokenizer is publicly available via the Hugging Face repository and is compatible with standard libraries like 'transformers'. Vocabulary size and tokenization behavior are verifiable through the provided code snippets and API documentation. It supports multilingual inputs (English and Chinese) with documented token-to-word ratios (~0.75 for English, 1.5 for Chinese), though detailed training alignment for the tokenizer itself is not fully explored in the documentation.
Parameter Density
The total parameter count is clearly stated as 358B. However, for the flagship GLM-4.7 variant, the number of active parameters per token is not explicitly disclosed in the primary documentation, unlike the 'Flash' variant which specifies 3B active out of 30B. This lack of clarity on active parameters for the main model makes it difficult to assess true computational density and efficiency.
Training Compute
Information regarding training compute is extremely limited. While community discussions (AMA) mention the use of resources equivalent to roughly 2,000 H100/H800 GPUs for post-training experiments, there is no official disclosure of total GPU hours, hardware specifications for the full pre-training, carbon footprint, or estimated training costs in any formal documentation.
Benchmark Reproducibility
While the model provides scores for numerous public benchmarks (SWE-bench, MMLU-Pro, HLE), the evaluation code and exact prompts used for these specific results are not fully public. Users on Hugging Face have raised questions about replicating 'search-agent' and 'context management' benchmarks (BrowseComp/HLE), noting that the internal frameworks used for these evaluations have not been open-sourced, hindering independent verification.
Identity Consistency
The model consistently identifies as part of the GLM-4 series and maintains clear versioning (GLM-4.7 vs 4.6). It is transparent about its specific focus on 'agentic coding' and 'thinking' modes. There are no documented cases of the model claiming to be a competitor's product or misrepresenting its fundamental nature as an AI developed by Z.ai.
License Clarity
The model weights are released under the MIT license, which is a clear and permissive open-source license. However, there is some ambiguity regarding the 'governing terms' mentioned in third-party distributions (like NVIDIA NIM) and whether the training data or specific agentic frameworks used alongside the model are subject to the same permissive terms.
Hardware Footprint
VRAM requirements are provided for the 'Flash' variant (16GB-24GB) and quantization (FP8) is supported. For the full 358B model, documentation suggests the use of vLLM and SGLang with multi-GPU setups (TP size 4 or 8), but detailed memory scaling for the 200K context window and specific quantization-accuracy trade-offs for the flagship version are less thoroughly documented than for the smaller variants.
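To give a sense of scale for the multi-GPU setups mentioned above, the sketch below estimates weight memory per GPU for the 358B model at TP size 4 and 8. It assumes weights shard evenly across tensor-parallel ranks, takes BF16 (2 bytes per weight) as the unquantized baseline and FP8 as 1 byte per weight, and counts weights only, excluding KV cache, activations, and framework overhead. The function name is illustrative and is not part of vLLM or SGLang.

```python
TOTAL_PARAMS = 358e9  # total parameter count stated above

# Bytes per weight; BF16 as the unquantized baseline is an assumption.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_gb_per_gpu(total_params, bytes_per_param, tp_size):
    """Weight memory per GPU in GB, assuming weights shard evenly across tp_size GPUs."""
    return total_params * bytes_per_param / tp_size / 1e9


# Estimate every precision / TP-size combination mentioned in the text.
estimates = {
    (precision, tp): weight_gb_per_gpu(TOTAL_PARAMS, nbytes, tp)
    for precision, nbytes in BYTES_PER_PARAM.items()
    for tp in (4, 8)
}

for (precision, tp), gb in estimates.items():
    print(f"{precision} weights, TP={tp}: {gb:.2f} GB per GPU")
```

Under these assumptions, FP8 weights at TP size 8 need about 45 GB per GPU before any KV cache or activation memory, while BF16 weights at TP size 4 would need about 179 GB per GPU, more than an 80 GB accelerator can hold.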
Versioning Drift
Z.ai uses a clear versioning scheme (4.5, 4.6, 4.7) and maintains a basic changelog in blog posts. However, the 'silent' nature of updates to the underlying API and the lack of a detailed, granular version history for weight checkpoints make it difficult to track subtle behavior drift or performance changes over time.