GLM-4.7

Total Parameters

358B

Context Length

200K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

8 Jan 2026

Knowledge Cutoff

Sep 2024

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

96

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

1,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

Swish

Dimensions

Hidden Dimension Size

5,120

Number of Layers

92

FFN Intermediate Size (Dense)

1,536

Multi-Token Prediction Heads

1

Tokenizer

Vocabulary Size

151,552

Mixture of Experts

Total Expert Parameters

32.0B

Number of Experts

160

Active Experts

8

Shared Experts

1

FFN Intermediate Size (per Expert)

1,536

Dense Layers Before MoE

3
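The Mixture of Experts figures above can be cross-checked with a little arithmetic. The sketch below is a rough estimate only: it assumes each expert is a gated feed-forward block with three bias-free weight matrices (gate, up, down), which the sheet does not state, and it ignores attention, embedding, router, and dense-layer weights. The constant and function names are illustrative.

```python
# Figures copied from the specification sheet above.
HIDDEN = 5_120          # hidden dimension size
EXPERT_FFN = 1_536      # FFN intermediate size per expert
NUM_LAYERS = 92
DENSE_LAYERS = 3        # dense layers before MoE
NUM_EXPERTS = 160       # routed experts per MoE layer
ACTIVE_EXPERTS = 8      # routed experts selected per token
SHARED_EXPERTS = 1      # expert applied to every token

# ASSUMPTION: gated FFN with gate, up, and down projections and no biases.
MATRICES_PER_EXPERT = 3


def expert_params() -> int:
    """Weights in a single expert under the gated-FFN assumption."""
    return MATRICES_PER_EXPERT * HIDDEN * EXPERT_FFN


def moe_layers() -> int:
    """Layers that use the sparse MoE feed-forward block."""
    return NUM_LAYERS - DENSE_LAYERS


def routed_params_total() -> int:
    """Weights held by all routed experts across all MoE layers."""
    return expert_params() * NUM_EXPERTS * moe_layers()


def expert_params_active_per_token() -> int:
    """Expert weights actually used for one token (routed plus shared)."""
    return expert_params() * (ACTIVE_EXPERTS + SHARED_EXPERTS) * moe_layers()


if __name__ == "__main__":
    print(f"per expert:       {expert_params():,}")
    print(f"routed, total:    {routed_params_total():,}")
    print(f"active per token: {expert_params_active_per_token():,}")
```

Under these assumptions the routed experts account for roughly 336B weights, while a single token touches about 18.9B expert weights. This illustrates why total and active parameter counts differ so sharply in a sparse model; the exact published figures may be computed differently.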

Architecture Diagram

Input Tokens -> Token Embedding (Position: RoPE; Hidden: 5.1k · Context: 200k · Vocab: 151.6k) -> 92 layers x [RMSNorm (Pre-Attention) -> Grouped-Query Attention (96 Q / 8 KV heads, head dim: 128) -> residual add -> RMSNorm (Pre-FFN) -> Sparse MoE FFN (8/160 experts, Swish, intermediate: 1.5k) -> residual add] -> Final RMSNorm -> Output Logits
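The 96 query heads and 8 key-value heads listed above matter mostly for the key-value cache at long context. The following sketch estimates that cache under stated assumptions: 2 bytes per value (BF16), every layer caching full keys and values for every token, no cache quantization, and "200K" read as 200,000 tokens. The function name and defaults are illustrative and not taken from any serving framework.

```python
# Figures copied from the specification sheet above.
NUM_LAYERS = 92
QUERY_HEADS = 96
KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(tokens: int, kv_heads: int = KV_HEADS,
                   bytes_per_value: int = 2) -> int:
    """Estimated KV cache size for one sequence of `tokens` tokens.

    The leading factor of 2 covers the separate key and value tensors.
    ASSUMPTION: bytes_per_value=2 corresponds to BF16/FP16 storage.
    """
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * bytes_per_value
    return per_token * tokens


if __name__ == "__main__":
    shared_kv = kv_cache_bytes(200_000)
    full_kv = kv_cache_bytes(200_000, kv_heads=QUERY_HEADS)
    print(f"8 KV heads:  {shared_kv / 1e9:.1f} GB")
    print(f"96 KV heads: {full_kv / 1e9:.1f} GB")
```

With these assumptions a full 200,000-token sequence needs roughly 75 GB of cache, and giving every query head its own key-value head would multiply that by 12.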

GLM-4.7

GLM-4.7 is a large-scale Mixture of Experts (MoE) model developed by Z.ai for agentic coding, complex reasoning, and multi-step tool orchestration. Building on the GLM-4 series, it adds a reasoning system that prioritizes logical consistency and task completion across extended interactions. It is intended as the primary engine for coding agents and terminal-based automation, with optimizations for multi-language programming and autonomous execution in complex software environments.

The model uses three thinking modes to keep its reasoning coherent. Interleaved Thinking lets the model reason internally before every response and tool invocation, so that generated instructions stay consistent with logical constraints. Preserved Thinking retains these reasoning blocks across multi-turn conversations, which reduces the context decay typical of long-horizon tasks. Turn-level Thinking lets developers adjust reasoning depth for each interaction to manage computational overhead and latency.
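The difference between keeping and dropping reasoning blocks across turns can be illustrated with a small data-structure sketch. Everything here is hypothetical: the `Turn` class, the `reasoning` field, and the `preserve_thinking` flag are names invented for illustration and do not describe the Z.ai API or any real client library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    role: str                        # "user", "assistant", or "tool"
    content: str
    reasoning: Optional[str] = None  # hypothetical field for a thinking block


def build_context(history: List[Turn], preserve_thinking: bool) -> List[Dict[str, str]]:
    """Assemble the messages to send on the next turn.

    When preserve_thinking is True, earlier reasoning blocks are carried
    forward with their turns; otherwise only the visible content is kept.
    """
    messages = []
    for turn in history:
        message = {"role": turn.role, "content": turn.content}
        if preserve_thinking and turn.reasoning is not None:
            message["reasoning"] = turn.reasoning
        messages.append(message)
    return messages


if __name__ == "__main__":
    history = [
        Turn("user", "Rename the config loader."),
        Turn("assistant", "Renamed load_cfg to load_config.",
             reasoning="Two call sites import load_cfg; update both."),
    ]
    print(build_context(history, preserve_thinking=True))
    print(build_context(history, preserve_thinking=False))
```

Carrying the reasoning forward costs extra context tokens on every turn, which is the trade-off a per-turn control over reasoning depth is meant to manage.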

GLM-4.7 also targets frontend and user interface development, often referred to as vibe coding. This capability focuses on generating aesthetically consistent and structurally sound UI code, including modern web pages and professional presentation layouts. The model is also built for tool integration: it can navigate terminal environments, execute shell commands, and interact with external APIs while maintaining stability and instruction adherence across automation scenarios.

About GLM-4

GLM-4 is a series of bilingual (English and Chinese) language models developed by Zhipu AI (Z.ai). The models feature extended context windows, strong coding performance, advanced reasoning capabilities, and agent functionality. GLM-4.6 offers improvements in tool use and search-based agents.


Evaluation Benchmarks

Rank

#36

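Category | Benchmark | Score | Rank
Graduate-Level QA | GPQA | 0.857 | 8
Web Development | WebDev Arena | 1440 | 11
(not listed) | (not listed) | 0.73 | 27
(not listed) | (not listed) | 0.55 | 28
Professional Knowledge | MMLU Pro | 0.83 | 28
Agentic Coding | LiveBench Agentic | 0.42 | 29
(not listed) | (not listed) | 0.76 | 29
(not listed) | (not listed) | 0.60 | 36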

Rankings

Overall Rank

#36

Coding Rank

#26

Model Integrity

Total Score

C+

54 / 100

GLM-4.7 Model Integrity Report

Audit Note

GLM-4.7 demonstrates a commitment to the open-weights ecosystem through permissive licensing and accessible model weights. However, the model suffers from significant transparency gaps regarding its training data composition and the specific compute resources utilized. While its performance is well-documented, the inability to fully replicate agent-based benchmarks due to proprietary evaluation frameworks remains a critical weakness.

Upstream

16.0 / 30

Architectural Provenance

6.0 / 10

The model is explicitly identified as a Mixture of Experts (MoE) architecture with 358B total parameters, building upon the GLM-4 series. Documentation describes high-level features like 'Interleaved Thinking' and 'Preserved Thinking' for agentic workflows. However, while it is described as a 'General Language Model' (GLM) architecture, specific technical details regarding the internal layer configurations, routing mechanisms, or the exact pre-training methodology are not provided in a formal technical paper, relying instead on blog posts and model cards.

Dataset Composition

2.0 / 10

There is no public disclosure of the specific training data sources, dataset proportions, or filtering methodologies. Official documentation mentions 'large-scale' training but lacks any breakdown (e.g., code vs. web vs. books). The data collection process remains opaque, and no sample data or detailed composition statistics are available to the public, falling into the category of 'proprietary dataset' claims.

Tokenizer Integrity

8.0 / 10

The tokenizer is publicly available via the Hugging Face repository and is compatible with standard libraries like 'transformers'. Vocabulary size and tokenization behavior are verifiable through the provided code snippets and API documentation. It supports multilingual inputs (English and Chinese) with documented token-to-word ratios (~0.75 for English, 1.5 for Chinese), though detailed training alignment for the tokenizer itself is not fully explored in the documentation.

Model

20.0 / 40

Parameter Density

5.0 / 10

The total parameter count is clearly stated as 358B. However, for the flagship GLM-4.7 variant, the number of active parameters per token is not explicitly disclosed in the primary documentation, unlike the 'Flash' variant which specifies 3B active out of 30B. This lack of clarity on active parameters for the main model makes it difficult to assess true computational density and efficiency.

Training Compute

2.0 / 10

Information regarding training compute is extremely limited. While community discussions (AMA) mention the use of resources equivalent to roughly 2,000 H100/H800 GPUs for post-training experiments, there is no official disclosure of total GPU hours, hardware specifications for the full pre-training, carbon footprint, or estimated training costs in any formal documentation.

Benchmark Reproducibility

4.0 / 10

While the model provides scores for numerous public benchmarks (SWE-bench, MMLU-Pro, HLE), the evaluation code and exact prompts used for these specific results are not fully public. Users on Hugging Face have raised questions about replicating 'search-agent' and 'context management' benchmarks (BrowseComp/HLE), noting that the internal frameworks used for these evaluations have not been open-sourced, hindering independent verification.

Identity Consistency

9.0 / 10

The model consistently identifies as part of the GLM-4 series and maintains clear versioning (GLM-4.7 vs 4.6). It is transparent about its specific focus on 'agentic coding' and 'thinking' modes. There are no documented cases of the model claiming to be a competitor's product or misrepresenting its fundamental nature as an AI developed by Z.ai.

Downstream

18.0 / 30

License Clarity

7.0 / 10

The model weights are released under the MIT license, which is a clear and permissive open-source license. However, there is some ambiguity regarding the 'governing terms' mentioned in third-party distributions (like NVIDIA NIM) and whether the training data or specific agentic frameworks used alongside the model are subject to the same permissive terms.

Hardware Footprint

6.0 / 10

VRAM requirements are provided for the 'Flash' variant (16GB-24GB) and quantization (FP8) is supported. For the full 358B model, documentation suggests the use of vLLM and SGLang with multi-GPU setups (TP size 4 or 8), but detailed memory scaling for the 200K context window and specific quantization-accuracy trade-offs for the flagship version are less thoroughly documented than for the smaller variants.
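For a rough sense of what the tensor-parallel sizes mentioned above imply, the sketch below divides the weight footprint of a 358B-parameter model across GPUs. It is an approximation under stated assumptions: weights are sharded evenly across tensor-parallel ranks, FP8 uses 1 byte and BF16 uses 2 bytes per parameter, and activations, KV cache, and runtime overhead are ignored. The function name is illustrative and is not part of vLLM or SGLang.

```python
TOTAL_PARAMS = 358_000_000_000  # total parameter count from the report

# ASSUMPTION: storage cost per parameter for each weight precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_gb_per_gpu(precision: str, tp_size: int) -> float:
    """Weight memory per GPU in decimal gigabytes, assuming even sharding."""
    total_bytes = TOTAL_PARAMS * BYTES_PER_PARAM[precision]
    return total_bytes / tp_size / 1e9


if __name__ == "__main__":
    for precision in ("bf16", "fp8"):
        for tp_size in (4, 8):
            gb = weight_gb_per_gpu(precision, tp_size)
            print(f"{precision} TP={tp_size}: {gb:.2f} GB per GPU (weights only)")
```

Real deployments need headroom beyond these figures, since long-context KV cache and framework buffers also occupy GPU memory.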

Versioning Drift

5.0 / 10

Z.ai uses a clear versioning scheme (4.5, 4.6, 4.7) and maintains a basic changelog in blog posts. However, the 'silent' nature of updates to the underlying API and the lack of a detailed, granular version history for weight checkpoints make it difficult to track subtle behavior drift or performance changes over time.
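The section and total scores in this report are consistent with a plain sum of the criterion scores. The sketch below reproduces that arithmetic; the summation rule is an assumption inferred from the figures shown, and the mapping from 54 / 100 to the C+ letter grade is not derived here. The dictionary layout and function names are illustrative.

```python
# Criterion scores copied from the report above (each out of 10).
REPORT = {
    "Upstream": {
        "Architectural Provenance": 6.0,
        "Dataset Composition": 2.0,
        "Tokenizer Integrity": 8.0,
    },
    "Model": {
        "Parameter Density": 5.0,
        "Training Compute": 2.0,
        "Benchmark Reproducibility": 4.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 7.0,
        "Hardware Footprint": 6.0,
        "Versioning Drift": 5.0,
    },
}


def section_totals(report: dict) -> dict:
    """Sum the criterion scores within each section."""
    return {name: sum(scores.values()) for name, scores in report.items()}


def total_score(report: dict) -> float:
    """Sum all section totals. ASSUMPTION: no weighting is applied."""
    return sum(section_totals(report).values())


if __name__ == "__main__":
    for name, subtotal in section_totals(REPORT).items():
        print(f"{name}: {subtotal} / {10 * len(REPORT[name])}")
    print(f"Total: {total_score(REPORT)} / 100")
```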

GLM-4.7: Specifications and GPU VRAM Requirements