GLM-4.7: Specifications and GPU VRAM Requirements

GLM-4.7

Open Source

Open Weights

Active Parameters

358B

Context Length

200K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

MIT

Release Date

8 Jan 2026

Knowledge Cutoff

Sep 2024

Technical Specifications

Total Expert Parameters

Number of Experts

Active Experts

Attention Structure

Multi-Head Attention

Hidden Dimension Size

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

GLM-4.7

GLM-4.7 is a substantial bilingual Mixture of Experts (MoE) model engineered by Z.ai, designed for advanced agentic coding and complex reasoning tasks. It represents an iteration in the GLM-4 series, building upon its predecessors to enhance capabilities in multi-language programming and terminal-based workflows. The model incorporates a sophisticated three-tier thinking architecture: Interleaved Thinking, which involves reasoning prior to each response and tool invocation to refine instruction adherence and generation quality; Preserved Thinking, which maintains reasoning patterns across multi-turn conversations to support long-horizon tasks by minimizing information decay; and Turn-level Thinking, providing granular control over reasoning depth per interaction to balance latency and computational cost.

This architecture is tailored to facilitate superior performance in agent-based applications, enabling more stable and controllable execution of complex operations. The model is equipped to handle diverse programming challenges, including those requiring agentic workflows across multiple files and turns. It aims to generate more natural conversational outputs and enhance the aesthetic quality of front-end and user interface code, delivering cleaner, more modern web pages and improved presentation layouts.

GLM-4.7 also demonstrates advancements in tool integration, allowing for robust interaction with external toolsets. Its capabilities extend to intricate reasoning, including mathematical problem-solving and general analytical tasks. The model's design emphasizes adaptability and efficiency for a spectrum of development and automation scenarios.

About GLM-4

GLM-4 is a series of bilingual (English and Chinese) language models developed by Zhipu AI. The models feature extended context windows, superior coding performance, advanced reasoning capabilities, and strong agent functionalities. GLM-4.6 offers improvements in tool use and search-based agents.

Other GLM-4 Models

GLM-4.6

Evaluation Benchmarks

Rank

#36

Benchmark	Score	Rank
Data Analysis LiveBench Data Analysis	0.74	⭐ 4
Professional Knowledge MMLU Pro	0.84	13
Agentic Coding LiveBench Agentic	0.42	16
Coding LiveBench Coding	0.73	23
Mathematics LiveBench Mathematics	0.76	23
Reasoning LiveBench Reasoning	0.60	29
Graduate-Level QA GPQA	0.86	35

Rankings

Overall Rank

#36

Coding Rank

#30

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

98k

195k

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Download Weights Source Code