GLM-4

Parameters: 32B
Context Length: 128K
Modality: Text
Architecture: Dense
License: Custom Commercial License with Restrictions
Release Date: 15 Jan 2024
Knowledge Cutoff: Dec 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 6144
Number of Layers: 61
Attention Heads: 48
Key-Value Heads: 2
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: Rotary Position Embedding (RoPE)
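The figures above are enough for a back-of-the-envelope parameter count. A minimal sketch follows; the FFN intermediate size and vocabulary size are not listed in the specifications and are assumed values here:

```python
# Rough parameter count from the spec sheet; ffn_dim and vocab are
# ASSUMED values (not listed above), so treat the result as approximate.
hidden, layers, heads, kv_heads = 6144, 61, 48, 2
head_dim = hidden // heads                 # 128
ffn_dim = 23040                            # assumption
vocab = 151552                             # assumption

attn = hidden * heads * head_dim           # Q projection
attn += 2 * hidden * kv_heads * head_dim   # K and V (only 2 KV heads: GQA)
attn += heads * head_dim * hidden          # output projection
mlp = 3 * hidden * ffn_dim                 # SwiGLU: gate, up and down projections
total = layers * (attn + mlp) + vocab * hidden  # plus token embeddings
print(f"~{total / 1e9:.1f}B parameters")
```

Under these assumptions the estimate lands close to the advertised 32B; note how little the K and V projections contribute, since only 2 of the 48 heads carry their own key-value parameters.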

GLM-4

The GLM-4 32B model is a foundational large language model developed by Z.ai, representing a significant scaling of the General Language Model (GLM) architecture to 32 billion parameters. This model is engineered to balance high-order reasoning capabilities with computational efficiency, serving as a versatile core for advanced agentic applications, complex code generation, and intricate bilingual text processing. It occupies a strategic position within the GLM-4 family, providing the structural complexity necessary for sophisticated linguistic understanding while maintaining a footprint suitable for diverse deployment environments.

Technically, the model utilizes a dense transformer architecture optimized through extensive pre-training on a massive corpus of 15 trillion tokens. This training set includes a substantial proportion of synthetic reasoning data, specifically curated to enhance the model's logical inference and problem-solving skills. The architectural design integrates modern advancements such as Rotary Positional Embeddings (RoPE) and grouped-query attention (GQA), which together facilitate stable performance and efficient inference over a context window of up to 128,000 tokens. To ensure high-quality output, the model undergoes a multi-stage post-training pipeline involving human preference alignment, rejection sampling, and reinforcement learning.
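The head-sharing idea behind GQA can be sketched in a few lines of NumPy, using the head counts from the spec sheet (48 query heads sharing 2 key-value heads); this is an illustrative sketch, not the model's actual implementation:

```python
import numpy as np

# Grouped-query attention sketch: 48 query heads share 2 KV heads,
# so each KV head serves a group of 24 query heads.
n_heads, n_kv_heads, head_dim, seq = 48, 2, 128, 16
group = n_heads // n_kv_heads  # 24

rng = np.random.default_rng(0)
q = rng.standard_normal((n_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Broadcast each KV head across its group of query heads.
k = np.repeat(k, group, axis=0)  # (48, seq, head_dim)
v = np.repeat(v, group, axis=0)

scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v
print(out.shape)  # one output per query head
```

The payoff is at inference time: the KV cache stores only the 2 key-value heads, not all 48, which is a 24x reduction in cache size for long contexts.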

GLM-4 32B is specifically optimized for scenarios requiring structured outputs and autonomous tool interaction. Its performance characteristics make it particularly effective for engineering-grade code generation, precise search-based question answering, and the creation of detailed technical artifacts. The model's refined instruction-following and robust function-calling capabilities enable it to act as the primary engine for intelligent agents that need to plan and execute multi-step tasks across diverse software environments and knowledge domains.
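The function-calling workflow described above typically takes the shape of a tool-schema request. The sketch below only illustrates the JSON shape; the tool, model name, and the assumption of an OpenAI-compatible serving layer (e.g. vLLM) are placeholders, not an official API:

```python
import json

# Hypothetical tool schema and chat payload for a function-calling request.
# "search_docs" and "glm-4-32b" are placeholder names for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search a documentation index.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

payload = {
    "model": "glm-4-32b",
    "messages": [{"role": "user", "content": "How do I enable RoPE scaling?"}],
    "tools": tools,
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

A model tuned for structured outputs responds to such a request with a tool call (function name plus JSON arguments) that the agent runtime executes before returning the result to the model.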

About GLM Family

General Language Models from Z.ai


Evaluation Benchmarks

No evaluation benchmarks for GLM-4 available.


GPU Requirements

VRAM requirements depend on the weight quantization method and the context size (from 1K up to the full 128K tokens).
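A rough VRAM estimate can be derived from the parameter count and the KV cache size. This is a simplified sketch using the spec-sheet figures; it ignores activation memory and framework overhead, and the quantization byte widths are the usual conventions, not measured values:

```python
# Back-of-the-envelope VRAM estimate for serving GLM-4 32B.
# Ignores activations and runtime overhead; treat results as lower bounds.
def vram_gb(bytes_per_weight, params_b=32, ctx=1024,
            layers=61, kv_heads=2, head_dim=128, kv_bytes=2):
    weights = params_b * 1e9 * bytes_per_weight
    # KV cache: 2 tensors (K and V) per layer, only 2 KV heads thanks to GQA.
    kv_cache = 2 * layers * kv_heads * head_dim * ctx * kv_bytes
    return (weights + kv_cache) / 1024**3

for name, bpw in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{vram_gb(bpw):.0f} GB")
```

At a 1,024-token context the KV cache is negligible next to the weights; because GQA keeps only 2 KV heads per layer, the cache stays small even at 128K tokens.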
