ApX logoApX logo

Gemma 4 31B

Parameters

30.7B

Context Length

256K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

32

Key-Value Heads

16

Attention Head Dimension

256

Position Embedding

ROPE

RoPE Theta

1,000,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

21,504

Number of Layers

60

FFN Intermediate Size (Dense)

21,504

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

262,144

Architecture Diagram

Input TokensToken EmbeddingPosition: RoPEHidden: 21.5k · Context: 256k · Vocab: 262.1kx 60 layersRMSNormPre-AttentionGrouped-Query Attention32Q / 16KV heads · SW: 1kHead dim: 256+RMSNormPre-FFNFeed-Forward NetworkGELUIntermediate: 21.5k+Final RMSNormOutput Logits

Gemma 4 31B

Gemma 4 31B is the flagship dense model with 30.7B parameters and 256K context window, delivering frontier intelligence for workstations and consumer GPUs. Supports text and image input with state-of-the-art performance on coding, reasoning, and multimodal understanding. Features configurable thinking mode and native function calling for advanced agentic workflows and IDE integration.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. Featuring both Dense and Mixture-of-Experts (MoE) architectures, these multimodal models handle text, images, and audio (on smaller variants), with context windows up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 delivers unprecedented intelligence-per-parameter from mobile devices to enterprise servers. Released under Apache 2.0 license.


Other Gemma 4 Models

Evaluation Benchmarks

Rank

#89

BenchmarkScoreRank

0.59

24

Agentic Coding

LiveBench Agentic

0.40

31

0.74

32

0.59

37

0.60

51

Rankings

Overall Rank

#89

Coding Rank

#114

Model Integrity

Total Score

B

67 / 100

Gemma 4 31B Model Integrity Report

Total Score

67

/ 100

B

Audit Note

Gemma 4 31B sets a high bar for licensing transparency with its move to Apache 2.0 and provides excellent clarity on its dense architectural structure and hardware requirements. However, it remains deeply opaque regarding its training data composition and compute resources, relying on vague marketing descriptions instead of verifiable technical disclosures. While benchmark performance is high and third-party verified, the lack of public evaluation code and data provenance limits its overall transparency profile.

Upstream

18.5 / 30

Architectural Provenance

7.0 / 10

Gemma 4 31B is explicitly documented as a dense Transformer-based model built upon the research foundations of Gemini 3. Technical documentation specifies a hybrid attention mechanism that interleaves local sliding window attention (1024 token window for this variant) with global attention layers. It incorporates Proportional RoPE (p-RoPE) and unified Keys and Values in global layers to manage its 256K context window. While the high-level architecture is well-described, the specific pre-training recipe and the exact interleaving pattern of attention layers are not fully disclosed in the public model card.

Dataset Composition

3.0 / 10

Google provides only high-level thematic descriptions of the training data, citing 'large-scale multimodal pre-training data' including web documents, code, images, and audio with a cutoff of January 2025. There is no specific percentage breakdown of dataset components (e.g., what portion is code vs. web text), no disclosure of specific data sources, and no detailed documentation of the filtering or cleaning pipeline beyond vague mentions of 'quality and safety' filters. This lack of granularity makes the dataset composition largely unverifiable.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via the 'transformers' library and official GitHub repositories. It uses a Byte Pair Encoding (BPE) approach with a vocabulary size of 256,144 tokens, supporting over 140 languages. Documentation clearly states the use of byte fallback and specific normalization (replacing spaces with a specific character). The vocabulary size and behavior are consistent across API and local implementations, though the exact training data for the tokenizer itself is not detailed.

Model

25.0 / 40

Parameter Density

9.0 / 10

The model's parameter density is clearly stated as 30.7B total parameters. As a dense model, all parameters are active during inference, which is explicitly clarified in contrast to the MoE (26B A4B) variant in the same family. Architectural details such as the number of layers (60) and the wider model dimension compared to previous generations are provided in technical overviews, offering a high degree of transparency regarding its physical structure.

Training Compute

2.0 / 10

There is almost no transparency regarding the compute resources used to train Gemma 4 31B. While it is known to be trained on Google's TPU infrastructure (v4/v5), the specific number of TPU/GPU hours, the total energy consumption, and the carbon footprint are not disclosed in the official model card or technical documentation. This information is withheld for competitive reasons, leaving the environmental and resource impact entirely opaque.

Benchmark Reproducibility

5.0 / 10

Google provides extensive benchmark results (AIME 2026, MMLU Pro, LiveCodeBench v6) and third-party verification is available through the LMSYS Chatbot Arena (where it ranks #3 among open models). However, the exact evaluation code, specific few-shot prompts, and detailed reproduction instructions are not fully public. While the results are verifiable by third parties, the lack of a standardized, public evaluation harness for the specific reported scores limits full reproducibility.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as a Google-developed model within the Gemma 4 family. It is transparent about its versioning (31B IT) and its multimodal capabilities (text and image input). There are no documented cases of the model claiming to be a competitor's product or misrepresenting its core architectural nature as a dense model.

Downstream

23.0 / 30

License Clarity

10.0 / 10

Gemma 4 31B is released under the standard Apache 2.0 license, a significant and transparent shift from the custom 'Gemma Terms of Use' seen in previous versions. This license is globally recognized, permits unrestricted commercial use, modification, and redistribution, and contains no hidden usage restrictions or 'trust me' clauses. The legal terms are clear, public, and standard across the industry.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented by both Google and third-party maintainers (e.g., Unsloth, LM Studio). VRAM requirements are specified for various precisions: ~62GB for BF16, ~34GB for Q8, and ~20GB for Q4. Documentation also includes memory scaling data for the 256K context window, noting that a 24GB GPU can handle up to ~45K tokens. The trade-offs between quantization levels and performance are widely discussed in the community and supported by official guidance.

Versioning Drift

5.0 / 10

The model uses clear naming conventions (Gemma 4 31B IT) and weights are hosted on Hugging Face with basic commit histories. However, Google does not provide a detailed public changelog or a formal policy on model drift and silent updates. While versioning exists at the release level, the long-term commitment to maintaining access to specific 'frozen' versions versus rolling updates is not explicitly documented.

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
125k
250k

VRAM Required:

Recommended GPUs

Gemma 4 31B: Specifications and GPU VRAM Requirements