ApX logoApX logo

Gemma 4 26B A4B

Active Parameters

25.2B

Context Length

256K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

8

Attention Head Dimension

256

Position Embedding

ROPE

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

1,024

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

2,112

Number of Layers

30

FFN Intermediate Size (Dense)

704

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

262,144

Mixture of Experts

Total Expert Parameters

3.8B

Number of Experts

128

Active Experts

8

Shared Experts

-

FFN Intermediate Size (per Expert)

704

Dense Layers Before MoE

-

Architecture Diagram

Input TokensToken EmbeddingPosition: RoPEHidden: 2.1k · Context: 256k · Vocab: 262.1kx 30 layersRMSNormPre-AttentionGrouped-Query Attention16Q / 8KV heads · SW: 1kHead dim: 256+RMSNormPre-FFNSparse MoE FFN (8/128 experts)GELUIntermediate: 704+Final RMSNormOutput Logits

Gemma 4 26B A4B

Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total parameters but only 3.8B active per inference, achieving the speed of a 4B model with near-31B performance. Features 128 experts (8 active) with 256K context window, supporting text and image input. Optimized for fast inference on consumer GPUs while delivering frontier-level reasoning and coding capabilities.

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. Featuring both Dense and Mixture-of-Experts (MoE) architectures, these multimodal models handle text, images, and audio (on smaller variants), with context windows up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 delivers unprecedented intelligence-per-parameter from mobile devices to enterprise servers. Released under Apache 2.0 license.


Other Gemma 4 Models

Evaluation Benchmarks

Rank

#73

No evaluation benchmarks for Gemma 4 26B A4B available.

Rankings

Overall Rank

#73

Coding Rank

-

Model Integrity

Total Score

B

70 / 100

Gemma 4 26B A4B Model Integrity Report

Total Score

70

/ 100

B

Audit Note

Gemma 4 26B A4B exhibits strong transparency in its licensing and architectural specifications, particularly regarding its Mixture-of-Experts structure and hardware requirements. However, it suffers from significant opacity in training data provenance and compute resources, lacking a formal technical paper to verify its underlying methodology. The transition to a standard Apache 2.0 license is a commendable step toward industry-leading transparency for open-weight models.

Upstream

19.5 / 30

Architectural Provenance

7.5 / 10

Gemma 4 26B A4B is explicitly documented as a Mixture-of-Experts (MoE) model derived from the Gemini 3 research lineage. Technical documentation details a hybrid attention mechanism alternating between local sliding-window (1024 tokens) and global full-context layers. It utilizes 128 experts with a routing policy that activates 8 experts plus 1 shared expert per token. While the high-level architecture is well-described across official blog posts and model cards, a formal peer-reviewed technical paper with full ablation studies is currently absent, preventing a higher score.

Dataset Composition

3.5 / 10

Disclosure regarding training data is limited to vague marketing claims. Documentation states the model was trained on a 'diverse' dataset supporting over 140 languages and interleaved multimodal inputs (text and images). However, there is no public breakdown of data sources (e.g., percentages of web, code, or academic data), no detailed filtering/cleaning methodology, and no disclosure of the specific proportions of synthetic vs. organic data used. The lack of a technical report leaves these critical details unverifiable.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via the official GitHub repository and Hugging Face collections. It supports a large vocabulary consistent with the claimed 140+ language support. Technical specifications for tokenization of multimodal inputs (variable resolution image tokens) are documented, with supported visual token budgets (70, 140, 280, 560, 1120) clearly stated. Integration with standard libraries like Transformers and vLLM allows for independent verification of tokenization behavior.

Model

26.5 / 40

Parameter Density

9.0 / 10

Google provides exemplary transparency regarding parameter density for this variant. The model is clearly labeled '26B A4B', explicitly denoting 25.2B total parameters with 3.8B active parameters per inference. Documentation further clarifies the expert structure (128 total experts, 8 active per token) and the use of a shared expert. This level of detail prevents the common MoE 'parameter inflation' confusion and provides clear expectations for both memory (total params) and compute (active params).

Training Compute

2.0 / 10

Information regarding training compute is almost entirely absent. While documentation mentions the model can be fine-tuned on TPUs and H100s, there is no disclosure of the total GPU/TPU hours required for the initial pre-training, no hardware cluster specifications used for the primary run, and no carbon footprint or environmental impact calculations. This represents a significant transparency gap typical of proprietary-derived models.

Benchmark Reproducibility

6.0 / 10

Official benchmarks (MMLU-Pro: 82.4%, GSM8K: 94.1%) are provided with some methodological notes, such as the use of fixed temperature (0.1) and top-p (0.95) sampling. However, the evaluation code itself is not fully centralized in a reproducible repository, and specific few-shot prompts used for all frontier benchmarks are not exhaustively disclosed. Third-party verification on leaderboards like Open LLM Leaderboard and Arena AI provides some external validation, but the lack of a technical paper limits full reproducibility.

Identity Consistency

9.5 / 10

The model demonstrates high identity consistency, correctly identifying itself as a member of the Gemma 4 family and acknowledging its MoE architecture in system-level interactions. It is transparent about its versioning and its relationship to the Gemini research line. There are no documented instances of the model claiming to be a competitor's product or misrepresenting its parameter count in self-identification tasks.

Downstream

23.5 / 30

License Clarity

10.0 / 10

Gemma 4 marks a significant shift to a standard, OSI-approved Apache 2.0 license. This is a major transparency improvement over previous 'Gemma Terms of Use' licenses. The terms are clear, publicly accessible, and allow for unrestricted commercial use, modification, and redistribution without revenue caps or usage restrictions. The license is consistently applied across weights, code, and documentation.

Hardware Footprint

8.5 / 10

Hardware requirements are extensively documented for various quantization levels (FP16, Q8, Q4). Official and community documentation (e.g., Unsloth, vLLM) provides specific VRAM targets: ~18GB for 4-bit and ~60GB for BF16. Crucially, the documentation includes memory scaling data for the 256K context window, noting that the hybrid attention mechanism allows for more efficient VRAM usage at long contexts compared to standard dense models. Quantization trade-offs (e.g., <2.8% loss at 4-bit) are also disclosed.

Versioning Drift

5.0 / 10

The model uses basic versioning (Gemma 4 26B A4B), but a comprehensive, public changelog for weight updates or 'silent' fine-tuning refreshes is not maintained. While the release date is clear, there is no formal infrastructure for tracking behavioral drift over time or accessing specific 'checkpoint' versions beyond the initial release. This makes it difficult for developers to ensure long-term stability in production environments.

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
125k
250k

VRAM Required:

Recommended GPUs

Gemma 4 26B A4B: Specifications and GPU VRAM Requirements