Gemma 4 E4B

Open Source

Open Weights

Parameters

4.5B effective (8B with Per-Layer Embeddings)

Context Length

128K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

2 Apr 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

Context size: 1,024 tokens
Required VRAM: 18.39 GB
Consumer: 1x RTX 4090 (24GB VRAM)
Datacenter: 1x NVIDIA A100 (80GB VRAM)
Apple Silicon: 1x Apple M3 Max (128GB unified memory)

Context size: 128,000 tokens
Required VRAM: 29.86 GB
Consumer: 2x RTX 4090 (24GB VRAM each)
Datacenter: 1x NVIDIA A100 (80GB VRAM)
Apple Silicon: 1x Apple M3 Max (128GB unified memory)
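
The two figures above (18.39 GB at 1,024 tokens and 29.86 GB at 128,000 tokens) can be turned into a rough estimate for other context sizes. The sketch below assumes memory grows linearly with context length between those two points, which the page does not state, and the helper names `estimate_vram_gb` and `gpus_needed` are hypothetical. It also ignores any overhead from splitting a model across several GPUs.

```python
import math

# (context_tokens, vram_gb) pairs copied from the system requirements above.
POINTS = ((1_024, 18.39), (128_000, 29.86))


def estimate_vram_gb(context_tokens: int) -> float:
    """Linearly interpolate VRAM between the two published points (an assumption)."""
    (t0, v0), (t1, v1) = POINTS
    slope = (v1 - v0) / (t1 - t0)  # GB per additional token of context
    return v0 + slope * (context_tokens - t0)


def gpus_needed(vram_gb: float, per_gpu_gb: float) -> int:
    """Smallest count of identical GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / per_gpu_gb)


if __name__ == "__main__":
    for tokens in (1_024, 32_000, 128_000):
        need = estimate_vram_gb(tokens)
        print(f"{tokens:>7} tokens: {need:.2f} GB, "
              f"{gpus_needed(need, 24)}x 24GB GPU(s)")
```

Under this assumption the 128,000-token case exceeds a single 24GB card, which matches the 2x RTX 4090 recommendation listed for that context size.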

Evaluation Benchmarks

No evaluation benchmarks are available for Gemma 4 E4B.

About Gemma 4 E4B

Gemma 4 E4B is an edge-optimized model with 4.5B effective parameters (8B with Per-Layer Embeddings), intended for mobile and edge deployments. It supports multimodal input (text, image, audio) with a 128K context window and delivers improved performance over E2B while maintaining efficient on-device execution. It also features a thinking mode and native function calling.

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000

Sliding Window Attention

Yes

Sliding Window Size

512

Sliding Window Ratio

83.3%

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

10,240

Number of Layers

42

FFN Intermediate Size (Dense)

10,240

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

262,144
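
To make the position-embedding entries above concrete, the following sketch computes rotary frequencies from the listed RoPE theta (10,000) and attention head dimension (256). It uses the standard RoPE formulation, where pair i rotates at frequency theta^(-2i/d). Whether Gemma 4 E4B applies extra scaling, a different theta on some layers, or a partial rotary dimension is not stated here, so treat this as an illustration of the technique rather than the model's exact implementation. The function names are hypothetical.

```python
import math

HEAD_DIM = 256          # "Attention Head Dimension" from the specifications
ROPE_THETA = 10_000.0   # "RoPE Theta" from the specifications


def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Inverse frequencies theta^(-2i/d) for i = 0 .. d/2 - 1 (standard RoPE)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x0: float, x1: float, position: int, inv_freq: float) -> tuple[float, float]:
    """Rotate one (x0, x1) feature pair by the angle position * inv_freq."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


if __name__ == "__main__":
    freqs = rope_inv_freq()
    print(len(freqs), freqs[0], freqs[-1])
    print(rotate_pair(1.0, 0.0, position=3, inv_freq=freqs[1]))
```

Because each pair is only rotated, vector norms are preserved and the dot product between a rotated query and a rotated key depends on the difference of their positions, which is the property that makes RoPE a relative position scheme.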

Model Integrity

Total Score

68 / 100

Upstream

20.0 / 30

Model

24.5 / 40

Downstream

23.5 / 30

Gemma 4 E4B Model Integrity Report

Total Score

68 / 100

Audit Note

Gemma 4 E4B exhibits a bifurcated transparency profile, offering industry-leading clarity in licensing (Apache 2.0) and hardware requirements while remaining highly opaque regarding its training data and compute resources. The model's architectural documentation is technically detailed, particularly concerning its 'effective parameter' mechanism, but the reliance on knowledge distillation from undisclosed teacher models and the absence of a formal technical paper hinder full verification.

Upstream

20.0 / 30

Architectural Provenance

7.5 / 10

Gemma 4 E4B is explicitly documented as a decoder-only transformer derived from Google's Gemini 3 research. The architecture is detailed as a hybrid design interleaving local sliding-window attention (512-token window) with global full-context attention. It utilizes a novel 'Per-Layer Embeddings' (PLE) technique where each decoder layer has its own embedding signal, allowing for a total parameter count of ~8B while maintaining an 'effective' compute footprint of 4.5B. Other documented features include RMSNorm, RoPE, and logit soft-capping. However, while the methodology is described, the specific 'Teacher' model used for its knowledge distillation process is not explicitly named beyond the 'Gemini family'.
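
The interleaving of local and global attention described above can be sketched as follows. The 512-token window and the 83.3% sliding-window ratio come from the specifications, and the 42-layer count is the figure quoted in the Parameter Density note. Reading 83.3% as five local layers for every global layer, placing the global layer last in each group of six, and counting the current token inside the window are all assumptions made for illustration. The function names are hypothetical.

```python
NUM_LAYERS = 42        # layer count quoted in the Parameter Density note
WINDOW = 512           # sliding window size from the specifications
LOCAL_PER_GLOBAL = 5   # assumption: 83.3% local means 5 local layers per global layer


def layer_schedule(num_layers: int = NUM_LAYERS,
                   local_per_global: int = LOCAL_PER_GLOBAL) -> list[str]:
    """Label each layer 'local' or 'global'; the last layer of each group is global."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str, window: int = WINDOW) -> bool:
    """Causal rule: global layers see every earlier token, local layers only the last `window`."""
    if key_pos > query_pos:
        return False
    if kind == "global":
        return True
    return query_pos - key_pos < window


def cached_positions(context_len: int, window: int = WINDOW) -> int:
    """Key/value positions kept across all layers for a given context length."""
    return sum(context_len if kind == "global" else min(context_len, window)
               for kind in layer_schedule())


if __name__ == "__main__":
    schedule = layer_schedule()
    print(schedule.count("local"), schedule.count("global"))
    full = NUM_LAYERS * 128_000
    print(cached_positions(128_000) / full)
```

Under these assumptions only the global layers keep the full context in their key/value cache, which is one reason a hybrid design can keep long-context memory well below that of an all-global model.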

Dataset Composition

3.5 / 10

Google provides very limited transparency regarding the specific training data for Gemma 4. Official documentation states it is trained on a 'large collection of different datasets' and mentions the use of knowledge distillation from a larger teacher model. While it claims support for 140+ languages and multimodal inputs (text, image, audio), there is no public breakdown of dataset proportions (e.g., % web, % code), no disclosure of specific data sources, and no detailed documentation on filtering or cleaning methodologies. The lack of a technical paper at launch further obscures data provenance.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the official Hugging Face repository and GitHub. It uses a vocabulary size of 256,000 (often cited as 262,144 including special tokens), which is consistent across the Gemma 4 family. The tokenizer supports 140+ languages and includes dedicated special tokens for its 'thinking mode' (<|think|>) and native function calling. Technical details regarding the multimodal tokenization (e.g., 16x16 patches for images and mel-spectrograms for audio) are well-documented in technical blogs and model cards.
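
Two of the figures above lend themselves to a quick arithmetic check: the gap between the 256,000 and 262,144 vocabulary counts, and the number of 16x16 patches an image produces. The sketch below assumes images are padded up to a multiple of the patch size and that each patch is counted once. Whether the model pools or merges patches before they reach the decoder is not stated here, and the 224x224 example size is hypothetical.

```python
PATCH = 16                 # patch side length in pixels, from the text above
VOCAB_BASE = 256_000       # vocabulary size quoted without special tokens
VOCAB_TOTAL = 262_144      # vocabulary size quoted including special tokens


def patch_grid(height: int, width: int, patch: int = PATCH) -> tuple[int, int]:
    """Rows and columns of non-overlapping patches, rounding partial patches up."""
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    return rows, cols


def patch_count(height: int, width: int, patch: int = PATCH) -> int:
    """Total number of patches for an image of the given size."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


if __name__ == "__main__":
    print(patch_grid(224, 224), patch_count(224, 224))  # hypothetical image size
    print(VOCAB_TOTAL - VOCAB_BASE, VOCAB_TOTAL == 2 ** 18)
```

The larger vocabulary figure is exactly 2^18, and the difference between the two counts is 6,144 entries.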

Model

24.5 / 40

Parameter Density

8.0 / 10

Google is transparent about the distinction between 'effective' and 'total' parameters for the E4B variant. It is clearly stated that the model has ~8.0B total parameters but operates with 4.5B active/effective parameters during inference due to the PLE architecture. This is a significant improvement over typical marketing, which might cite only the lower number. The architectural breakdown (42 layers) is available in model cards, though the exact impact of PLE on quantization-specific parameter density is less detailed.

Training Compute

2.0 / 10

There is almost no verifiable information regarding the training compute for Gemma 4 E4B. Google has not disclosed GPU/TPU hours, specific hardware clusters used for training, or the carbon footprint. While it mentions support for Trillium and Ironwood TPUs for inference/fine-tuning, the actual pre-training resources remain proprietary. This lack of disclosure is a significant transparency gap.

Benchmark Reproducibility

5.0 / 10

Google provides a range of benchmark results (MMLU Pro: 69.4%, AIME 2026: 42.5%) in its official model card and blog. However, the evaluation code is not fully public, and exact prompts or few-shot examples used for these specific scores are not detailed. While third-party entities like Artificial Analysis have begun independent testing, the absence of a formal technical paper with detailed reproduction instructions limits the score.

Identity Consistency

9.5 / 10

The model demonstrates high identity consistency, correctly identifying itself as a member of the Gemma 4 family. It is transparent about its versioning (E4B vs E2B) and its specific capabilities, such as the 'thinking mode' and multimodal support. There are no reported issues of the model claiming to be a competitor (e.g., GPT-4) or misrepresenting its nature as an AI.

Downstream

23.5 / 30

License Clarity

10.0 / 10

Gemma 4 marks a major shift for Google by adopting the standard, OSI-approved Apache 2.0 license. This provides exemplary transparency and legal certainty, allowing for unrestricted commercial use, modification, and distribution without the custom 'Gemma Terms of Use' restrictions found in previous versions. The license is clearly stated on Hugging Face, GitHub, and official announcements.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. Official and third-party guides (Ollama, Unsloth, vLLM) provide specific VRAM requirements for various quantization levels (e.g., ~5.5GB for 4-bit, ~15GB for BF16). Documentation also covers the memory scaling for its 128K context window and the impact of its multimodal encoders (vision/audio) on VRAM usage, providing clear guidance for edge deployment.
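
The quoted footprints can be sanity-checked against the parameter counts with simple arithmetic. The sketch below computes raw weight storage only, using the ~8.0B total and 4.5B effective figures from this report. It assumes every parameter is stored at the same bit width and ignores activations, the KV cache, multimodal encoders, and runtime overhead, so real usage will be higher. The function name `weight_gb` is hypothetical.

```python
TOTAL_PARAMS = 8.0e9       # total parameters, including Per-Layer Embeddings
EFFECTIVE_PARAMS = 4.5e9   # effective parameters


def weight_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("BF16", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label:>5}: total {weight_gb(TOTAL_PARAMS, bits):.2f} GB, "
              f"effective {weight_gb(EFFECTIVE_PARAMS, bits):.2f} GB")
```

At 16 bits the total parameter count gives 16 GB of weights, close to the ~15GB BF16 figure quoted above. At 4 bits it gives 4 GB, below the quoted ~5.5GB, and the difference is presumably overhead or tensors kept at higher precision, although the source does not break this down.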

Versioning Drift

5.0 / 10

The model uses clear naming conventions (Gemma 4 E4B) and provides both base and instruction-tuned variants. However, as a new release, there is no established changelog or history of semantic versioning for weight updates. While the initial release is well-documented, the long-term commitment to tracking and disclosing behavioral drift or silent updates remains to be proven.
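
The section and total scores in this report are plain sums of the criterion scores, which the short check below reproduces. The dictionary layout is an illustrative choice and every number is copied from the report above.

```python
# Criterion scores copied from the integrity report, grouped by section.
SCORES = {
    "Upstream": {
        "Architectural Provenance": 7.5,
        "Dataset Composition": 3.5,
        "Tokenizer Integrity": 9.0,
    },
    "Model": {
        "Parameter Density": 8.0,
        "Training Compute": 2.0,
        "Benchmark Reproducibility": 5.0,
        "Identity Consistency": 9.5,
    },
    "Downstream": {
        "License Clarity": 10.0,
        "Hardware Footprint": 8.5,
        "Versioning Drift": 5.0,
    },
}


def section_totals(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum the criterion scores inside each section."""
    return {name: sum(items.values()) for name, items in scores.items()}


if __name__ == "__main__":
    totals = section_totals(SCORES)
    print(totals)
    print(sum(totals.values()))
```

The sums come to 20.0, 24.5, and 23.5, for a combined 68.0, which agrees with the 68 / 100 total shown in the Model Integrity summary.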

Resources

Official Documentation

Download Weights

Source Code

About Gemma 4

Gemma 4 is Google DeepMind's most advanced open model family, built from Gemini 3 research and technology. The family includes both Dense and Mixture-of-Experts (MoE) architectures. These multimodal models handle text and images, plus audio on the smaller variants, with context windows up to 256K tokens. Designed for frontier-level performance across reasoning, coding, and agentic workflows, Gemma 4 aims to deliver high intelligence-per-parameter from mobile devices to enterprise servers. The family is released under the Apache 2.0 license.
