Total Parameters
6B
Context Length
32,768 tokens
Modality
Text, image, video, and audio input; text output
Architecture
Matryoshka Transformer (MatFormer)
License
Google Gemma License
Release Date
20 May 2025
Knowledge Cutoff
Jun 2024
Effective Parameters
2.0B
Attention Structure
Grouped-Query Attention (GQA)
Hidden Dimension Size
2560
Number of Layers
30
Attention Heads
-
Key-Value Heads
-
Activation Function
-
Normalization
RMS Normalization
Position Embedding
Rotary Position Embedding (RoPE)
Gemma 3n E2B IT is a member of the Google Gemma 3n model family, engineered for efficient deployment on resource-constrained devices such as mobile phones, laptops, and workstations. The model targets capable, real-time AI inference directly at the edge, and the E2B IT variant is instruction-tuned for a broad range of applications.
The architectural foundation of Gemma 3n E2B IT is the Matryoshka Transformer (MatFormer). Its central innovation is selective parameter activation: the model runs with an effective memory footprint of roughly 2 billion parameters even though about 6 billion parameters are loaded during standard execution, allowing performance to be traded off dynamically against available compute.

The model is multimodal, accepting text, image, video, and audio input and producing text output. Visual input is handled by a SigLIP vision encoder with a "Pan & Scan" algorithm that accommodates varying image resolutions and aspect ratios. The attention stack interleaves five local layers, each restricted to a 1024-token sliding window, with one global layer; this pattern keeps the Key-Value (KV) cache compact, which is essential for efficient long-context processing. Positions are encoded with Rotary Position Embeddings (RoPE), and the model uses Grouped-Query Attention (GQA) together with RMSNorm for normalization.
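As a rough illustration of this interleaved local/global pattern, the sketch below builds a per-layer causal attention mask: five consecutive layers see only a 1024-token sliding window, and every sixth layer attends globally. The function name, layer ordering, and default values are illustrative assumptions drawn from the description above, not from the released implementation.

```python
import torch

def layer_attention_mask(seq_len: int, layer_idx: int,
                         window: int = 1024,
                         locals_per_global: int = 5) -> torch.Tensor:
    """Boolean causal mask (True = may attend) for one layer of an
    interleaved local/global pattern: `locals_per_global` sliding-window
    layers followed by one full-attention (global) layer.
    Illustrative sketch only; values mirror the description above."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (columns)
    causal = k <= q                          # never attend to future tokens
    if layer_idx % (locals_per_global + 1) == locals_per_global:
        return causal                        # global layer: full causal attention
    return causal & (q - k < window)         # local layer: sliding-window attention

# With a 5:1 interleaving, layers 0-4 are local and layer 5 is global.
print(layer_attention_mask(seq_len=8, layer_idx=0, window=4))
print(layer_attention_mask(seq_len=8, layer_idx=5, window=4))
```

Restricting five of every six layers to a fixed window bounds their share of the KV cache, which is the long-context efficiency argument made above.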
In terms of operational characteristics, Gemma 3n E2B IT supports a context length of 32,768 tokens. It features comprehensive multilingual capabilities, having been trained on data encompassing over 140 languages, and utilizes a tokenizer optimized for broad language coverage. The model is applicable to a range of generative AI tasks, including question answering, summarization, and reasoning. Its efficient architecture makes it particularly suitable for integration into systems requiring low-resource deployment, such as content analysis tools, automated documentation systems, and interactive multimodal assistants. The model also supports function calling, enabling the construction of natural language interfaces for programmatic control.
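For the text-only generative tasks listed above (question answering, summarization, reasoning), a minimal usage sketch with Hugging Face transformers could look like the following. The Hub id google/gemma-3n-E2B-it and the chat-style text-generation pipeline path are assumptions; check the model repository for the exact pipeline, processor, and license-acceptance steps, particularly for image, video, or audio input.

```python
# Minimal text-only usage sketch with Hugging Face transformers.
# Assumptions: the Hub id "google/gemma-3n-E2B-it", a recent transformers
# release with Gemma 3n support, and that the chat-style text-generation
# pipeline applies to this checkpoint (multimodal inputs go through a
# separate processor path).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",
    device_map="auto",  # place weights on GPU/CPU automatically
)

messages = [
    {"role": "user",
     "content": "Summarize the trade-offs of running a 2B-effective-parameter "
                "model on a mobile device in three bullet points."},
]

output = generator(messages, max_new_tokens=256)
# The pipeline returns the full chat; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```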
Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.
Rankings are relative to other local LLMs.
| Benchmark | Score | Rank |
|---|---|---|
| Agentic Coding (LiveBench Agentic) | 0.02 | 19 |
| Professional Knowledge (MMLU Pro) | 0.41 | 26 |
| Coding (LiveBench Coding) | 0.16 | 29 |
| Mathematics (LiveBench Mathematics) | 0.26 | 29 |
| Graduate-Level QA (GPQA) | 0.25 | 29 |
| Reasoning (LiveBench Reasoning) | 0.20 | 30 |
| General Knowledge (MMLU) | 0.25 | 37 |
Overall Rank
#51
Coding Rank
#39