Gemma 3n E2B IT: Specifications and GPU VRAM Requirements

Gemma 3n E2B IT

闭源

开放权重

活跃参数

上下文长度

32.768K

模态

Text

架构

Mixture of Experts (MoE)

许可证

Google Gemma License

发布日期

20 May 2025

训练数据截止日期

Jun 2024

技术规格

专家参数总数

2.0B

专家数量

活跃专家

注意力结构

Multi-Head Attention

隐藏维度大小

2560

层数

注意力头

键值头

激活函数

归一化

RMS Normalization

位置嵌入

Absolute Position Embedding

系统要求

不同量化方法和上下文大小的显存要求

Gemma 3n E2B IT

Gemma 3n E2B IT is a member of the Google Gemma 3n model family, engineered for efficient deployment and execution on resource-constrained devices, including mobile phones, laptops, and workstations. This model is designed to facilitate highly capable, real-time artificial intelligence inference directly at the edge. The E2B variant is specifically instruction-tuned for diverse applications.

The architectural foundation of Gemma 3n E2B IT is the Matryoshka Transformer, or MatFormer. A central innovation in this architecture is the implementation of selective parameter activation technology. This enables the model to operate with an effective memory footprint of approximately 2 billion parameters, even though the total number of parameters loaded during standard execution is 6 billion. This flexible parameter management allows for dynamic optimization of performance relative to computational resources. Furthermore, the model incorporates multimodal understanding capabilities, processing not only textual input but also images, video, and audio to generate textual outputs. For visual data, it employs a SigLIP vision encoder, which integrates a "Pan & Scan" algorithm to robustly handle varying image resolutions and aspect ratios. The attention mechanism within the model is structured with an interleaved pattern, alternating between five local layers, each utilizing a constrained sliding window of 1024 tokens, and one global layer. This design optimizes Key-Value (KV) cache management, which is essential for efficient processing of long contexts. Positional encoding is managed through Rotary Position Embeddings (RoPE), and the model leverages Grouped-Query Attention (GQA) along with RMSNorm for normalization.

In terms of operational characteristics, Gemma 3n E2B IT supports a context length of 32,768 tokens. It features comprehensive multilingual capabilities, having been trained on data encompassing over 140 languages, and utilizes a tokenizer optimized for broad language coverage. The model is applicable to a range of generative AI tasks, including question answering, summarization, and reasoning. Its efficient architecture makes it particularly suitable for integration into systems requiring low-resource deployment, such as content analysis tools, automated documentation systems, and interactive multimodal assistants. The model also supports function calling, enabling the construction of natural language interfaces for programmatic control.

关于 Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.

其他 Gemma 3 模型

评估基准

排名适用于本地LLM。

排名

#55

基准	分数	排名
Professional Knowledge MMLU Pro	0.41	6
Agentic Coding LiveBench Agentic	0.02	18
Graduate-Level QA GPQA	0.25	29
Coding LiveBench Coding	0.16	30
Mathematics LiveBench Mathematics	0.26	31
Reasoning LiveBench Reasoning	0.20	32
General Knowledge MMLU	0.25	41

排名

#55

编程排名

#43

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

16k

32k

所需显存:

资源

官方文档阅读论文下载权重

Gemma 3n E2B IT

技术规格

系统要求

Gemma 3n E2B IT

关于 Gemma 3

其他 Gemma 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源