| Attribute | Value |
| --- | --- |
| Active Parameters | 6B |
| Context Length | 32,768 tokens |
| Modality | Text |
| Architecture | Matryoshka Transformer (MatFormer) |
| License | Google Gemma License |
| Release Date | 20 May 2025 |
| Knowledge Cutoff | Jun 2024 |
| Effective Parameters | 2.0B |
| Number of Experts | - |
| Active Experts | - |
| Attention Structure | Grouped-Query Attention (GQA) |
| Hidden Dimension Size | 2560 |
| Number of Layers | 30 |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | - |
| Normalization | RMS Normalization |
| Position Embedding | Rotary Position Embedding (RoPE) |
VRAM Requirements by Quantization Method and Context Size
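As a rough illustration of how quantization width and context size drive memory use, the sketch below estimates weight and KV-cache footprints from the specifications above. The formulas are generic rules of thumb (the KV-cache figure assumes full-width attention at the listed hidden size), not official requirements for this model.

```python
# Back-of-envelope VRAM estimator. These are generic rules of thumb based on
# the spec sheet above, not official requirements for Gemma 3n E2B IT.

TOTAL_PARAMS = 6e9   # parameters loaded at runtime (per the spec sheet)
HIDDEN_DIM = 2560    # hidden dimension size (per the spec sheet)
NUM_LAYERS = 30      # number of layers (per the spec sheet)

def weight_memory_gb(bits_per_param: int) -> float:
    """Memory for model weights at a given quantization width."""
    return TOTAL_PARAMS * bits_per_param / 8 / 1e9

def kv_cache_gb(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Worst-case KV cache: K and V tensors, hidden-dim wide, per layer.
    GQA and the sliding-window layers make the real figure smaller."""
    return 2 * NUM_LAYERS * HIDDEN_DIM * context_tokens * bytes_per_value / 1e9

for bits in (16, 8, 4):  # fp16, int8, int4
    print(f"{bits:>2}-bit weights: {weight_memory_gb(bits):5.1f} GB + "
          f"KV cache @ 32K ctx: {kv_cache_gb(32_768):.1f} GB")
```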
Gemma 3n E2B IT is a member of the Google Gemma 3n model family, engineered for efficient deployment and execution on resource-constrained devices, including mobile phones, laptops, and workstations. It targets capable, real-time AI inference directly at the edge, and the E2B variant is instruction-tuned (IT) for diverse applications.
The architectural foundation of Gemma 3n E2B IT is the Matryoshka Transformer, or MatFormer. A central innovation in this architecture is the implementation of selective parameter activation technology. This enables the model to operate with an effective memory footprint of approximately 2 billion parameters, even though the total number of parameters loaded during standard execution is 6 billion. This flexible parameter management allows for dynamic optimization of performance relative to computational resources. Furthermore, the model incorporates multimodal understanding capabilities, processing not only textual input but also images, video, and audio to generate textual outputs. For visual data, it employs a SigLIP vision encoder, which integrates a "Pan & Scan" algorithm to robustly handle varying image resolutions and aspect ratios. The attention mechanism within the model is structured with an interleaved pattern, alternating between five local layers, each utilizing a constrained sliding window of 1024 tokens, and one global layer. This design optimizes Key-Value (KV) cache management, which is essential for efficient processing of long contexts. Positional encoding is managed through Rotary Position Embeddings (RoPE), and the model leverages Grouped-Query Attention (GQA) along with RMSNorm for normalization.
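To make the interleaved pattern concrete, here is a minimal sketch that derives the per-layer attention type for a 30-layer stack under the 5:1 local-to-global ratio described above; the constant and function names are illustrative, not part of Gemma's actual implementation.

```python
# A minimal sketch of the interleaved attention schedule described above:
# five sliding-window (local) layers for every global layer. Names and
# structure are illustrative only.

NUM_LAYERS = 30
LOCAL_PER_GLOBAL = 5   # five local layers per global layer
SLIDING_WINDOW = 1024  # tokens visible to each local layer

def attention_schedule(num_layers: int) -> list[str]:
    """Label each layer: every sixth layer attends over the full context."""
    return [
        "global" if (i + 1) % (LOCAL_PER_GLOBAL + 1) == 0
        else f"local(window={SLIDING_WINDOW})"
        for i in range(num_layers)
    ]

print(attention_schedule(NUM_LAYERS))
```

Because only one layer in six retains keys and values for the full context, the KV cache grows far more slowly with sequence length than it would in a uniformly global stack, which is what makes long contexts tractable on constrained hardware.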
In terms of operational characteristics, Gemma 3n E2B IT supports a context length of 32,768 tokens. It features comprehensive multilingual capabilities, having been trained on data encompassing over 140 languages, and utilizes a tokenizer optimized for broad language coverage. The model is applicable to a range of generative AI tasks, including question answering, summarization, and reasoning. Its efficient architecture makes it particularly suitable for integration into systems requiring low-resource deployment, such as content analysis tools, automated documentation systems, and interactive multimodal assistants. The model also supports function calling, enabling the construction of natural language interfaces for programmatic control.
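As a usage illustration, the following is a minimal chat-style inference sketch with the Hugging Face transformers library; the checkpoint id google/gemma-3n-E2B-it is an assumption about the published weights, and downloading them requires accepting the Gemma license.

```python
# Minimal chat-style inference sketch using Hugging Face transformers.
# The checkpoint id below is an assumption; access to the gated weights
# requires accepting the Gemma license on Hugging Face.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3n-E2B-it")

messages = [
    {"role": "user", "content": "Summarize the benefits of on-device LLM inference."},
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"])
```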
Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.
Rankings apply to local LLMs.

Rank: #51
| Benchmark | Score | Rank |
| --- | --- | --- |
| Agentic Coding (LiveBench Agentic) | 0.02 | 19 |
| Professional Knowledge (MMLU Pro) | 0.41 | 26 |
| Coding (LiveBench Coding) | 0.16 | 29 |
| Mathematics (LiveBench Mathematics) | 0.26 | 29 |
| Graduate-Level QA (GPQA) | 0.25 | 29 |
| Reasoning (LiveBench Reasoning) | 0.20 | 30 |
| General Knowledge (MMLU) | 0.25 | 37 |