Parameters: 1B
Context Length: 32,768 tokens (32K)
Modality: Text
Architecture: Dense
License: Gemma License
Release Date: 12 Mar 2025
Knowledge Cutoff: Aug 2024
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 1536
Layers: 26
Attention Heads: 16
Key-Value Heads: 4
Activation Function: -
Normalization: RMS Normalization
Positional Embedding: RoPE
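As a quick way to cross-check the hyperparameters listed above, the sketch below loads the model configuration with the Hugging Face transformers library. The hub id "google/gemma-3-1b-it" and the attribute names are assumptions (the repository is gated and attribute names can vary between library versions), so treat this as a minimal sketch rather than a definitive reference.

```python
# Minimal sketch: inspect the published configuration and compare with the spec table above.
# Assumes the transformers library and the (gated) hub id "google/gemma-3-1b-it".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-1b-it")

# Fields corresponding to the spec table; getattr() with a default hedges against
# attribute names that differ across transformers versions.
for field in (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "max_position_embeddings",
    "vocab_size",
):
    print(field, getattr(config, field, "n/a"))
```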
VRAM Requirements for Different Quantization Methods and Context Sizes
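The original page presents these figures as an interactive chart. As a rough guide, a back-of-envelope estimate can be derived from the spec table above: weight memory scales with the per-parameter byte width of the chosen quantization, and the KV cache scales with context length. The sketch below is an approximation under stated assumptions (1B parameters, fp16 KV cache, head dimension taken as hidden_size / num_attention_heads, no framework overhead or activations), not a measured requirement.

```python
# Rough VRAM estimate for weights plus KV cache; an approximation, not a measurement.
# Assumptions: 1e9 parameters, fp16/bf16 KV cache, head_dim = hidden_size / num_heads
# (the actual model may use a different fixed head dimension).

PARAMS = 1_000_000_000          # 1B parameters (from the spec table)
LAYERS = 26
KV_HEADS = 4
HIDDEN = 1536
HEADS = 16
HEAD_DIM = HIDDEN // HEADS      # assumed; 96 here
KV_BYTES = 2                    # bytes per cached element (fp16/bf16)

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def kv_cache_bytes(context_len: int) -> float:
    # One K and one V vector per layer per KV head per cached token.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * context_len * KV_BYTES

def total_gib(quant: str, context_len: int) -> float:
    weights = PARAMS * BYTES_PER_PARAM[quant]
    return (weights + kv_cache_bytes(context_len)) / 1024**3

for quant in ("fp16", "int8", "int4"):
    for ctx in (4096, 32768):
        print(f"{quant:5s} ctx={ctx:6d} -> ~{total_gib(quant, ctx):.2f} GiB")
```

Under these assumptions, fp16 weights alone take about 2 GB and a full 32K-token KV cache adds roughly 1.2 GiB, while 4-bit weights bring the total to under 2 GiB, which is broadly consistent with the 4GB-class devices mentioned below.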
Gemma 3 1B is a small language model (SLM) within the Gemma 3 family, developed by Google, designed for efficient deployment and operation on resource-constrained devices such as mobile phones and web applications. This model aims to enable local execution of AI capabilities, addressing concerns related to user data privacy and cloud inference costs. Its architecture is derived from the same research and technology that underpins the Gemini series of models, emphasizing state-of-the-art performance within a compact footprint.
Architecturally, Gemma 3 1B is a decoder-only transformer optimized for autoregressive tasks such as text generation. A notable feature of Gemma 3 is its interleaved attention mechanism, which alternates global and local attention layers to improve contextual comprehension across extended sequences: the global layers maintain overall coherence over a long document, while the local layers preserve fine-grained detail within smaller spans. The 1B variant has a context window of 32,768 tokens, enabling it to handle substantial textual inputs. It uses a SentencePiece tokenizer with a vocabulary of roughly 262K entries and supports over 140 languages, facilitating diverse linguistic applications. Unlike its larger Gemma 3 counterparts, the 1B model is text-only and does not incorporate multimodal capabilities.
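To make the interleaving concrete, the sketch below builds a per-layer attention schedule and shows how many tokens a query can see under each layer type. The local-to-global ratio and sliding-window size are illustrative assumptions; the description above does not state them for the 1B variant.

```python
# Illustrative sketch of an interleaved attention schedule: most layers attend only to a
# local sliding window, with a periodic global layer that sees the full causal prefix.
# The ratio and window size are assumptions for illustration, not published Gemma 3 values.

NUM_LAYERS = 26
LOCAL_PER_GLOBAL = 5        # assumed: 5 local layers between consecutive global layers
SLIDING_WINDOW = 1024       # assumed local-attention window, in tokens

def layer_schedule(num_layers: int) -> list[str]:
    """Return 'local' or 'global' for each layer index."""
    return [
        "global" if (i + 1) % (LOCAL_PER_GLOBAL + 1) == 0 else "local"
        for i in range(num_layers)
    ]

def visible_tokens(kind: str, position: int) -> int:
    """How many tokens a query at `position` can attend to under each layer type."""
    if kind == "global":
        return position + 1                      # full causal prefix
    return min(position + 1, SLIDING_WINDOW)     # truncated to the sliding window

schedule = layer_schedule(NUM_LAYERS)
print(schedule)
print("tokens visible at position 20000:",
      {kind: visible_tokens(kind, 20_000) for kind in ("local", "global")})
```

The design intent is that only the periodic global layers pay the full cost of long-range attention, while the majority of layers keep compute and KV-cache traffic bounded by the window size.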
Gemma 3 1B is engineered for high throughput, with reported speeds of up to 2,585 tokens per second, enabling rapid content processing. It is optimized for a range of hardware, including NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs, and can run effectively on devices with as little as 4GB of RAM. Practical applications include generating descriptions from application data, creating context-aware dialogue for interactive characters, suggesting contextually relevant replies in messaging applications, and supporting question answering over long documents through integration with technologies such as the AI Edge RAG SDK. The model is released with open weights, allowing developers to fine-tune and deploy it for specific project requirements.
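As a simple example of running the open weights locally, the sketch below uses the Hugging Face transformers text-generation pipeline. The hub id, prompt, and generation settings are assumptions for illustration; on-device deployments (e.g. via the AI Edge stack) use different tooling not shown here.

```python
# Minimal sketch of local text generation with the open weights.
# Assumes the transformers library and the (gated) hub id "google/gemma-3-1b-it";
# accepting the Gemma license on the Hub is required before download.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device_map="auto",   # requires accelerate; omit to run on the default device
    torch_dtype="auto",
)

# Example: a smart-reply style prompt, one of the use cases mentioned above.
prompt = "Suggest a short, friendly reply to: 'Are we still on for lunch tomorrow?'"
out = generator(prompt, max_new_tokens=64)
print(out[0]["generated_text"])
```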
Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.