Gemma 3 1B: Specifications and GPU VRAM Requirements

Gemma 3 1B

Open weights

Parameters

Context length

32,768 tokens

Modality

Text

Architecture

Dense

License

Gemma License

Release date

12 Mar 2025

Training data cutoff

Aug 2024

Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

1536

Number of layers

Attention heads

Key-value heads

Activation function

Normalization

RMS Normalization

Position embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes

Gemma 3 1B

Gemma 3 1B is a small language model (SLM) in Google's Gemma 3 family, designed for efficient deployment on resource-constrained targets such as mobile phones and web applications. Running the model locally addresses concerns about user data privacy and cloud inference costs. Its architecture derives from the same research and technology that underpins the Gemini series of models, with an emphasis on strong performance in a compact footprint.

Gemma 3 1B uses a decoder-only transformer design, which suits autoregressive tasks such as text generation. A notable feature of Gemma 3 is its interleaved attention mechanism, which combines local and global attention layers: local layers preserve fine-grained detail within smaller sections, while global layers maintain coherence across the whole sequence, which lets the model process longer documents. The 1B variant has a context window of 32K (32,768) tokens. It uses a SentencePiece tokenizer with a vocabulary of about 262,000 entries and supports over 140 languages. Unlike the larger Gemma 3 models, the 1B model is text-only and has no multimodal capabilities.
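To make the interleaved attention idea concrete, here is a minimal sketch of the two mask types and of a layer schedule that mixes them. The function names, the toy window size, and the ratio of local to global layers are all illustrative assumptions; this page does not state the model's actual window size or layer pattern.

```python
def causal_mask(seq_len):
    """Global attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Local attention: token i may attend only to the most recent
    `window` tokens, itself included."""
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_pattern(num_layers, local_per_global):
    """Interleave layers: `local_per_global` local layers, then one global layer."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


# Hypothetical toy values for illustration, not the model's real configuration.
pattern = layer_pattern(num_layers=6, local_per_global=2)
local = sliding_window_mask(seq_len=6, window=3)
global_ = causal_mask(seq_len=6)

print(pattern)
# Number of positions the last token can see under each mask.
print(sum(local[5]), sum(global_[5]))
```

In a local layer the last token sees only a fixed-size window, so the cost and cache size of that layer stop growing with sequence length, while the occasional global layer still lets information flow across the whole input.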

Gemma 3 1B is engineered for high throughput and can process up to 2,585 tokens per second. It is optimized for several hardware platforms, including NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs. The model can run on devices with minimal memory, such as those with 4 GB of RAM. Practical applications include generating descriptions from application data, creating context-aware dialogue for interactive characters, suggesting contextually relevant replies in messaging applications, and answering questions over lengthy documents through integration with technologies like the AI Edge RAG SDK. It is released with open weights, so developers can fine-tune and deploy it for their own projects.
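As a back-of-the-envelope illustration of what the quoted throughput means, the sketch below converts a token count into processing time. It assumes the peak figure of 2,585 tokens per second holds for the whole input, which real hardware may not sustain, so treat the result as a best case. The function name is hypothetical.

```python
def processing_seconds(num_tokens, tokens_per_second=2585):
    """Best-case time to process `num_tokens` at a constant token rate."""
    return num_tokens / tokens_per_second


# Time to ingest a full 32,768-token context at the quoted peak rate.
full_context_seconds = processing_seconds(32_768)
print(f"{full_context_seconds:.1f} s")
```

At the quoted peak rate, a full 32K-token context takes a little under 13 seconds to process.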

About Gemma 3

Gemma 3 is a family of open, lightweight models from Google. It introduces multimodal image and text processing, supports over 140 languages, and features extended context windows up to 128K tokens. Models are available in multiple parameter sizes for diverse applications.

Evaluation benchmarks

Rankings apply to local LLMs.

Rank

#51

Benchmark	Score	Rank
Professional Knowledge (MMLU Pro)	0.15	7
Graduate-Level QA (GPQA)	0.19	30
General Knowledge (MMLU)	0.19	42

Rank

#51

Coding rank

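GPU requirements

Required VRAM depends on the quantization method selected for the model weights and on the context size, which ranges from 1,024 tokens to 32K tokens.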
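The dependence of VRAM on quantization and context size can be sketched with a rough estimate: weight memory scales with bits per weight, and the key-value cache scales with context length. The function names, the 10% overhead factor, the nominal 1 billion parameter count, and the layer, KV-head, and head-dimension values below are assumptions for illustration. This page leaves those specification fields blank, so check the model's published configuration before relying on the numbers.

```python
def weight_bytes(num_params, bits_per_weight):
    """Memory for the model weights at a given quantization width."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Memory for cached keys and values (the factor 2 covers K and V).

    Treats every layer as global attention, so it is an upper bound for
    models whose local layers cache only a sliding window.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


def estimate_vram_gib(num_params, bits_per_weight, context_tokens,
                      num_layers, num_kv_heads, head_dim,
                      overhead_fraction=0.1):
    """Rough total in GiB; `overhead_fraction` is an assumed allowance for
    activations and runtime buffers."""
    total = (weight_bytes(num_params, bits_per_weight)
             + kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim))
    return total * (1 + overhead_fraction) / 2**30


# Assumed configuration values, not confirmed by this page.
ASSUMED = dict(num_params=1_000_000_000, num_layers=26, num_kv_heads=1, head_dim=256)

for bits in (16, 8, 4):
    for context in (1_024, 16_384, 32_768):
        gib = estimate_vram_gib(bits_per_weight=bits, context_tokens=context, **ASSUMED)
        print(f"{bits:>2}-bit weights, {context:>6} tokens: ~{gib:.2f} GiB")
```

Real quantization formats store scales and other metadata alongside the weights, so effective bits per weight are usually somewhat higher than the nominal width.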
