
Gemma 1 2B

Parameters

2B

Context Length

8,192

Modality

Text

Architecture

Dense

License

Gemma Terms of Use

Release Date

21 Feb 2024

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

2048

Number of Layers

18

Attention Heads

8

Key-Value Heads

1

Activation Function

GeGLU

Normalization

RMS Normalization

Positional Embeddings

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Gemma 1 2B

Gemma 1 2B is a lightweight, state-of-the-art open language model developed by Google, stemming from the same research and technology that underpins the Gemini family of models. This model is designed as a text-to-text, decoder-only transformer, primarily available in English, with both pre-trained and instruction-tuned variants. Its architectural design focuses on efficiency, making it suitable for deployment in environments with limited computational resources, such as laptops, desktops, or personal cloud infrastructure.
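For readers who want to try the model on such hardware, the following is a minimal sketch using the Hugging Face transformers library and the google/gemma-2b checkpoint (google/gemma-2b-it is the instruction-tuned variant); the dtype and device placement shown are illustrative assumptions, not requirements.

```python
# Minimal sketch: running the pre-trained Gemma 1 2B checkpoint locally.
# Assumes the Hugging Face `transformers` library (plus `accelerate` for
# device_map) and access to the gated "google/gemma-2b" repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # "google/gemma-2b-it" is the instruction-tuned variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly 5 GB of weights in 16-bit precision
    device_map="auto",           # place layers on GPU/CPU automatically
)

prompt = "Explain rotary positional embeddings in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```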

Architecturally, Gemma 1 2B incorporates several advanced components. It uses Multi-Query Attention (MQA) with a single key-value head, a design choice that speeds up inference by sharing the key and value projections across all attention heads. Positional information is encoded with Rotary Positional Embeddings (RoPE). The model's non-linear activation function is GeGLU, a GELU-gated variant of the Gated Linear Unit (GLU) that improves expressive power, and normalization within the network is performed with RMSNorm. Together these elements support the model's performance while maintaining a compact footprint.
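To make the multi-query attention layout concrete, here is an illustrative PyTorch sketch of an MQA block with one shared key/value head. The head count and head dimension (8 heads of size 256) follow the published Gemma 2B configuration, but the module is a simplified stand-in rather than the model's actual implementation: it omits RoPE, GeGLU, RMSNorm, and KV caching.

```python
# Simplified multi-query attention (MQA): every query head attends using the
# same single key/value head, so the KV projections (and the KV cache) are
# num_heads times smaller than in standard multi-head attention.
import torch
from torch import nn


class MultiQueryAttention(nn.Module):
    def __init__(self, hidden_size=2048, num_heads=8, head_dim=256):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        self.q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)
        # Only ONE key head and ONE value head, shared by all query heads.
        self.k_proj = nn.Linear(hidden_size, head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * head_dim, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)  # head axis of size 1
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5           # broadcasts over heads
        causal_mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        attn = (scores + causal_mask).softmax(dim=-1) @ v               # (b, heads, t, head_dim)
        return self.o_proj(attn.transpose(1, 2).reshape(b, t, -1))


x = torch.randn(1, 16, 2048)
print(MultiQueryAttention()(x).shape)  # torch.Size([1, 16, 2048])
```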

The 2B variant is well-suited for a variety of text generation applications, including question answering, summarization, and reasoning tasks. The instruction-tuned versions of Gemma 1 2B are specifically refined to follow instructions effectively and engage in multi-turn conversations, making them adaptable for interactive applications like chatbots. Its compact size ensures it can operate on consumer-grade hardware, democratizing access to advanced AI capabilities for developers and researchers.
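A hedged example of driving the instruction-tuned variant conversationally is shown below. It assumes the google/gemma-2b-it checkpoint and relies on the tokenizer's built-in chat template, which inserts Gemma's <start_of_turn>/<end_of_turn> markers, so no formatting strings are hard-coded here.

```python
# Sketch: a single-turn request to the instruction-tuned variant via the
# tokenizer's chat template. Assumes the "google/gemma-2b-it" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize multi-query attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For a multi-turn exchange, append the model's reply as an assistant-role message plus the next user turn to `messages` and re-apply the chat template before generating again.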

About Gemma 1

Gemma 1 is a family of lightweight, decoder-only transformer models from Google, available in 2B and 7B parameter sizes. Designed for a range of text generation tasks, they use rotary positional embeddings, shared input/output embeddings, GeGLU activation, and RMSNorm. The 2B model uses multi-query attention, while the 7B model uses multi-head attention.
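A quick back-of-the-envelope calculation shows why the 2B model's single key-value head matters for memory: the KV cache scales linearly with the number of KV heads. The figures below assume a head dimension of 256 and fp16 cache entries; they are rough estimates, not measured values.

```python
# KV-cache comparison: single shared KV head (MQA, as in Gemma 1 2B) versus
# one KV head per query head (MHA-style), at the model's full 8,192 context.
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_value=2):
    # 2 tensors (K and V) per layer, each of shape [kv_heads, context, head_dim].
    return 2 * layers * kv_heads * head_dim * context * bytes_per_value

ctx = 8192
mqa = kv_cache_bytes(layers=18, kv_heads=1, head_dim=256, context=ctx)  # single KV head
mha = kv_cache_bytes(layers=18, kv_heads=8, head_dim=256, context=ctx)  # per-head KV
print(f"MQA KV cache at {ctx} tokens: {mqa / 2**20:.0f} MiB")   # ~144 MiB
print(f"MHA KV cache at {ctx} tokens: {mha / 2**20:.0f} MiB")   # ~1152 MiB
```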


Other Gemma 1 Models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Gemma 1 2B.

Rankings

Rank

-

Coding Rank

-

GPU Requirements

Interactive VRAM calculator (values not reproduced here): choose a quantization method for the model weights and a context size (1K, 4K, or 8K tokens) to see the required VRAM and a recommended GPU.
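Since the calculator's outputs are not reproduced here, the sketch below gives a rough way to estimate VRAM for different quantization levels and context sizes. The parameter count, bit-widths, and overhead factor are illustrative assumptions and will differ from what a real inference stack reports.

```python
# Rough VRAM estimator in the spirit of the calculator above. The parameter
# count, quantization bit-widths, and overhead factor are assumptions, not
# measurements; actual usage depends on the inference stack.
PARAMS = 2.5e9                              # approximate weight count of a "2B"-class model
BITS = {"fp16": 16, "q8": 8, "q4": 4}       # example quantization levels

def estimate_vram_gib(quant, context, layers=18, kv_heads=1, head_dim=256, overhead=1.15):
    weights = PARAMS * BITS[quant] / 8                          # bytes for quantized weights
    kv_cache = 2 * layers * kv_heads * head_dim * context * 2   # fp16 K and V tensors
    return (weights + kv_cache) * overhead / 2**30

for quant in BITS:
    for ctx in (1024, 4096, 8192):
        print(f"{quant:>4} @ {ctx:>4} tokens: ~{estimate_vram_gib(quant, ctx):.1f} GiB")
```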