Llama 3.2 1B: Specifications and GPU VRAM Requirements

Llama 3.2 1B

Closed source

Open weights

Parameters

1.23B

Context length

128K

Modality

Text

Architecture

Dense

License

Llama 3.2 Community License

Release date

25 Sept 2024

Training data cutoff

Dec 2023

Technical specifications

Attention structure

Grouped-Query Attention

Hidden dimension size

2048

Layers

Attention heads

Key-value heads

Activation function

Normalization

RMS Normalization

Position embedding

RoPE

System requirements

VRAM requirements for different quantization methods and context sizes

Llama 3.2 1B

Llama 3.2 1B is a foundation large language model developed by Meta and optimized for deployment on edge and mobile devices. It is designed for efficiency, so language processing tasks can run locally with modest compute. Its primary purpose is to support on-device applications that need natural language understanding and generation in resource-limited environments.

The model uses an optimized decoder-only transformer architecture that takes text as input and generates text as output. It employs Grouped-Query Attention (GQA) to improve inference scalability: several query heads share each key/value head, which shrinks the key and value tensors that must be stored and read during generation and so reduces memory bandwidth usage. Positions are encoded with Rotary Position Embeddings (RoPE), which integrate positional information directly into the attention mechanism.

Llama 3.2 1B was trained on up to 9 trillion tokens derived from publicly available sources. Its development involved pruning to reduce model size and knowledge distillation, in which logits from the larger Llama 3.1 8B and 70B models were incorporated during pre-training to recover and enhance performance.
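The head-sharing idea behind Grouped-Query Attention can be illustrated with a small pure-Python sketch. This is a toy illustration under stated assumptions, not Meta's implementation: the function name `grouped_query_attention`, the nested-list tensor layout, and the head counts in the demo are all hypothetical and chosen for readability.

```python
import math


def _softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grouped_query_attention(q, k, v):
    """Causal attention where groups of query heads share one K/V head.

    q:    [n_query_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim]
    Returns a list shaped like q.
    """
    n_query_heads, n_kv_heads = len(q), len(k)
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must be a multiple of K/V heads")
    group_size = n_query_heads // n_kv_heads
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)

    out = []
    for h in range(n_query_heads):
        kv = h // group_size  # all query heads in a group read the same K/V head
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t attends only to positions 0..t.
            weights = _softmax([_dot(q_t, k[kv][s]) * scale for s in range(t + 1)])
            head_out.append(
                [sum(weights[s] * v[kv][s][d] for s in range(t + 1)) for d in range(head_dim)]
            )
        out.append(head_out)
    return out


if __name__ == "__main__":
    # Toy sizes (hypothetical): 4 query heads sharing 2 K/V heads.
    q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
    k = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    result = grouped_query_attention(q, k, v)
    print("heads 0 and 1 share K/V:", result[0] == result[1])
```

Only `n_kv_heads` key and value tensors need to be cached, rather than one per query head, which is where the memory saving comes from.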

The model has 1.23 billion parameters and supports a context length of 128,000 tokens, which lets it process long input sequences. Typical use cases include summarization, instruction following, rewriting, personal information management, and multilingual knowledge retrieval directly on edge devices. It supports text generation in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
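A long context has a memory cost, because the cached keys and values grow linearly with the number of tokens. The sketch below estimates that cost. The layer count, head counts, and head dimension are assumptions: they are not listed on this page and are taken from the publicly released model configuration, so treat them as illustrative. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 corresponds to 16-bit (FP16/BF16) cache entries.
    """
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# ASSUMPTIONS (not stated on this page): architecture values for Llama 3.2 1B.
ASSUMED_LAYERS = 16
ASSUMED_KV_HEADS = 8
ASSUMED_QUERY_HEADS = 32
ASSUMED_HEAD_DIM = 64

CONTEXT_TOKENS = 128_000  # context length quoted in the text

with_gqa = kv_cache_bytes(CONTEXT_TOKENS, ASSUMED_LAYERS, ASSUMED_KV_HEADS, ASSUMED_HEAD_DIM)
# For comparison: one K/V head per query head, i.e. no grouped-query sharing.
without_gqa = kv_cache_bytes(CONTEXT_TOKENS, ASSUMED_LAYERS, ASSUMED_QUERY_HEADS, ASSUMED_HEAD_DIM)

print(f"{with_gqa / 1e9:.2f} GB with GQA, {without_gqa / 1e9:.2f} GB without")
```

Under these assumptions the script prints `4.19 GB with GQA, 16.78 GB without`, so at full context the 16-bit cache can outweigh the model weights themselves.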

About Llama 3.2

Meta's Llama 3.2 family introduces vision models, integrating image encoders with language models for multimodal text and image processing. It also includes lightweight variants optimized for efficient on-device deployment, supporting an extended 128K token context length.

Other Llama 3.2 models

Llama 3.2 3B

Evaluation benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Llama 3.2 1B.

Ranking

Coding ranking

GPU requirements

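Required VRAM depends on the quantization method selected for the model weights and on the context size, which ranges from 1,024 tokens to about 125k tokens.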
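As a rough guide to how quantization and context size combine into a VRAM figure, the estimate can be sketched as weight memory plus key/value cache memory plus some runtime overhead. This is a back-of-the-envelope sketch, not the calculator behind this page: the function names, the 10% overhead fraction, and the idealised bit widths are assumptions. Real quantization formats often store slightly more than their nominal bits per weight.

```python
def weight_bytes(n_params, bits_per_weight):
    """Memory for the model weights at a given precision."""
    return n_params * bits_per_weight / 8


def estimate_vram_gb(n_params, bits_per_weight, kv_cache_bytes=0, overhead_fraction=0.1):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    kv_cache_bytes:    memory for cached keys/values at the chosen context size.
    overhead_fraction: ASSUMED allowance for activations and runtime buffers.
    """
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes
    return total * (1 + overhead_fraction) / 1e9


N_PARAMS = 1_230_000_000  # 1.23 billion parameters, as stated in the text

# Idealised bit widths (assumption): actual formats may differ.
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: about {estimate_vram_gb(N_PARAMS, bits):.2f} GB before the KV cache")
```

Passing a non-zero `kv_cache_bytes` shows how the total climbs with context size. At short contexts the weights dominate, while at very long contexts the cache becomes the larger term.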

Resources

Official documentation, release notes, weight downloads, source code
