
Llama 3.2 1B

Parameters

1B

Context Length

128K

Modality

Text

Architecture

Dense

License

Llama 3.2 Community License

Release Date

25 Sept 2024

Knowledge Cutoff

Dec 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

2048

Number of Layers

16

Attention Heads

16

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

Llama 3.2 1B

Llama 3.2 1B is a foundational large language model developed by Meta, optimized specifically for deployment on edge and mobile devices. This variant is designed for efficiency, enabling language processing tasks to run locally with reduced computational requirements. Its primary purpose is to support on-device applications that require natural language understanding and generation, making it suitable for resource-constrained environments.

The model's architecture is based on an optimized transformer, a decoder-only structure that processes textual inputs and generates textual outputs. It employs Grouped-Query Attention (GQA) to enhance inference scalability, a technique that reduces memory bandwidth usage for key and value tensors by sharing them across multiple query heads. Positional encoding in the model utilizes Rotary Position Embeddings (RoPE), which integrate positional information into the attention mechanism. The Llama 3.2 1B model was trained on a substantial dataset of up to 9 trillion tokens derived from publicly available sources. Its development involved techniques such as pruning to reduce model size and knowledge distillation, where logits from larger Llama 3.1 models (8B and 70B) were incorporated during pre-training to recover and enhance performance.
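
As a rough sketch of why GQA improves inference scalability, the KV cache shrinks in proportion to the ratio of query heads to key-value heads. The calculation below uses the layer and head counts listed on this page; the per-head width (hidden dimension divided by query heads) and the example sequence length are assumptions for illustration.

```python
# Back-of-the-envelope KV-cache sizing: GQA vs. full multi-head attention.
# Layer and head counts follow this page's spec table; head width and
# sequence length are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Bytes for the K and V tensors at FP16 (2 bytes per value)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

SEQ = 8192                # example sequence length
LAYERS = 16
HEAD_DIM = 2048 // 16     # assumed: hidden dim / query heads

# Full MHA would keep one KV pair per query head (16); GQA keeps only 4.
mha = kv_cache_bytes(LAYERS, n_kv_heads=16, head_dim=HEAD_DIM, seq_len=SEQ)
gqa = kv_cache_bytes(LAYERS, n_kv_heads=4,  head_dim=HEAD_DIM, seq_len=SEQ)

print(f"MHA KV cache: {mha / 2**20:.0f} MiB")
print(f"GQA KV cache: {gqa / 2**20:.0f} MiB ({mha // gqa}x smaller)")
```

Under these assumptions the cache drops by the same factor as the head ratio (16/4 = 4x), which is exactly the memory-bandwidth saving GQA is designed to provide.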

This 1.23 billion parameter model supports a context length of 128,000 tokens, enabling it to process extensive input sequences for various applications. Typical use cases for the Llama 3.2 1B model include summarization, instruction following, rewriting tasks, personal information management, and multilingual knowledge retrieval directly on edge devices. It supports multiple languages for text generation, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

About Llama 3.2

Meta's Llama 3.2 family introduces vision models, integrating image encoders with language models for multimodal text and image processing. It also includes lightweight variants optimized for efficient on-device deployment, supporting an extended 128K token context length.



Evaluation Benchmarks

WebDev Arena (Web Development)

Score: 1111

Rank: #62

Rankings

Overall Rank

#99

Coding Rank

#91

Model Transparency

Llama 3.2 1B Transparency Report

Total Score: 71 / 100 (B+)

Audit Note

Llama 3.2 1B exhibits strong transparency in its architectural origins and hardware requirements, providing clear documentation on its pruning and distillation from larger models. However, it maintains significant opacity regarding the specific composition of its 9-trillion-token training set and utilizes a restrictive custom license. While compute and benchmark data are available, the lack of a standard open-source license and granular data disclosure limits its overall transparency profile.

Upstream

21.0 / 30

Architectural Provenance

8.0 / 10

Meta provides high transparency regarding the architectural origin of Llama 3.2 1B. It is explicitly documented as a pruned and distilled version of the Llama 3.1 8B model. The architecture is a standard decoder-only transformer utilizing Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Technical details such as the hidden dimension (2048), number of layers (16), and expansion ratio (4.0x) are publicly available in model cards and technical documentation. The use of knowledge distillation from larger Llama 3.1 models (8B and 70B) to recover performance after pruning is also clearly stated.
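
The documented shapes can be sanity-checked against each other: a 4.0x expansion over the 2048-wide hidden state yields the feed-forward width, and the query-to-KV head ratio gives the GQA group size. The feed-forward width of 8192 shown here matches the publicly released model configuration, though that figure is an inference from the stated ratio rather than a number printed on this page.

```python
# Sanity-check the documented architecture numbers against each other.
hidden_dim, expansion = 2048, 4.0
ffn_width = int(hidden_dim * expansion)   # feed-forward (intermediate) width

query_heads, kv_heads = 16, 4             # from this page's spec table
group_size = query_heads // kv_heads      # query heads served per KV head

print(ffn_width, group_size)
```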

Dataset Composition

4.0 / 10

While Meta discloses that the model was trained on a 'new mix of publicly available online data' totaling 9 trillion tokens, specific details on the dataset composition are lacking. There is no granular breakdown of data sources (e.g., percentages of web, code, or books) or detailed documentation of the filtering and cleaning methodologies used for this specific 1B variant. The claim of using 'publicly available sources' is a vague marketing assertion without verifiable evidence of the exact data distribution.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the official Llama GitHub repository and Hugging Face. It uses a Tiktoken-based BPE approach with a large vocabulary size of 128,256 tokens, which is consistent across the Llama 3 family. The vocabulary and merge files are fully available for inspection, and the tokenizer's alignment with the claimed multilingual support (8 officially supported languages) is verifiable through standard library implementations like 'transformers'.

Model

30.5 / 40

Parameter Density

8.5 / 10

The parameter count is precisely documented as 1.23 billion total parameters. As a dense model, all parameters are active during inference. Meta provides a clear architectural breakdown, including the number of layers (16) and hidden dimensions. The impact of the pruning process from the 8B base is well-documented, and the model's dense nature is explicitly confirmed, avoiding the ambiguity often found in Mixture-of-Experts (MoE) models.
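
The 1.23 billion figure can be roughly reproduced from the documented shapes. The sketch below assumes a standard Llama-style decoder layer (Q/K/V/O projections, a three-matrix SwiGLU feed-forward block, two RMSNorm weights per layer) with tied input/output embeddings; the vocabulary size and KV projection width are taken from the tokenizer section and the GQA configuration above.

```python
# Rough parameter count from the documented shapes. Tied embeddings and
# the per-layer structure (SwiGLU FFN, two RMSNorms) are assumptions
# based on the standard Llama decoder architecture.
V, D, L, FF = 128_256, 2048, 16, 8192   # vocab, hidden, layers, FFN width
H, KV = 16, 4                           # query heads, KV heads
HEAD = D // H                           # per-head width

embed = V * D                                   # tied with the output head
attn = D * D + 2 * D * (KV * HEAD) + D * D      # Q, K, V, O projections
ffn = 3 * D * FF                                # SwiGLU: gate, up, down
norms = 2 * D                                   # two RMSNorm weights per layer

total = embed + L * (attn + ffn + norms) + D    # + final RMSNorm
print(f"{total / 1e9:.3f}B parameters")
```

This lands at roughly 1.24B, in line with the documented 1.23 billion total once rounding is accounted for.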

Training Compute

7.0 / 10

Meta provides specific compute metrics, stating that the 1B model required approximately 370,000 GPU hours on H100-80GB hardware. Environmental impact data is also provided, with an estimated carbon footprint of 240 tons CO2eq for the Llama 3.2 collection (though not isolated solely for the 1B variant). While the hardware type and duration are clear, the exact cost and full breakdown of the training infrastructure are less detailed than the primary Llama 3.1 405B report.
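
The reported GPU-hours admit a rough energy estimate. The 700 W per-GPU figure below is the H100 SXM TDP and is an assumption; actual draw depends on utilization, and datacenter overhead (PUE) is not included.

```python
# Rough training-energy estimate from the reported ~370,000 GPU-hours
# on H100-80GB hardware. 700 W per GPU is an assumed TDP ceiling, not
# a measured figure; real utilization and PUE overhead vary.
gpu_hours = 370_000
watts_per_gpu = 700          # assumed H100 SXM TDP

energy_mwh = gpu_hours * watts_per_gpu / 1e6
print(f"~{energy_mwh:.0f} MWh at full TDP")
```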

Benchmark Reproducibility

6.0 / 10

Meta publishes results for standard benchmarks like MMLU, ARC, and GSM8K. They have released an 'eval recipe' on Hugging Face that uses the 'lm-evaluation-harness' library to help users replicate reported numbers. However, independent replication attempts have noted sensitivity to exact prompting formats and system configurations, leading to variance in results. The full evaluation code and exact few-shot examples for every reported metric are not as comprehensively documented as in peer-reviewed research papers.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as a Meta Llama model in standard deployments. It is transparent about its status as a lightweight, text-only model optimized for edge devices. There are no significant reports of the model claiming to be a competitor's product or misrepresenting its 1.23B parameter scale. Versioning is clear within the Llama 3.2 family naming convention.

Downstream

19.0 / 30

License Clarity

6.0 / 10

The model is released under the 'Llama 3.2 Community License'. While the terms are publicly accessible and allow for commercial use, the license is not a standard OSI-approved open-source license. It contains significant restrictions, including a requirement for a separate license if the user has over 700 million monthly active users and specific 'Built with Llama' branding requirements. These custom terms create legal complexity compared to standard licenses like Apache 2.0.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented by both Meta and the community. The model requires approximately 3.14 GB of VRAM for FP16 inference, which can be reduced to ~1-2 GB using 4-bit quantization (GGUF/EXL2). Meta provides guidance on quantization schemes (e.g., 4-bit groupwise) and their impact. The 128k context window's memory scaling is also documented, making it easy for developers to estimate resource needs for mobile and edge deployment.
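
The cited VRAM figures follow from first principles: weight memory is the parameter count times bytes per parameter. The 4.5 bits-per-parameter figure for 4-bit quantization is an assumption that accounts for quantization scales and zero-points on top of the raw 4-bit weights.

```python
# Estimate weight memory from the documented 1.23B parameter count.
# Runtime use (KV cache, activations, CUDA context) adds overhead on
# top of these figures, which is why real FP16 use approaches ~3 GB.
params = 1.23e9

def weight_gib(params, bits_per_param):
    """Weight memory in GiB for a given precision."""
    return params * bits_per_param / 8 / 2**30

fp16 = weight_gib(params, 16)    # weights only, half precision
q4 = weight_gib(params, 4.5)     # assumed: 4-bit weights + scale metadata

print(f"FP16 weights: {fp16:.2f} GiB, 4-bit weights: {q4:.2f} GiB")
```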

Versioning Drift

5.0 / 10

Meta uses a versioning system (e.g., Llama-3.2-1B-Instruct), but a detailed, public changelog for weight updates or minor revisions is not consistently maintained. While the 'Llama-models' GitHub repository provides some tracking, users have limited visibility into silent updates or behavior drift unless a major new version is released. There is no formal deprecation path or historical weight archive for minor iterations.
