SmolLM3 3B: Specifications and GPU VRAM Requirements

SmolLM3 3B

Open Source

Open Weights

Parameters

Context Length

128K

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

8 Jul 2025

Knowledge Cutoff

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

2048

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

SmolLM3 3B

The SmolLM3-3B model, developed by Hugging Face, represents a compact yet highly capable large language model (LLM) within the 'Smol' family, specifically engineered for efficiency and performance in resource-constrained environments. This pretrained, open-weights base model integrates multilingual understanding, extended context processing, and dual-mode reasoning capabilities within a 3-billion-parameter footprint. Its design aims to democratize advanced AI by providing a powerful solution that can operate effectively on edge devices, mobile applications, and systems with limited computational resources. The model is part of a broader initiative to create lightweight yet impactful AI solutions, making sophisticated language understanding and generation more accessible.

Architecturally, SmolLM3-3B is a decoder-only Transformer model, building upon the foundational designs of models like Llama while incorporating specialized optimizations. Key innovations include the adoption of Grouped Query Attention (GQA), which utilizes 4 key-value heads to significantly reduce the KV cache size during inference without compromising performance, compared to traditional multi-head attention. It also features No Positional Encoding (NoPE), a modification where rotary position embeddings (RoPE) are selectively removed from every fourth layer, enhancing long-context performance. The model comprises 36 hidden layers with a hidden dimension size of 2048 and 16 attention heads. Input and output embeddings are tied to further reduce the memory footprint.

The training regimen for SmolLM3-3B involved a three-stage curriculum on an extensive 11.2 trillion tokens, drawing from diverse public datasets covering web content, code, mathematics, and reasoning data. This comprehensive pretraining establishes robust multilingual and general-purpose capabilities. The model's context length is natively 64,000 tokens, which is further extended to 128,000 tokens through YaRN extrapolation. SmolLM3-3B supports advanced functionalities such as tool calling using structured schemas (XML and Python tools), enabling its integration into complex agent workflows. Its design focuses on delivering competitive performance in areas like reasoning, knowledge retention, and multilingual tasks, positioning it for applications requiring efficient, high-quality language processing on various platforms.

About SmolLM Family

SmolLM open-weight language models (e.g. SmolLM3)

Other SmolLM Family Models

No related models available

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for SmolLM3 3B available.

Rankings

Overall Rank

Coding Rank

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

63k

125k

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Download Weights Source Code