Parameters: 3B
Context Length: 128K
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 8 Jul 2025
Knowledge Cutoff: -
Attention Structure: Grouped Query Attention (GQA)
Hidden Dimension Size: 2048
Number of Layers: 36
Attention Heads: 16
Key-Value Heads: 4
Activation Function: -
Normalization: -
Position Embedding: Rotary Position Embedding (RoPE), with NoPE (RoPE removed in every fourth layer)
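These headline figures can be checked against the model configuration published on the Hugging Face Hub. Below is a minimal sketch; the repository id HuggingFaceTB/SmolLM3-3B and the Llama-style configuration field names are assumptions, not taken from the table above.

```python
# Minimal sketch: read the published config and compare it with the spec table.
# Repository id and field names are assumptions (Llama-style config keys).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("HuggingFaceTB/SmolLM3-3B")

print("hidden size:     ", cfg.hidden_size)          # expected 2048
print("layers:          ", cfg.num_hidden_layers)    # expected 36
print("attention heads: ", cfg.num_attention_heads)  # expected 16
print("key-value heads: ", cfg.num_key_value_heads)  # expected 4
```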
SmolLM3-3B, developed by Hugging Face, is a compact yet capable large language model (LLM) in the SmolLM family, engineered for efficiency and performance in resource-constrained environments. This pretrained, open-weights base model combines multilingual understanding, extended context processing, and dual-mode reasoning within a 3-billion-parameter footprint. It is designed to run effectively on edge devices, mobile applications, and systems with limited computational resources, as part of a broader effort to make sophisticated language understanding and generation accessible in lightweight form.
Architecturally, SmolLM3-3B is a decoder-only Transformer that builds on Llama-style designs while adding several targeted optimizations. It uses Grouped Query Attention (GQA) with 4 key-value heads, which substantially reduces the KV cache during inference relative to traditional multi-head attention without compromising quality. It also applies NoPE (No Positional Encoding), in which rotary position embeddings (RoPE) are removed from every fourth layer to improve long-context performance. The model comprises 36 layers with a hidden dimension of 2048 and 16 attention heads, and its input and output embeddings are tied to further reduce the memory footprint.
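As a rough illustration of the GQA saving, the key-value cache grows with the number of key-value heads rather than the number of query heads. The back-of-the-envelope sketch below uses the figures above; the fp16/bf16 cache element size and the head dimension of 128 (2048 / 16) are assumptions made for illustration.

```python
# Back-of-the-envelope KV-cache comparison: full multi-head attention (16 KV heads)
# vs. the grouped-query setup (4 KV heads). Element size assumes an fp16/bf16 cache.
layers = 36
hidden = 2048
heads = 16
kv_heads = 4
head_dim = hidden // heads  # 128 (assumed: hidden size split evenly across heads)
bytes_per_elem = 2          # fp16 / bf16

def kv_cache_bytes(num_kv_heads: int, seq_len: int) -> int:
    # keys + values (factor 2), per layer, per cached token
    return 2 * layers * num_kv_heads * head_dim * bytes_per_elem * seq_len

seq_len = 128_000  # extended context length
mha = kv_cache_bytes(heads, seq_len)     # hypothetical full multi-head cache
gqa = kv_cache_bytes(kv_heads, seq_len)  # grouped-query cache

print(f"MHA cache at 128k tokens: {mha / 2**30:.1f} GiB")  # ~35 GiB
print(f"GQA cache at 128k tokens: {gqa / 2**30:.1f} GiB")  # ~8.8 GiB, 4x smaller
```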
SmolLM3-3B was pretrained with a three-stage curriculum on 11.2 trillion tokens drawn from diverse public datasets covering web content, code, mathematics, and reasoning data, establishing robust multilingual and general-purpose capabilities. The model natively supports a 64,000-token context, extended to 128,000 tokens through YaRN extrapolation. It also supports tool calling with structured schemas (XML and Python tools), enabling integration into agent workflows. Overall, the design targets competitive performance in reasoning, knowledge retention, and multilingual tasks while remaining efficient enough for deployment across a range of platforms.
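As an illustration of how the tool-calling support might be exercised from Python, the sketch below relies on the Hugging Face transformers chat-template API, which in recent versions accepts a tools argument. The repository id, the example get_weather tool, and the generation settings are illustrative assumptions rather than details from this model card.

```python
# Hypothetical tool-calling sketch. Model id, example tool, and generation
# settings are assumptions; the chat template is expected to render the tool
# schema so the model can emit a structured tool call.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 24 C"  # placeholder implementation

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],        # schema derived from the signature and docstring
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```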