Attention structure: Multi-Head Attention
Hidden dimension size: 2048
Number of layers: 36
Attention heads: 16
Key-value heads: 4
Activation function: -
Normalization: -
Position embedding: Absolute Position Embedding
The SmolLM3-3B model, developed by Hugging Face, represents a compact yet highly capable large language model (LLM) within the 'Smol' family, specifically engineered for efficiency and performance in resource-constrained environments. This pretrained, open-weights base model integrates multilingual understanding, extended context processing, and dual-mode reasoning capabilities within a 3-billion-parameter footprint. Its design aims to democratize advanced AI by providing a powerful solution that can operate effectively on edge devices, mobile applications, and systems with limited computational resources. The model is part of a broader initiative to create lightweight yet impactful AI solutions, making sophisticated language understanding and generation more accessible.
Architecturally, SmolLM3-3B is a decoder-only Transformer model, building on foundational designs such as Llama while incorporating specialized optimizations. Key innovations include Grouped Query Attention (GQA), which uses 4 key-value heads to significantly reduce the KV cache size during inference compared to traditional multi-head attention, without compromising performance. It also features No Positional Encoding (NoPE), a modification in which rotary position embeddings (RoPE) are selectively removed from every fourth layer to improve long-context performance. The model comprises 36 hidden layers with a hidden dimension of 2048 and 16 attention heads, and its input and output embeddings are tied to further reduce the memory footprint.
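To make the GQA shapes concrete, the following is a minimal, illustrative PyTorch sketch using the dimensions listed above (hidden size 2048, 16 query heads, 4 key-value heads). It is not the model's actual implementation: RoPE/NoPE, KV caching, and masking details are omitted, and all module names are hypothetical.

```python
# Illustrative grouped-query attention (GQA) with SmolLM3-3B's stated shapes.
# RoPE/NoPE, KV caching, and dropout are intentionally omitted for brevity.
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads = 2048, 16, 4
head_dim = hidden_size // n_heads          # 128
group_size = n_heads // n_kv_heads         # each K/V head serves 4 query heads

q_proj = torch.nn.Linear(hidden_size, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
o_proj = torch.nn.Linear(n_heads * head_dim, hidden_size, bias=False)

def gqa(x):                                # x: (batch, seq, hidden)
    b, t, _ = x.shape
    q = q_proj(x).view(b, t, n_heads, head_dim).transpose(1, 2)
    k = k_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = v_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Share each K/V head across its group of query heads; only 4 K/V heads
    # are ever cached, which is where the inference memory saving comes from.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return o_proj(out.transpose(1, 2).reshape(b, t, n_heads * head_dim))

print(gqa(torch.randn(1, 8, hidden_size)).shape)  # torch.Size([1, 8, 2048])
```

With these shapes the per-token KV cache holds 4 rather than 16 head projections per layer, a 4x reduction relative to full multi-head attention.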
The training regimen for SmolLM3-3B involved a three-stage curriculum over 11.2 trillion tokens drawn from diverse public datasets covering web content, code, mathematics, and reasoning data. This comprehensive pretraining establishes robust multilingual and general-purpose capabilities. The model natively supports a context length of 64,000 tokens, which is extended to 128,000 tokens through YaRN extrapolation. SmolLM3-3B also supports tool calling using structured schemas (XML and Python tools), enabling its integration into complex agent workflows. Its design focuses on delivering competitive performance in reasoning, knowledge retention, and multilingual tasks, positioning it for applications that require efficient, high-quality language processing across a range of platforms.
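As an illustration of the tool-calling workflow, below is a minimal sketch that passes a Python tool definition through the tokenizer's chat template. It assumes a recent transformers release whose apply_chat_template() accepts a tools argument; the get_weather function is a hypothetical placeholder that exists only to provide a schema, and the exact prompt format is determined by the model's own chat template.

```python
# Minimal tool-calling sketch for SmolLM3-3B (assumed workflow, not the
# model's documented API): the tokenizer's chat template receives a Python
# tool whose schema is derived from its signature and docstring.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_weather(city: str) -> str:
    """Return the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical placeholder; only the schema matters here

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],           # schema built from the function definition
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the model decides to call the tool, the generated text contains a structured tool-call block that an agent framework would parse, execute, and feed back as a follow-up message.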
Rankings apply to local LLMs. No evaluation benchmarks are currently available for SmolLM3 3B.