Attention structure: Multi-Head Attention
Hidden dimension size: 2048
Number of layers: 36
Attention heads: 16
Key-value heads: 4
Activation function: -
Normalization: -
Position embedding: Absolute Position Embedding
The SmolLM3-3B model, developed by Hugging Face, represents a compact yet highly capable large language model (LLM) within the 'Smol' family, specifically engineered for efficiency and performance in resource-constrained environments. This pretrained, open-weights base model integrates multilingual understanding, extended context processing, and dual-mode reasoning capabilities within a 3-billion-parameter footprint. Its design aims to democratize advanced AI by providing a powerful solution that can operate effectively on edge devices, mobile applications, and systems with limited computational resources. The model is part of a broader initiative to create lightweight yet impactful AI solutions, making sophisticated language understanding and generation more accessible.
Architecturally, SmolLM3-3B is a decoder-only Transformer model, building on foundational designs such as Llama while incorporating specialized optimizations. Key innovations include Grouped Query Attention (GQA), which uses 4 key-value heads to significantly reduce the KV cache size during inference compared to traditional multi-head attention, without compromising performance. It also features No Positional Encoding (NoPE), a modification in which rotary position embeddings (RoPE) are selectively removed from every fourth layer to improve long-context performance. The model comprises 36 hidden layers with a hidden dimension of 2048 and 16 attention heads, and its input and output embeddings are tied to further reduce the memory footprint.
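To make the GQA shapes concrete, the following is a minimal, illustrative PyTorch sketch using the dimensions listed above (hidden size 2048, 16 query heads, 4 key-value heads). It is not the model's actual implementation: RoPE/NoPE, KV caching, and masking details are omitted, and all module names are hypothetical.

```python
# Illustrative grouped-query attention (GQA) with SmolLM3-3B's stated shapes.
# RoPE/NoPE, KV caching, and dropout are intentionally omitted for brevity.
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads = 2048, 16, 4
head_dim = hidden_size // n_heads          # 128
group_size = n_heads // n_kv_heads         # each K/V head serves 4 query heads

q_proj = torch.nn.Linear(hidden_size, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
o_proj = torch.nn.Linear(n_heads * head_dim, hidden_size, bias=False)

def gqa(x):                                # x: (batch, seq, hidden)
    b, t, _ = x.shape
    q = q_proj(x).view(b, t, n_heads, head_dim).transpose(1, 2)
    k = k_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = v_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Share each K/V head across its group of query heads; only 4 K/V heads
    # are ever cached, which is where the inference memory saving comes from.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return o_proj(out.transpose(1, 2).reshape(b, t, n_heads * head_dim))

print(gqa(torch.randn(1, 8, hidden_size)).shape)  # torch.Size([1, 8, 2048])
```

With these shapes the per-token KV cache holds 4 rather than 16 head projections per layer, a 4x reduction relative to full multi-head attention.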
The training regimen for SmolLM3-3B involved a three-stage curriculum over 11.2 trillion tokens drawn from diverse public datasets covering web content, code, mathematics, and reasoning data. This comprehensive pretraining establishes robust multilingual and general-purpose capabilities. The model natively supports a context length of 64,000 tokens, which is extended to 128,000 tokens through YaRN extrapolation. SmolLM3-3B also supports tool calling using structured schemas (XML and Python tools), enabling its integration into complex agent workflows. Its design focuses on delivering competitive performance in reasoning, knowledge retention, and multilingual tasks, positioning it for applications that require efficient, high-quality language processing across a range of platforms.
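As an illustration of the tool-calling workflow, below is a minimal sketch that passes a Python tool definition through the tokenizer's chat template. It assumes a recent transformers release whose apply_chat_template() accepts a tools argument; the get_weather function is a hypothetical placeholder that exists only to provide a schema, and the exact prompt format is determined by the model's own chat template.

```python
# Minimal tool-calling sketch for SmolLM3-3B (assumed workflow, not the
# model's documented API): the tokenizer's chat template receives a Python
# tool whose schema is derived from its signature and docstring.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_weather(city: str) -> str:
    """Return the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical placeholder; only the schema matters here

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],           # schema built from the function definition
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the model decides to call the tool, the generated text contains a structured tool-call block that an agent framework would parse, execute, and feed back as a follow-up message.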
Rankings apply to local LLMs. No evaluation benchmarks are currently available for SmolLM3 3B.