Active Parameters: 3.5B
Context Length: 1,000K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: NVIDIA Open Model License
Release Date: 15 Dec 2025
Training Data Cutoff: Nov 2025
Total Expert Parameters: 30.0B
Number of Experts: 129
Active Experts: 6
Attention Structure: Multi-Head Attention
Hidden Dimension Size: 2688
Number of Layers: 52
Attention Heads: 32
Key-Value Heads: 2
Activation Function: ReLU2
Normalization: RMS Normalization
Position Embedding: Absolute Position Embedding
VRAM Requirements by Quantization Method and Context Size
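The table that normally accompanies this section is not reproduced here. As a rough substitute, the sketch below estimates VRAM from the specs above under simple assumptions: weight memory scales with the 30B total parameters and the chosen bit-width, the KV cache covers only the 6 attention layers described below, and Mamba state plus activation overhead is folded into a flat margin. The function name and all constants are illustrative guesses, not measured figures.

```python
def estimate_vram_gib(
    total_params_b: float = 30.0,   # total expert parameters, in billions
    bits_per_weight: int = 16,      # 16 (BF16), 8 (FP8/INT8), 4 (typical Q4 quant)
    context_len: int = 131_072,     # tokens held in the KV cache
    attention_layers: int = 6,      # only the GQA layers keep a per-token KV cache
    kv_heads: int = 2,
    head_dim: int = 84,             # assumed: hidden size 2688 / 32 attention heads
    kv_bytes: int = 2,              # FP16/BF16 KV-cache entries
    overhead: float = 1.10,         # flat ~10% margin for activations, Mamba state, etc.
) -> float:
    """Back-of-the-envelope VRAM estimate in GiB (illustrative only)."""
    weight_bytes = total_params_b * 1e9 * bits_per_weight / 8
    # Keys + values for every cached token in every attention layer.
    kv_cache_bytes = 2 * attention_layers * kv_heads * head_dim * context_len * kv_bytes
    return (weight_bytes + kv_cache_bytes) * overhead / 1024**3

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights, 128K context: ~{estimate_vram_gib(bits_per_weight=bits):.1f} GiB")
```

Actual memory use depends on the inference runtime, quantization scheme, and batch size, so treat these numbers as order-of-magnitude guidance only.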
NVIDIA Nemotron 3 Nano 30B-A3B is a large language model developed by NVIDIA that combines a hybrid Mixture-of-Experts (MoE) architecture with both Mamba-2 state-space layers and Transformer attention layers. The design targets the computational trade-offs traditionally associated with long-context processing while maintaining high accuracy across diverse tasks. The model provides a unified solution for both explicit reasoning and general non-reasoning applications, and its reasoning depth can be configured to match task requirements.
Architecturally, Nemotron 3 Nano 30B-A3B comprises 52 layers in total: 23 Mamba-2 layers, which handle sequential processing and extended contexts efficiently; 23 Mixture-of-Experts layers, each structured as 128 routed experts plus 1 shared expert with 6 experts activated per token to keep computation efficient; and 6 Grouped-Query Attention (GQA) layers that provide fine-grained token-to-token information routing. The model uses a hidden dimension of 2688, squared ReLU (ReLU2) as its activation function, and RMSNorm for normalization stability. A minimal sketch of this MoE block structure follows below.
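As an illustration of the layer pattern described above, the PyTorch-style sketch below implements one MoE block with 128 routed experts plus a shared expert, top-6 routing, squared-ReLU expert MLPs, and RMSNorm. The expert feed-forward width (`d_ff`), the softmax-over-top-k gating, and the class names are assumptions made for clarity; this is not NVIDIA's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Minimal RMS normalization (scale only, no mean subtraction)."""
    def __init__(self, d: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SquaredReLUMLP(nn.Module):
    """Expert feed-forward block using the squared-ReLU (ReLU^2) activation."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.relu(self.up(x)) ** 2)

class TopKMoEBlock(nn.Module):
    """One MoE block: 128 routed experts + 1 shared expert, top-6 routing."""
    def __init__(self, d_model: int = 2688, d_ff: int = 1024,   # d_ff is an assumed width
                 n_routed: int = 128, top_k: int = 6):
        super().__init__()
        self.norm = RMSNorm(d_model)
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.routed = nn.ModuleList(SquaredReLUMLP(d_model, d_ff) for _ in range(n_routed))
        self.shared = SquaredReLUMLP(d_model, d_ff)
        self.top_k = top_k

    def forward(self, x):                             # x: (tokens, d_model)
        h = self.norm(x)
        scores = self.router(h)                       # (tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)             # normalize over the 6 chosen experts
        routed = torch.stack([                        # naive per-token dispatch, for clarity
            sum(w * self.routed[int(e)](h[t]) for w, e in zip(weights[t], idx[t]))
            for t in range(h.size(0))
        ])
        return x + self.shared(h) + routed            # residual + shared expert + routed experts

# Example: run 4 token embeddings through the block.
block = TopKMoEBlock()
print(block(torch.randn(4, 2688)).shape)              # torch.Size([4, 2688])
```

Production MoE kernels dispatch tokens to experts in batches rather than looping per token; the loop here only makes the routing arithmetic easy to follow.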
Designed for versatile deployment, Nemotron 3 Nano 30B-A3B supports a context length of up to 1 million tokens, enabling it to process extensive inputs for complex multi-step workflows, agentic systems, and retrieval-augmented generation (RAG) applications. The model was trained on a corpus of approximately 25 trillion tokens and supports English, Spanish, French, German, Italian, and Japanese, alongside numerous programming languages. This makes it a capable foundation for specialized AI agents, chatbots, and systems that require efficient, accurate, and scalable language understanding and generation.
Nemotron 3 is NVIDIA's family of open models aimed at efficient, accurate agentic AI applications. Built on a hybrid Mamba-Transformer MoE architecture with a Latent MoE design, the models support up to 1M tokens of context and feature Multi-Token Prediction for improved generation efficiency. The Nano variant outperforms comparable models while remaining highly cost-efficient.
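Multi-Token Prediction is only mentioned here at a high level. The snippet below sketches the general idea: alongside the usual next-token head, an auxiliary head predicts a token further ahead, letting the decoder draft more than one token per forward pass. The class and head layout are generic assumptions, not the model's actual MTP design.

```python
import torch
import torch.nn as nn

class MultiTokenPredictionHeads(nn.Module):
    """Illustrative sketch: in addition to the standard next-token head, an extra
    head predicts the token two positions ahead from the same hidden state, so the
    decoder can draft multiple tokens per step and verify the draft on the next pass."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.next_token = nn.Linear(d_model, vocab_size, bias=False)  # predicts position t+1
        self.skip_token = nn.Linear(d_model, vocab_size, bias=False)  # predicts position t+2

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, d_model) produced by the model trunk
        return self.next_token(hidden), self.skip_token(hidden)

# Training would add a cross-entropy loss on each head (targets shifted by one and
# two positions); at inference, the drafted t+2 token is checked on the following step.
```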
Rankings apply to local LLMs.
No evaluation benchmarks are available for NVIDIA Nemotron 3 Nano 30B-A3B.