NVIDIA Nemotron 3 Nano 30B-A3B: Specifications and GPU VRAM Requirements

Open source

Open weights

Active parameters

3.5B

Context length

1,000K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

NVIDIA Open Model License

Release date

15 Dec 2025

Training data cutoff

Nov 2025

Technical specifications

Total expert parameters

30.0B

Number of experts

129

Active experts

6

Attention structure

Grouped-Query Attention (GQA)

Hidden dimension size

2688

Number of layers

52

Attention heads

Key-value heads

Activation function

ReLU² (squared ReLU)

Normalization

RMS Normalization

Position embedding

Absolute Position Embedding

NVIDIA Nemotron 3 Nano 30B-A3B

NVIDIA Nemotron 3 Nano 30B-A3B is a large language model developed by NVIDIA. It uses a hybrid Mixture-of-Experts (MoE) architecture that combines Mamba-2 state-space model layers with Transformer attention layers. The design targets the compute and memory costs usually associated with long-context processing while maintaining accuracy across diverse tasks. The model is intended as a single solution for both explicit reasoning and general non-reasoning use, and its reasoning depth can be configured to match the task.

Architecturally, Nemotron 3 Nano 30B-A3B has 52 layers: 23 Mamba-2 layers, which handle sequential processing and extended contexts efficiently; 23 Mixture-of-Experts layers; and 6 Grouped-Query Attention (GQA) layers. Each MoE layer has 128 routed experts plus 1 shared expert and activates 6 experts per token, which keeps per-token compute low. The model uses a hidden dimension of 2688, squared ReLU (ReLU²) as its activation function, and RMSNorm for normalization.
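The layer and expert counts above can be checked with a few lines of Python. The sketch below is illustrative only: the names `LAYER_COUNTS`, `route_token`, and `squared_relu` are hypothetical, the router is a plain top-k selection over made-up scores, and it assumes the 6 active experts are chosen from the 128 routed experts (the description does not say whether the shared expert counts toward the 6). It is not NVIDIA's implementation.

```python
import heapq
import random

# Figures quoted in the architecture description.
LAYER_COUNTS = {"mamba2": 23, "moe": 23, "gqa_attention": 6}
ROUTED_EXPERTS = 128
SHARED_EXPERTS = 1
ACTIVE_ROUTED_EXPERTS = 6  # assumption: the 6 active experts are all routed experts


def total_layers(counts):
    """Sum the per-type layer counts."""
    return sum(counts.values())


def squared_relu(x):
    """Squared ReLU activation: max(0, x) squared."""
    return max(0.0, x) ** 2


def route_token(router_scores, k=ACTIVE_ROUTED_EXPERTS):
    """Return the sorted indices of the k highest-scoring routed experts.

    A generic top-k router used for illustration; the real router may differ.
    """
    top = heapq.nlargest(k, range(len(router_scores)), key=router_scores.__getitem__)
    return sorted(top)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up router scores for a single token, one score per routed expert.
    scores = [rng.gauss(0.0, 1.0) for _ in range(ROUTED_EXPERTS)]
    print("total layers:", total_layers(LAYER_COUNTS))
    print("experts per MoE layer:", ROUTED_EXPERTS + SHARED_EXPERTS)
    print("routed experts chosen for this token:", route_token(scores))
```

Because only a handful of the 128 routed experts run for each token, the per-token compute tracks the active parameter count rather than the total parameter count.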

Nemotron 3 Nano 30B-A3B supports a context length of up to 1 million tokens, which lets it process long inputs for multi-step workflows, agentic systems, and retrieval-augmented generation (RAG). It was trained on approximately 25 trillion tokens and supports English, Spanish, French, German, Italian, and Japanese, along with numerous programming languages. These properties make it suitable for building specialized AI agents, chatbots, and other systems that need efficient, scalable language understanding and generation.

About Nemotron 3

Nemotron 3 is NVIDIA's family of open models for agentic AI applications, with an emphasis on efficiency and accuracy. The models use a hybrid Mamba-Transformer MoE architecture with a Latent MoE design, support up to 1M tokens of context, and feature Multi-Token Prediction for more efficient generation. The Nano variant is positioned as outperforming comparable models while remaining highly cost-efficient.

Other Nemotron 3 models

No related models

Evaluation benchmarks

Rank

#65

Category	Benchmark	Score	Rank
Professional Knowledge	MMLU Pro	0.78	15
Web Development	WebDev Arena	1317	38

Coding rank

#53

Model transparency

Total score

Upstream

24.5 / 30

Model

31.0 / 40

Downstream

21.0 / 30

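GPU requirements

Required VRAM depends on the quantization method chosen for the model weights and on the context size, which ranges from 1,024 tokens up to about 1M tokens.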
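As a rough guide to the weights-only part of the VRAM requirement, memory scales with the total parameter count times the bits stored per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions: it uses the 30.0B total listed on this page, treats 1 GB as 10^9 bytes, and ignores the KV cache, Mamba state, activations, and runtime overhead, all of which grow with context size. The function name and the quantization labels are hypothetical and are not taken from any calculator.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Estimate weights-only memory in GB (1 GB = 1e9 bytes)."""
    bytes_total = total_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Illustrative precisions; real quantization formats add some per-block overhead.
QUANT_BITS = {"FP16/BF16": 16, "8-bit": 8, "4-bit": 4}

if __name__ == "__main__":
    for name, bits in QUANT_BITS.items():
        print(f"{name}: about {weight_memory_gb(30.0, bits):.1f} GB for weights")
```

The estimate uses the total parameter count, not the 3.5B active parameters: in a typical MoE deployment every expert stays resident in memory, and the low active count reduces compute per token rather than weight storage.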

Resources

Official documentation | Paper | Model weights | Source code
