
NVIDIA Nemotron 3 Nano 30B-A3B

Active Parameters: 3.5B
Context Length: 1,000K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: NVIDIA Open Model License
Release Date: 15 Dec 2025
Training Data Cutoff: Nov 2025

Technical Specifications

Total Expert Parameters: 30.0B
Number of Experts: 129
Active Experts: 6
Attention Structure: Grouped-Query Attention (GQA)
Hidden Dimension Size: 2688
Number of Layers: 52
Attention Heads: 32
Key-Value Heads: 2
Activation Function: ReLU2 (squared ReLU)
Normalization: RMS Normalization
Position Embedding: Absolute Position Embedding

NVIDIA Nemotron 3 Nano 30B-A3B

NVIDIA Nemotron 3 Nano 30B-A3B is an advanced large language model meticulously developed by NVIDIA, integrating a hybrid Mixture-of-Experts (MoE) architecture with both Mamba-2 state-space model layers and Transformer attention layers. This sophisticated design is engineered to address the computational trade-offs traditionally associated with long-context processing while maintaining high accuracy across diverse tasks. The model aims to provide a unified solution for both explicit reasoning and general non-reasoning applications, with configurable capabilities to adapt its reasoning depth based on task requirements.
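
Since the model's reasoning depth is described as configurable, the sketch below shows one way such a model could be queried with the Hugging Face transformers library; the repository id and the "/think" system-prompt toggle are assumptions made for this example, not details taken from this page.

    # Minimal inference sketch. The repo id and the system-prompt toggle are
    # hypothetical; consult the official model card for the actual identifiers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "nvidia/Nemotron-3-Nano-30B-A3B"  # hypothetical Hugging Face repo id

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

    messages = [
        {"role": "system", "content": "/think"},  # assumed switch for explicit reasoning
        {"role": "user", "content": "Outline a migration plan from REST to gRPC."},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))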

Architecturally, the Nemotron 3 Nano 30B-A3B comprises a total of 52 layers. This includes 23 Mamba-2 layers, which are particularly adept at efficient sequential processing and managing extended contexts, and 23 Mixture-of-Experts layers. Each MoE layer is structured with 128 routed experts augmented by 1 shared expert, and employs a mechanism that activates 6 experts per token during processing to enhance computational efficiency. Additionally, the model incorporates 6 Grouped-Query Attention (GQA) layers, providing robust attentional mechanisms for fine-grained information routing. The model utilizes a hidden dimension size of 2688, employs squared ReLU (ReLU2) as its activation function, and incorporates RMSNorm for normalization stability.
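
For reference, the stated hyperparameters can be collected into a single configuration sketch. The field names below are illustrative rather than the model's actual config keys; the values are taken directly from the specification table and the paragraph above.

    # Descriptive configuration sketch; field names are illustrative, values come
    # from the spec table and the architecture description above.
    from dataclasses import dataclass

    @dataclass
    class NemotronNanoConfig:
        num_layers: int = 52            # 23 Mamba-2 + 23 MoE + 6 GQA attention layers
        num_mamba_layers: int = 23
        num_moe_layers: int = 23
        num_attention_layers: int = 6
        hidden_size: int = 2688
        num_attention_heads: int = 32
        num_key_value_heads: int = 2    # grouped-query attention
        num_routed_experts: int = 128
        num_shared_experts: int = 1
        experts_per_token: int = 6      # routed experts activated per token
        activation: str = "relu2"       # squared ReLU
        normalization: str = "rmsnorm"
        max_context_length: int = 1_000_000

    cfg = NemotronNanoConfig()
    # The three layer types account for all 52 layers.
    assert cfg.num_mamba_layers + cfg.num_moe_layers + cfg.num_attention_layers == cfg.num_layers
    # 128 routed + 1 shared expert matches the table entry of 129 experts.
    assert cfg.num_routed_experts + cfg.num_shared_experts == 129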

Designed for versatile deployment and robust performance, Nemotron 3 Nano 30B-A3B supports a substantial context length of up to 1 million tokens, enabling it to process extensive inputs for complex multi-step workflows, agentic systems, and retrieval-augmented generation (RAG) applications. The model is trained on an extensive corpus of approximately 25 trillion tokens, supporting multilingual interactions across English, Spanish, French, German, Italian, and Japanese, alongside numerous programming languages. This foundation positions the model as a capable component for building specialized AI agents, chatbots, and systems requiring efficient, accurate, and scalable language understanding and generation capabilities.
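
One practical consequence of this hybrid layout is that long contexts are comparatively cheap to serve: only the attention layers hold a key-value cache that grows with sequence length, while the Mamba-2 layers keep a fixed-size state. The back-of-the-envelope estimate below uses the published head and hidden-size figures; the head dimension (hidden_size / num_heads) and the 16-bit cache format are assumptions.

    # Rough attention KV-cache estimate at the full 1M-token context.
    # Assumptions: head_dim = 2688 / 32 = 84 and a 2-byte (fp16/bf16) cache.
    seq_len = 1_000_000
    num_attention_layers = 6
    num_kv_heads = 2
    head_dim = 2688 // 32              # assumed head dimension
    bytes_per_value = 2                # fp16 / bf16

    kv_cache_bytes = 2 * num_attention_layers * num_kv_heads * head_dim * seq_len * bytes_per_value
    print(f"Attention KV cache at 1M tokens: {kv_cache_bytes / 1024**3:.1f} GiB")
    # ~3.8 GiB under these assumptions, versus tens of GiB for a comparably sized
    # all-attention model that keeps a cache in every layer.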

About Nemotron 3

Nemotron 3 is NVIDIA's family of open models delivering leading efficiency and accuracy for agentic AI applications. Built on a hybrid Mamba-Transformer MoE architecture with a Latent MoE design, the models support context lengths of up to 1M tokens and feature Multi-Token Prediction for improved generation efficiency. The Nano variant outperforms comparable models while maintaining extreme cost-efficiency.


Other Nemotron 3 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for NVIDIA Nemotron 3 Nano 30B-A3B.

Rank: -
Coding Rank: -

GPU Requirements

VRAM requirements vary with the chosen weight quantization method and context size (from 1k up to 977k tokens). See the full calculator for required VRAM and recommended GPUs.