
NVIDIA Nemotron 3 Nano 30B-A3B

Active Parameters: 3.5B
Context Length: 1M tokens (1,000K)
Modality: Text
Architecture: Mixture of Experts (MoE)
License: NVIDIA Open Model License
Release Date: 15 Dec 2025
Knowledge Cutoff: Nov 2025

Technical Specifications

Total Expert Parameters: 30.0B
Number of Experts: 129 (128 routed + 1 shared)
Active Experts per Token: 6
Attention Structure: Grouped-Query Attention (GQA)
Hidden Dimension Size: 2688
Number of Layers: 52
Attention Heads: 32
Key-Value Heads: 2
Activation Function: Squared ReLU (ReLU²)
Normalization: RMSNorm
Position Embedding: Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes
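
The interactive VRAM calculator is not reproduced here, but a rough estimate follows from the published figures: weight memory scales with the 30B total parameters and the chosen quantization, while KV-cache memory grows with context length across the 6 attention layers (2 key-value heads each). The sketch below is a back-of-envelope approximation under stated assumptions (fp16 KV cache, head dimension of hidden size / attention heads, activation and Mamba-2 state memory ignored), not the site's calculator.

```python
# Rough VRAM estimate for Nemotron 3 Nano 30B-A3B (back-of-envelope, not exact).
# Assumptions: head_dim = hidden_size / num_attention_heads = 84, fp16 KV cache,
# weight memory dominated by the 30B total parameters; Mamba-2 layers keep a
# small fixed-size state that is ignored here.

TOTAL_PARAMS = 30.0e9          # total parameters (all experts)
ATTN_LAYERS = 6                # GQA layers that hold a KV cache
KV_HEADS = 2                   # key-value heads per attention layer
HEAD_DIM = 2688 // 32          # assumed: hidden size / attention heads
KV_BYTES = 2                   # bytes per value in an fp16 cache

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gib(quant: str) -> float:
    """Memory for the model weights alone, in GiB."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[quant] / 2**30

def kv_cache_gib(context_tokens: int) -> float:
    """KV-cache memory for one sequence, in GiB (keys + values)."""
    per_token_bytes = ATTN_LAYERS * KV_HEADS * HEAD_DIM * 2 * KV_BYTES
    return context_tokens * per_token_bytes / 2**30

if __name__ == "__main__":
    for quant in ("fp16", "int8", "int4"):
        for ctx in (1_024, 128_000, 1_000_000):
            total = weight_gib(quant) + kv_cache_gib(ctx)
            print(f"{quant:>5} | {ctx:>9,} tokens | ~{total:6.1f} GiB")
```

Because only 6 of the 52 layers carry a KV cache, the cache for a full 1M-token context stays in the low single-digit GiB range under these assumptions; the weights dominate the footprint at every quantization level.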

NVIDIA Nemotron 3 Nano 30B-A3B

NVIDIA Nemotron 3 Nano 30B-A3B is a large language model developed by NVIDIA that combines a Mixture-of-Experts (MoE) design with both Mamba-2 state-space layers and Transformer attention layers in a single hybrid architecture. The design targets the computational trade-offs traditionally associated with long-context processing while maintaining high accuracy across diverse tasks. The model is intended as a unified solution for both explicit reasoning and general non-reasoning applications, with configurable reasoning depth that can be adapted to task requirements.
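
As a concrete starting point, the snippet below shows how such a model is typically loaded and prompted with the Hugging Face transformers library. The repository ID and the system-prompt reasoning toggle are assumptions for illustration only; the official model card is the authority on the exact identifiers and on how reasoning depth is controlled.

```python
# Minimal inference sketch using Hugging Face transformers.
# The MODEL_ID and the "/think" system-prompt toggle are assumed for illustration;
# consult the official model card for the real values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # hybrid Mamba/MoE checkpoints often ship custom modeling code
)

messages = [
    # Hypothetical reasoning toggle: recent Nemotron releases control reasoning
    # depth through the system prompt; the exact switch may differ.
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Summarize the trade-offs of a Mamba-Transformer hybrid."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```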

Architecturally, Nemotron 3 Nano 30B-A3B comprises 52 layers in total: 23 Mamba-2 layers, which handle sequential processing and extended contexts efficiently; 23 Mixture-of-Experts layers; and 6 Grouped-Query Attention (GQA) layers, which provide content-based attention with 32 query heads sharing 2 key-value heads. Each MoE layer contains 128 routed experts plus 1 shared expert and activates 6 experts per token, keeping per-token compute low relative to the total parameter count. The model uses a hidden dimension of 2688, squared ReLU (ReLU²) as its activation function, and RMSNorm for normalization stability.
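
The routing pattern described above can be illustrated with a small toy layer: each token's router scores the 128 routed experts, the top 6 are mixed by their normalized scores, and a shared expert is applied to every token. Dimensions are shrunk for the demo (the real hidden size is 2688), and the way the shared expert is combined is an assumption; this is a sketch of the mechanism, not NVIDIA's implementation.

```python
# Toy MoE layer: top-6-of-128 routed experts plus one always-on shared expert.
import torch
import torch.nn as nn

N_ROUTED, TOP_K, D = 128, 6, 64   # D shrunk for the demo; the real hidden dim is 2688

class SquaredReLU(nn.Module):
    """ReLU^2 activation, as listed in the technical specifications."""
    def forward(self, x):
        return torch.relu(x) ** 2

def toy_expert():
    # Stand-in feed-forward expert; real expert sizes are not specified here.
    return nn.Sequential(nn.Linear(D, 2 * D), SquaredReLU(), nn.Linear(2 * D, D))

class ToyMoELayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(D, N_ROUTED, bias=False)
        self.experts = nn.ModuleList(toy_expert() for _ in range(N_ROUTED))
        self.shared = toy_expert()                 # shared expert sees every token

    def forward(self, x):                          # x: (tokens, D)
        weights, idx = self.router(x).topk(TOP_K, dim=-1)
        weights = weights.softmax(dim=-1)          # mix the 6 selected experts
        out = self.shared(x)
        for t in range(x.size(0)):                 # per-token loop for clarity, not speed
            for w, e in zip(weights[t], idx[t]):
                out[t] = out[t] + w * self.experts[int(e)](x[t])
        return out

print(ToyMoELayer()(torch.randn(4, D)).shape)      # torch.Size([4, 64])
```

Only the selected experts run for a given token, which is why per-token compute tracks the roughly 3.5B active parameters rather than the 30B total.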

Nemotron 3 Nano 30B-A3B supports a context length of up to 1 million tokens, enabling it to process extensive inputs for complex multi-step workflows, agentic systems, and retrieval-augmented generation (RAG) applications. The model was trained on a corpus of approximately 25 trillion tokens and supports multilingual interaction in English, Spanish, French, German, Italian, and Japanese, alongside numerous programming languages. This foundation positions it as a capable building block for specialized AI agents, chatbots, and systems that require efficient, accurate, and scalable language understanding and generation.

About Nemotron 3

Nemotron 3 is NVIDIA's family of open models aimed at efficient, accurate agentic AI applications. The models use a hybrid Mamba-Transformer MoE architecture with a Latent MoE design, support up to 1M tokens of context, and feature Multi-Token Prediction for improved generation efficiency. The Nano variant is positioned as outperforming comparable models while remaining highly cost-efficient.
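
Multi-Token Prediction, mentioned above, augments the standard next-token head with light auxiliary heads that predict tokens further ahead, whose guesses can then be verified in parallel to speed up decoding. The sketch below is a generic illustration of that idea, not Nemotron 3's actual MTP module.

```python
# Conceptual Multi-Token Prediction sketch: one standard LM head plus auxiliary
# heads for the following positions (toy sizes, generic design).
import torch
import torch.nn as nn

VOCAB, D, EXTRA_HEADS = 1000, 64, 2   # predicts tokens t+1, t+2, t+3

class ToyMTPHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.next_token = nn.Linear(D, VOCAB)                          # standard LM head
        self.future = nn.ModuleList(nn.Linear(D, VOCAB) for _ in range(EXTRA_HEADS))

    def forward(self, hidden):                         # hidden: (seq_len, D)
        logits = [self.next_token(hidden)]             # position t+1
        logits += [head(hidden) for head in self.future]   # positions t+2, t+3
        return torch.stack(logits)                     # (1 + EXTRA_HEADS, seq_len, VOCAB)

print(ToyMTPHead()(torch.randn(8, D)).shape)           # torch.Size([3, 8, 1000])
```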



Evaluation Benchmarks

No evaluation benchmarks are available for NVIDIA Nemotron 3 Nano 30B-A3B yet.

