Qwen3 Next 80B A3B: Specifications and GPU VRAM Requirements

Qwen3 Next 80B A3B

Open source

Open weights

Total parameters

80B (about 3B active per token)

Context length

66K

Modality

Reasoning

Architecture

Mixture of Experts (MoE)

License

Apache-2.0

Release date

1 Feb 2026

Training data cutoff

Jun 2025

Technical specifications

Total expert parameters

79.0B

Number of experts

512

Active experts

Attention structure

Multi-Head Attention

Hidden dimension size

2048

Layers

48

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

RMS Normalization

Position embedding

Absolute Position Embedding

Qwen3 Next 80B A3B

Qwen3-Next-80B-A3B is a high-capacity sparse Mixture-of-Experts (MoE) foundation model developed by Alibaba's Qwen team. It belongs to the next-generation Qwen3-Next series, specifically designed to address the computational demands of long-context sequence modeling and large-scale parameter efficiency. The model features a unique hybrid attention mechanism that integrates Gated DeltaNet with Gated Attention, allowing the system to maintain high performance across extended token sequences while significantly reducing the quadratic complexity typically associated with standard Transformer architectures.
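To make the complexity claim concrete, the sketch below counts only how many token interactions each kind of layer performs as the sequence grows. It is a deliberately simplified illustration, not a model of Gated DeltaNet or Gated Attention themselves: the function names are hypothetical, and real cost also depends on head dimensions, kernels, and hardware.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Token pairs scored by causal softmax attention.

    Each token attends to itself and to every earlier token,
    so the count grows quadratically with sequence length.
    """
    return seq_len * (seq_len + 1) // 2


def linear_recurrence_updates(seq_len: int) -> int:
    """State updates made by a recurrent (linear-attention style) layer.

    One fixed-size state update per token, so the count grows linearly.
    """
    return seq_len


for n in (32_768, 65_536, 262_144):
    ratio = full_attention_pairs(n) / linear_recurrence_updates(n)
    print(f"{n:>7} tokens: {ratio:,.1f} scored pairs per recurrent update")
```

Doubling the sequence length roughly quadruples the work in the first function and only doubles it in the second, which is the gap a hybrid layout tries to exploit.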

The technical architecture employs a high-sparsity MoE layout consisting of 48 layers with a hidden dimension of 2048. While the model contains 80 billion total parameters, its gating mechanism activates only approximately 3 billion parameters per token during inference. This sparse activation strategy, combined with a total of 512 experts and a multi-token prediction (MTP) objective, facilitates improved throughput and reduced FLOPs per token. The model also incorporates stability-focused architectural refinements, such as zero-centered and weight-decayed layer normalization, to ensure robust convergence during both pre-training on 15 trillion tokens and subsequent reinforcement learning stages.
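The sparse activation described above can be illustrated with a toy router. This is a minimal sketch of generic top-k expert gating, not the model's actual routing code: `TOP_K` is a hypothetical value (this page does not state the number of active experts), the router logits are random, and `route_top_k` is an illustrative helper name.

```python
import math
import random

TOTAL_PARAMS = 80e9   # total parameters, from the text
ACTIVE_PARAMS = 3e9   # approximate parameters active per token, from the text
NUM_EXPERTS = 512     # expert count, from the text
TOP_K = 8             # hypothetical: the active-expert count is not given here


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def route_top_k(logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_top_k(router_logits, TOP_K)

print(f"active share of parameters: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.2%}")
print(f"experts used for this token: {len(chosen)} of {NUM_EXPERTS}")
```

With the figures quoted in the text, roughly 3B of 80B parameters (about 3.75%) take part in each token's forward pass; the remaining experts stay idle for that token.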

Optimized for complex reasoning and agentic workflows, Qwen3-Next-80B-A3B has a native context window of 262,144 tokens, which can be extended to over 1 million tokens with scaling techniques such as YaRN. Its primary use cases include multi-step logical analysis, mathematical proofs, and code synthesis. The model is released in two variants: 'Thinking', which outputs structured reasoning traces for intensive, transparent problem solving, and 'Instruct', which targets high-efficiency general-purpose interaction.
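The arithmetic behind the context extension can be sketched as follows. The two window sizes come from the text; the shape of the `rope_scaling` dictionary is an assumption modeled on Hugging Face-style model configs and should be checked against the official model documentation before use.

```python
import math

NATIVE_CONTEXT = 262_144    # native window, from the text
TARGET_CONTEXT = 1_000_000  # "over 1 million tokens", from the text


def min_scaling_factor(target: int, native: int) -> float:
    """Smallest multiplier that stretches the native window to the target."""
    return target / native


def round_up_factor(target: int, native: int) -> int:
    """Round the multiplier up to a whole number, a common practical choice."""
    return math.ceil(target / native)


factor = round_up_factor(TARGET_CONTEXT, NATIVE_CONTEXT)

# Assumed config fragment: key names are illustrative, not confirmed by this page.
rope_scaling = {
    "rope_type": "yarn",
    "factor": float(factor),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(f"minimum factor: {min_scaling_factor(TARGET_CONTEXT, NATIVE_CONTEXT):.2f}")
print(f"rounded factor: {factor} -> {factor * NATIVE_CONTEXT:,} tokens")
```

A factor of 4 applied to the 262,144-token native window yields 1,048,576 tokens, which is consistent with the "over 1 million" figure.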

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

Evaluation benchmarks

Rank

#50

Category	Benchmark	Score	Rank
Data Analysis	LiveBench Data Analysis	0.73	⭐ 5
Professional Knowledge	MMLU Pro	0.83	7
Web Development	WebDev Arena	1402	18
Mathematics	LiveBench Mathematics	0.74	22
Graduate-Level QA	GPQA	0.77	23
Reasoning	LiveBench Reasoning	0.55	24
Coding	LiveBench Coding	0.68	30
Agentic Coding	LiveBench Agentic	0.10	37

Coding rank

#39

GPU requirements
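As a rough guide, the memory needed just to hold the weights can be estimated from the 80B total parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope calculation under stated assumptions: the quantization levels listed are illustrative choices, `weight_memory_gib` is a hypothetical helper, and the result excludes the KV cache, activations, and framework overhead, all of which grow with context size. It uses the total rather than the active parameter count on the assumption that every expert must stay resident in memory, since any of them may be selected for a given token.

```python
TOTAL_PARAMS = 80e9  # total parameters, from the text

# Illustrative precision levels; real quantization formats add some overhead.
BITS_PER_WEIGHT = {"FP16/BF16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(num_params: float, bits: int) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    gib = weight_memory_gib(TOTAL_PARAMS, bits)
    print(f"{name:>9}: {gib:6.1f} GiB for weights alone")
```

Under these assumptions the weights alone take roughly 149 GiB at 16 bits, 75 GiB at 8 bits, and 37 GiB at 4 bits; actual VRAM needs are higher once context-dependent memory is included.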

Resources

Official documentation, release notes, paper, model weights, source code
