Qwen3-235B-A22B

Total parameters: 235B
Context length: 131,072 tokens
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
Release date: 29 Apr 2025
Knowledge cutoff: not listed

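Technical specifications

Active parameters: 22.0B
Number of experts: 128
Active experts per token: 8
Attention structure: Grouped-Query Attention
Hidden dimension size: 10240
Number of layers: 100
Attention heads: 128
Key-value heads: 8
Activation function: SwiGLU
Normalization: RMS Normalization
Position embedding: RoPE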

Qwen3-235B-A22B

Qwen3-235B-A22B is a flagship Mixture-of-Experts (MoE) large language model developed by Alibaba Cloud as part of the Qwen3 series. It targets tasks that demand advanced reasoning and broad knowledge, such as code generation, mathematical problem-solving, and multi-step logical deduction. It is also designed for applications that process long documents, sustain multi-turn conversations, and analyze enterprise-scale datasets.

Qwen3-235B-A22B combines a 'thinking mode' and a 'non-thinking mode' in a single model. The thinking mode supports complex, multi-step reasoning by explicitly showing intermediate thought processes, while the non-thinking mode gives rapid, direct responses. The mode can be switched according to task complexity or the user's request, so that compute spent during inference matches the difficulty of the query.

The MoE architecture is sparsely activated: a router sends each input token to its 8 most relevant experts out of 128. Although the model has 235 billion parameters in total, only about 22 billion are active for any given token. The model was pre-trained on approximately 36 trillion tokens covering 119 languages and dialects. Other architectural components include Grouped-Query Attention (GQA), Rotary Positional Embedding (RoPE) for position encoding, and Flash Attention for accelerated processing. Normalization uses pre-norm RMSNorm, and the activation function is SwiGLU.
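The sparse routing described above can be illustrated with a small, dependency-free sketch. It assumes 8 active experts out of 128, as listed in the specifications, and a softmax over the selected experts' scores; the toy experts, the router scores, and the names `route_top_k` and `moe_forward` are hypothetical and are not taken from the Qwen3 code base.

```python
import math

NUM_EXPERTS = 128  # number of experts, as listed in the specifications
TOP_K = 8          # active experts per token, as listed in the specifications


def route_top_k(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Rank experts by score, breaking ties by the lower index.
    ranked = sorted(range(len(router_logits)), key=lambda i: (-router_logits[i], i))
    top = ranked[:k]
    # Softmax restricted to the selected experts (assumed normalisation scheme).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=TOP_K):
    """Weighted sum of the selected experts' outputs for a single token."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy setup: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]
# Hypothetical router scores for one token.
logits = [math.sin(i) for i in range(NUM_EXPERTS)]

selected = route_top_k(logits)
output = moe_forward([1.0, 2.0], experts, logits)
print(len(selected), "of", NUM_EXPERTS, "experts ran for this token")
```

Only the selected experts are evaluated, which is why the per-token compute tracks the active parameter count rather than the total.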

Qwen3-235B-A22B performs well at instruction following, logical reasoning, and text understanding, and across mathematics, science, and coding tasks. Because only about 22 billion of its 235 billion parameters are active per token, each inference step needs far less compute than a dense model of the same size, which reduces energy consumption and operating cost. The model supports a context length of 131,072 tokens, which helps it stay coherent and retrieve relevant information over long sequences. The weights are publicly available under the Apache 2.0 license, which permits broad adoption and further research. They can be deployed across a range of frameworks and platforms, including local environments such as Ollama, LM Studio, and llama.cpp.
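For local deployment, a rough lower bound on memory is the size of the weights themselves. The sketch below is back-of-the-envelope arithmetic only: it assumes every parameter is stored at the same bit width and ignores the KV cache, activations, and quantization metadata. The helper name `weight_memory_gb` is hypothetical.

```python
TOTAL_PARAMS = 235e9   # total parameters, from the text above
ACTIVE_PARAMS = 22e9   # parameters active per token, from the text above


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    total_gb = weight_memory_gb(TOTAL_PARAMS, bits)
    active_gb = weight_memory_gb(ACTIVE_PARAMS, bits)
    print(f"{bits:>2}-bit: all weights ~{total_gb:.1f} GB, "
          f"weights touched per token ~{active_gb:.1f} GB")
```

Sparse activation lowers the compute per token, but all 128 experts still have to be stored, so the memory footprint follows the total parameter count rather than the active one.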

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system with 'thinking' and 'non-thinking' modes for adaptive processing, and support for long context windows.
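Long context windows are costly mainly because of the key-value cache, which is where Grouped-Query Attention helps. The sketch below estimates cache size from the layer and head counts as listed on this page (100 layers, 8 key-value heads, 128 attention heads). The head dimension of 128 and the 2-byte (16-bit) cache entries are assumptions, not figures from this page, and `kv_cache_bytes` is a hypothetical helper.

```python
LAYERS = 100        # number of layers, as listed on this page
KV_HEADS = 8        # key-value heads, as listed on this page
QUERY_HEADS = 128   # attention heads, as listed on this page
HEAD_DIM = 128      # assumption: per-head dimension is not given on this page
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The leading 2 accounts for one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


for tokens in (1024, 131072):
    gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens)
    # Hypothetical comparison: every query head keeps its own keys and values.
    mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, tokens)
    print(f"{tokens:>7} tokens: GQA {gqa / GIB:.2f} GiB, "
          f"one KV head per query head {mha / GIB:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, and sharing 8 key-value heads among 128 query heads cuts it by a factor of 16.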


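Evaluation benchmarks

Rankings apply to local LLMs.

Overall rank: #12
Coding rank: #16

Benchmark scores and ranks:

(benchmark name missing in source): score 0.87, rank 2
(benchmark name missing in source): score 0.66, rank 5
Web Development (WebDev Arena): score 1181.7, rank 5
(benchmark name missing in source): score 0.79, rank 6
Agentic Coding (LiveBench Agentic): score 0.13, rank 7
(benchmark name missing in source): score 0.65, rank 8
Professional Knowledge (MMLU Pro): score 0.68, rank 13
Graduate-Level QA (GPQA): score 0.47, rank 18
General Knowledge (MMLU): score 0.47, rank 26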

Qwen3-235B-A22B: Specifications and GPU VRAM Requirements