
Qwen3-14B

Parameters

14B

Context Length

131,072 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

29 Apr 2025

Training Data Cutoff

Jan 2025

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

5120

Number of Layers

40

Attention Heads

40

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMSNorm

Positional Embedding

RoPE

Qwen3-14B

Qwen3-14B is a dense transformer-based large language model developed by the Qwen team at Alibaba Cloud, designed as part of the third-generation Qwen series. A defining characteristic of this model is its native support for a hybrid reasoning architecture, allowing practitioners to toggle between a thinking mode for complex multi-step reasoning and a non-thinking mode for rapid conversational responses. This integration is managed via a system-level switching mechanism that utilizes specific chat templates or user-directed prompts to adjust the computational budget dynamically during inference. The thinking mode is specifically optimized for tasks requiring chain-of-thought processing, such as advanced mathematics, code generation, and logical deduction.
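The per-turn toggle described above can be sketched as follows. The `/think` and `/no_think` "soft switch" suffixes follow the published Qwen3 usage convention, and the `enable_thinking` keyword is the template-level flag documented for the Qwen3 chat template; the helper name `with_mode` is hypothetical.

```python
# Minimal sketch of selecting Qwen3's reasoning mode per turn.
# Assumption: the /think and /no_think soft-switch suffixes follow the
# Qwen3 usage convention; `with_mode` is a hypothetical helper name.

def with_mode(user_message: str, thinking: bool) -> str:
    """Append the soft-switch token that selects the reasoning mode for this turn."""
    return f"{user_message} {'/think' if thinking else '/no_think'}"

messages = [
    {"role": "user",
     "content": with_mode("Solve 37 * 43 step by step.", thinking=True)},
]

# The same choice can be made globally when rendering the prompt with the
# Hugging Face transformers chat template (per the Qwen3 model card):
#   text = tokenizer.apply_chat_template(
#       messages, tokenize=False,
#       add_generation_prompt=True,
#       enable_thinking=False,  # disable the thinking mode entirely
#   )
```

In practice the soft switch overrides the mode for a single turn, while `enable_thinking` sets the default for the whole conversation.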

From a technical perspective, Qwen3-14B is built on a causal decoder-only architecture featuring 14.8 billion total parameters. It incorporates Grouped Query Attention (GQA) with 40 query heads and 8 key/value heads to improve inference throughput and reduce memory overhead. The model employs SwiGLU activation functions and RMSNorm with pre-normalization for enhanced training stability. For positional encoding, it utilizes Rotary Positional Embeddings (RoPE) with a base frequency adjusted to support long-context windows. While its native context length is 32,768 tokens, it is extendable to 131,072 tokens through the application of the YaRN (Yet another RoPE extensioN) scaling technique.
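The memory saving from GQA can be made concrete with a back-of-the-envelope KV-cache estimate. This sketch uses the 40-layer, 40-query-head, 8-KV-head, 5120-hidden configuration published for Qwen3-14B and assumes an fp16 cache (2 bytes per element); an actual deployment will differ with dtype and paging strategy.

```python
# Estimate KV-cache size for Qwen3-14B under GQA vs. a hypothetical
# multi-head-attention (MHA) variant with one KV head per query head.
# Assumptions: fp16 cache (2 bytes/element), figures from the published
# Qwen3-14B configuration.
LAYERS = 40
Q_HEADS = 40
KV_HEADS = 8
HIDDEN = 5120
HEAD_DIM = HIDDEN // Q_HEADS      # 128
BYTES_PER_ELEM = 2                # fp16

def kv_cache_bytes(seq_len: int, kv_heads: int) -> int:
    # Two cached tensors per layer (K and V), each of shape
    # [kv_heads, seq_len, head_dim].
    return 2 * LAYERS * kv_heads * seq_len * HEAD_DIM * BYTES_PER_ELEM

gqa = kv_cache_bytes(32_768, KV_HEADS)   # native context length
mha = kv_cache_bytes(32_768, Q_HEADS)    # hypothetical MHA baseline
print(f"GQA cache: {gqa / 2**30:.1f} GiB")   # 5.0 GiB
print(f"MHA cache: {mha / 2**30:.1f} GiB")   # 25.0 GiB
print(f"Reduction: {mha / gqa:.0f}x")        # 5x
```

The 5x reduction is simply the ratio of query heads to KV heads (40 / 8), which is why GQA matters most at long context lengths where the KV cache dominates VRAM use.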

Qwen3-14B is trained on an extensive multilingual corpus encompassing 119 languages and dialects, utilizing a three-stage pre-training pipeline that focuses on general knowledge acquisition, followed by reasoning enhancement and finally long-context fine-tuning. The model is natively compatible with the Model Context Protocol (MCP), enabling integration into agentic workflows for complex tool-calling and environment interaction. This design makes it a versatile solution for both interactive AI assistants and automated systems requiring a balance between analytical depth and operational efficiency.
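The YaRN context extension mentioned above is typically enabled via a `rope_scaling` entry in the model configuration. The fragment below follows the transformers-style scheme documented for Qwen3; the factor of 4.0 corresponds to stretching the native 32,768-token window to 131,072 tokens.

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that static YaRN scaling applies at all sequence lengths, so it is generally recommended to enable it only when long-context processing is actually required.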

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.



Evaluation Benchmarks

No evaluation benchmarks are available for Qwen3-14B.



Qwen3-14B: Specifications and GPU VRAM Requirements