Qwen3-14B

Parameters

14B

Context Length

131,072 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

29 Apr 2025

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

-

Number of Layers

40

Attention Heads

40

Key-Value Heads

8

Activation Function

-

Normalization

RMS Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
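
As a rough illustration of how quantization and context size drive VRAM use, the sketch below adds the memory for the weights to the memory for the key/value cache. The parameter count, layer count, and key/value head count come from this page. The head dimension of 128 and the 16-bit cache precision are assumptions, and the helper names are hypothetical. Activations and framework overhead are ignored, so treat the result as a lower bound rather than a measured figure.

```python
# Rough lower-bound VRAM estimate: weights + key/value cache.
PARAMS = 14_800_000_000   # 14.8 billion parameters (from this page)
LAYERS = 40               # transformer layers (from this page)
KV_HEADS = 8              # key/value heads under GQA (from this page)
HEAD_DIM = 128            # ASSUMPTION: per-head dimension, not stated on this page


def weight_bytes(params: int, bits_per_weight: int) -> float:
    """Memory for the weights at a given quantization width."""
    return params * bits_per_weight / 8


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for cached keys and values (factor 2 = one K and one V tensor).

    bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens


def estimate_vram_gib(bits_per_weight: int, tokens: int) -> float:
    """Weights plus KV cache in GiB; excludes activations and overhead."""
    total = weight_bytes(PARAMS, bits_per_weight) + kv_cache_bytes(tokens)
    return total / 2**30


if __name__ == "__main__":
    for bits in (16, 8, 4):
        for tokens in (1_024, 65_536, 131_072):
            gib = estimate_vram_gib(bits, tokens)
            print(f"{bits:>2}-bit weights, {tokens:>7} tokens: ~{gib:.1f} GiB")
```

Under these assumptions the weights dominate at short contexts, while the cache grows linearly with context length and becomes a comparable cost at the longest context.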

Qwen3-14B

Qwen3-14B is a causal language model developed by the Qwen team at Alibaba Cloud as part of the Qwen3 series. It uses a dense architecture with 14.8 billion parameters. A key design element is dynamic mode switching: the model can operate in a "thinking" mode for complex analytical tasks or in a "non-thinking" mode for general-purpose dialogue. Thinking mode targets stronger reasoning in mathematics, code generation, and logical inference, while non-thinking mode gives more efficient responses for general dialogue and content generation.
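
The mode switch described above could be driven from application code along these lines. This is a minimal sketch, not the model's official interface: `pick_mode` and `build_template_kwargs` are hypothetical helpers, the task labels are invented for illustration, and the `enable_thinking` flag is assumed to be the chat-template argument that toggles the mode, so it should be checked against the model card before use.

```python
# Hypothetical task labels that map to the reasoning-heavy use cases
# named in the text (mathematics, code generation, logical inference).
THINKING_TASKS = {"math", "code", "logic"}


def pick_mode(task: str) -> bool:
    """Return True when the task should run in thinking mode."""
    return task.lower() in THINKING_TASKS


def build_template_kwargs(thinking: bool) -> dict:
    """Keyword arguments intended for a chat-template call.

    ASSUMPTION: the chat template accepts an `enable_thinking` flag.
    """
    return {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": thinking,
    }


if __name__ == "__main__":
    for task in ("math", "chat"):
        print(task, build_template_kwargs(pick_mode(task)))
```

With the Hugging Face Transformers library and the model weights available, the resulting dictionary would be passed as `tokenizer.apply_chat_template(messages, **kwargs)`. That call is left out of the block so the sketch runs without downloading anything.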

Architecturally, Qwen3-14B uses Grouped-Query Attention (GQA) with 40 query heads and 8 key/value heads, so every 5 query heads share one key/value head. This shrinks the key/value cache and improves computational efficiency. The model has 40 layers. It supports a native context length of 32,768 tokens, extendable to 131,072 tokens by applying YaRN (Yet another RoPE extensioN) to its Rotary Position Embeddings. All Qwen3 models also apply layer normalization to the query and key projections (QK-Norm) to improve training stability and performance.
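
The head counts and context lengths in this paragraph imply two simple ratios, worked out in the sketch below: the number of query heads that share each key/value head, and the YaRN scaling factor needed to go from the native to the extended context. The `rope_scaling` dictionary follows the key names commonly used in Hugging Face style model configs. Those names are an assumption here and should be verified against the model's own documentation.

```python
NATIVE_CONTEXT = 32_768     # native context length in tokens (from the text)
EXTENDED_CONTEXT = 131_072  # context length with YaRN (from the text)
QUERY_HEADS = 40            # query heads (from the text)
KV_HEADS = 8                # key/value heads (from the text)


def gqa_group_size(query_heads: int, kv_heads: int) -> int:
    """Number of query heads that share one key/value head."""
    if query_heads % kv_heads != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    return query_heads // kv_heads


def yarn_factor(native: int, extended: int) -> float:
    """Scaling factor needed to stretch the native context to the extended one."""
    return extended / native


# ASSUMPTION: key names follow the common Hugging Face config convention.
rope_scaling = {
    "rope_type": "yarn",
    "factor": yarn_factor(NATIVE_CONTEXT, EXTENDED_CONTEXT),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

if __name__ == "__main__":
    print("query heads per KV head:", gqa_group_size(QUERY_HEADS, KV_HEADS))
    print("rope_scaling:", rope_scaling)
```

The group size of 5 is also the factor by which the key/value cache is smaller than it would be if each of the 40 query heads had its own key/value head.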

The model supports over 100 languages and dialects. It can also be integrated with external tools, enabling agentic workflows for multi-step problems. These characteristics make Qwen3-14B suitable both for applications that need analytical depth, such as advanced AI assistants, and for interactive conversational systems.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system that offers 'thinking' and 'non-thinking' modes for adaptive processing, and support for long context windows, both aimed at improving efficiency and scalability.


Evaluation Benchmarks

Ranking is for Local LLMs.

Rank

#9

Benchmark    Score    Rank
-            0.74     3 🥉
-            0.68     6
-            0.73     12
-            0.58     13

Rankings

Overall Rank

#9

Coding Rank

#22
