Parameters: 4B
Context Length: 32,768 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 29 Apr 2025
Knowledge Cutoff: Mar 2025
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 36
Attention Heads (Query): 32
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: RoPE
VRAM requirements depend on the quantization method applied to the model weights and on the context size; a rough estimate is sketched below.
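As a back-of-the-envelope guide, memory use is roughly the quantized weight size plus the KV cache, which grows linearly with context length. The sketch below is an estimate only; the layer count, KV-head count, and head dimension are assumptions taken from the published Qwen3-4B configuration, and real usage also depends on the runtime, activation buffers, and quantization overhead.

```python
# Rough VRAM estimate for Qwen3-4B (assumed config: 36 layers, 8 KV heads, head_dim 128).
def estimate_vram_gb(params_b=4.0, bits_per_weight=4, context_tokens=32768,
                     layers=36, kv_heads=8, head_dim=128, kv_bytes=2):
    # Quantized weights: parameters * bits per weight
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: keys + values, for every layer, for every cached token
    kv_cache_gb = 2 * layers * kv_heads * head_dim * kv_bytes * context_tokens / 1e9
    return weights_gb + kv_cache_gb

print(f"{estimate_vram_gb(bits_per_weight=4, context_tokens=32768):.1f} GB")   # ~6.8 GB
print(f"{estimate_vram_gb(bits_per_weight=16, context_tokens=1024):.1f} GB")   # ~8.2 GB
```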
Qwen3-4B is a foundational large language model developed by Alibaba, forming a part of the comprehensive Qwen3 series. This model is engineered to facilitate advanced natural language processing tasks, encompassing both general-purpose conversational abilities and specialized reasoning. A distinguishing architectural characteristic of the Qwen3 series is its dual-mode operation, which enables dynamic switching between a 'thinking mode' for complex, multi-step logical reasoning and a 'non-thinking mode' for efficient, direct responses. This adaptability optimizes performance across diverse application scenarios, ranging from intricate problem-solving to rapid-fire dialogue.
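A minimal sketch of switching between the two modes with Hugging Face transformers is shown below, using the enable_thinking flag exposed by the Qwen3 chat template; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve 37 * 48 step by step."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set False for fast, direct responses without a reasoning trace
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```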
Architecturally, Qwen3-4B is a dense transformer model with 4.0 billion parameters, comprising 36 layers. It employs Grouped Query Attention (GQA) with 32 attention heads for queries and 8 key-value heads, which shrinks the KV cache and improves inference efficiency while maintaining quality. The model uses Rotary Position Embeddings (RoPE) to encode token positions and natively supports a context length of up to 32,768 tokens, which can be extended to 131,072 tokens through YaRN (Yet another RoPE extensioN) scaling. The feed-forward blocks use the SwiGLU activation, and normalization is applied with RMSNorm, further contributing to stable training and inference.
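A minimal sketch of enabling the extended context with Hugging Face transformers follows, assuming the rope_scaling convention described in the Qwen3 model card; the exact keys and scaling factor should be verified against the transformers release in use.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: extend Qwen3-4B's context beyond its native 32,768 tokens via YaRN scaling.
config = AutoConfig.from_pretrained("Qwen/Qwen3-4B")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # 32,768 * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", config=config)
```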
Qwen3-4B is intended for a range of applications requiring sophisticated language understanding and generation. Its capabilities extend to areas such as mathematical problem-solving, code generation, creative writing, and multi-turn dialogue systems. The model's design facilitates its integration into agentic workflows, enabling precise interaction with external tools. Furthermore, Qwen3-4B demonstrates robust multilingual support, processing information across more than 100 languages and dialects. This combination of architectural design, reasoning flexibility, and broad language coverage positions it as a suitable candidate for a variety of academic and commercial deployments.
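Below is a minimal, hypothetical sketch of exposing an external tool to the model through the chat template, assuming the tools argument supported by recent transformers releases; get_weather is a made-up function used only for illustration, and the host application is responsible for parsing and executing the tool call the model emits.

```python
from transformers import AutoTokenizer

# Hypothetical tool definition in JSON-schema form
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the weather in Hangzhou?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # the rendered prompt now includes the tool schema for the model to call
```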
The Alibaba Qwen3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts ranging from 0.6B to 235B. Key innovations include a hybrid reasoning system offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, improving both efficiency and scalability.
Ranking (Local LLMs): no evaluation benchmarks are currently available for Qwen3-4B, so no Overall or Coding rank is assigned.