Qwen3-8B: Specifications and GPU VRAM Requirements

Qwen3-8B

Open Weights

Parameters: approximately 8.2B
Context Length: 131,072 tokens (32,768 native, extended with YaRN)
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 29 Apr 2025
Knowledge Cutoff: (not listed)

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 4096
Number of Layers: (not listed)
Attention Heads: (not listed)
Key-Value Heads: (not listed)
Activation Function: SwiGLU
Normalization: RMS Normalization (RMSNorm)
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Qwen3-8B

Qwen3-8B is a dense causal language model developed by Alibaba as part of the Qwen3 series. It has approximately 8.2 billion parameters and targets efficient performance across a range of natural language processing tasks. Like the rest of the Qwen3 family, it combines a "thinking" mode for complex logical reasoning, mathematics, and coding with a "non-thinking" mode optimized for general-purpose dialogue. Both modes are served by the same weights, so an application can match the model's behavior to the task without switching between distinct models.
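Downstream code usually has to separate the reasoning trace from the final answer when thinking mode is on. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags ahead of the answer; that tag format and the helper name `split_thinking` are assumptions for illustration and should be checked against the model card before use.

```python
import re

# Assumption: thinking-mode output looks like
#   <think> ...reasoning... </think> final answer
# The tag format is not stated on this page; verify it against the model card.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" if no think block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        # Non-thinking mode (or an unexpected format): treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Drop the think block and keep whatever surrounds it as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the split in one small function means the same calling code can handle both modes: in non-thinking mode the reasoning part simply comes back empty.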

Qwen3-8B is a decoder-only transformer. It applies QK layer normalization for training stability and uses Grouped Query Attention (GQA), in which several query heads share each key/value head, to reduce memory use and speed up inference. Pre-training covers roughly 36 trillion tokens across 119 languages and proceeds in three stages. The first stage (S1) builds broad language proficiency and general knowledge. The second stage (S2) strengthens reasoning by increasing the proportion of STEM, coding, and reasoning data. The third stage improves long-context comprehension by extending training sequence lengths up to 32,768 tokens, which is the model's native context length. The context length can be extended further to 131,072 tokens with the YaRN method.
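The memory effect of sharing key/value heads can be made concrete with a little arithmetic. The sketch below compares the per-token KV-cache size with and without GQA and derives the YaRN scaling factor implied by the two context lengths in the text. The layer count, head counts, and head dimension are illustrative assumptions, not values taken from this page; read the real ones from the model's configuration file.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache stored per token (factor 2 = one key + one value)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# ASSUMED configuration for illustration only (not listed on this page).
NUM_LAYERS = 36
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8      # GQA: 4 query heads share each KV head under these assumptions
HEAD_DIM = 128

# With GQA only the KV heads are cached; without it, one KV head per query head.
gqa_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)

# Context lengths stated in the text: 32,768 native, 131,072 with YaRN.
NATIVE_CONTEXT = 32_768
EXTENDED_CONTEXT = 131_072
yarn_factor = EXTENDED_CONTEXT / NATIVE_CONTEXT

print(f"KV cache per token with GQA:    {gqa_bytes / 1024:.0f} KiB")
print(f"KV cache per token without GQA: {mha_bytes / 1024:.0f} KiB")
print(f"GQA saving: {mha_bytes // gqa_bytes}x")
print(f"YaRN scaling factor: {yarn_factor}")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, and that saving grows in absolute terms as the context gets longer.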

Qwen3-8B is tuned for reasoning and for alignment with human preferences, which suits it to creative writing, role-playing, multi-turn dialogue, and precise instruction following. It also has agent capabilities and can integrate with external tools for complex agent-based tasks. Its multilingual support covers over 100 languages and dialects, including multilingual instruction following and translation.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Its main features are a hybrid reasoning system with 'thinking' and 'non-thinking' modes and support for long context windows.

Evaluation Benchmarks

No evaluation benchmarks for Qwen3-8B available.

GPU Requirements

VRAM required depends on the quantization method chosen for the model weights and on the context size. The page's calculator covers context sizes from 1,024 tokens up to 128k.
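A first-order VRAM estimate adds the memory for the quantized weights to the KV cache for the chosen context size, plus some headroom. The sketch below is a rough approximation under stated assumptions, not the calculator used by this page: the parameter count comes from the description above, while the per-token KV-cache size, the 10% overhead, and the function name `estimate_vram_gib` are assumptions for illustration. Real quantization formats often store slightly more than their nominal bits per weight.

```python
def estimate_vram_gib(num_params: float, bits_per_weight: float,
                      context_tokens: int, kv_bytes_per_token: int,
                      overhead_fraction: float = 0.10) -> float:
    """Rough VRAM estimate in GiB: (weights + KV cache) plus fractional headroom."""
    weight_bytes = num_params * bits_per_weight / 8
    kv_cache_bytes = context_tokens * kv_bytes_per_token
    return (weight_bytes + kv_cache_bytes) * (1 + overhead_fraction) / 2**30


PARAMS = 8.2e9                # approximate parameter count from the description
KV_BYTES_PER_TOKEN = 147_456  # ASSUMED: 144 KiB per token with an FP16 cache

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    for context in (1_024, 65_536, 131_072):
        gib = estimate_vram_gib(PARAMS, bits, context, KV_BYTES_PER_TOKEN)
        print(f"{label:>5} weights, {context:>7,} tokens: ~{gib:.1f} GiB")
```

At short contexts the weights dominate, so quantization matters most. At long contexts the KV cache can rival or exceed the weights, which is why context size is a separate input.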
