Qwen3-8B: Specifications and GPU VRAM Requirements

Qwen3-8B

闭源

开放权重

参数

上下文长度

131.072K

模态

Text

架构

Dense

许可证

Apache 2.0

发布日期

29 Apr 2025

训练数据截止日期

技术规格

注意力结构

Grouped-Query Attention

隐藏维度大小

4096

层数

注意力头

键值头

激活函数

SwigLU

归一化

Layer Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Qwen3-8B

Qwen3-8B is a dense causal language model developed by Alibaba, part of the broader Qwen3 series. It consists of approximately 8.2 billion parameters and is engineered for efficient performance across a spectrum of natural language processing tasks. A distinctive feature within the Qwen3 family is the integration of a "thinking" mode for complex logical reasoning, mathematics, and coding, alongside a "non-thinking" mode optimized for general-purpose dialogue. This design facilitates dynamic adaptation of the model's operational characteristics based on task demands without requiring a switch between distinct models.

The architectural foundation of Qwen3-8B is the decoder-only transformer, incorporating refinements such as qk layernorm for enhanced stability and leveraging Grouped Query Attention (GQA) to optimize inference speed and memory utilization by sharing Key/Value heads among multiple Query heads. Its training regimen is a three-stage process, involving extensive pre-training on over 36 trillion tokens across 119 languages to build broad language proficiency and general knowledge. This initial stage (S1) is followed by specific optimization for reasoning skills in a second stage (S2) by increasing the proportion of STEM, coding, and reasoning data, and long-context comprehension in a third stage by extending training sequence lengths up to 32,768 tokens natively. The context length can be further extended to 131,072 tokens via the YaRN method.

Qwen3-8B exhibits enhanced reasoning capabilities and superior human preference alignment, making it effective for applications requiring creative writing, role-playing, multi-turn dialogues, and precise instruction following. Furthermore, it includes agent capabilities, supporting integration with external tools for complex agent-based tasks. The model's comprehensive multilingual support extends to over 100 languages and dialects, facilitating multilingual instruction following and translation.

关于 Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

其他 Qwen 3 模型

评估基准

排名适用于本地LLM。

没有可用的 Qwen3-8B 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

64k

128k

所需显存:

资源

官方文档发布说明阅读论文下载权重

Qwen3-8B

技术规格

系统要求

Qwen3-8B

关于 Qwen 3

其他 Qwen 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源