Qwen3-1.7B: Specifications and GPU VRAM Requirements

Qwen3-1.7B

闭源

开放权重

参数

1.7B

上下文长度

32.768K

模态

Text

架构

Dense

许可证

Apache 2.0

发布日期

29 Apr 2025

训练数据截止日期

技术规格

注意力结构

Grouped-Query Attention

隐藏维度大小

层数

注意力头

键值头

激活函数

SwigLU

归一化

RMS Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Qwen3-1.7B

Qwen3-1.7B is a dense causal language model developed by Alibaba's Qwen Team, introduced as part of the Qwen3 series on April 29, 2025. This model is designed for general-purpose language tasks and is characterized by its compact 1.7 billion parameter count. Its architecture is optimized for efficient operation across various hardware configurations, including environments with limited resources and edge devices. The model supports a context length of 32,768 tokens, enabling it to process extensive documents and conversations.

A distinguishing architectural feature within the Qwen3 series, including the 1.7B variant, is its dual operational modes: "Thinking Mode" and "Non-Thinking Mode." The Thinking Mode facilitates complex logical reasoning, such as mathematical problem-solving and code generation, through a step-by-step reasoning process. In contrast, the Non-Thinking Mode provides rapid, direct responses suitable for general conversational applications. This hybrid approach enables dynamic switching between modes, optimizing performance based on task complexity and efficiency requirements.

The model's architecture consists of 28 transformer layers, employing Grouped Query Attention (GQA) with 16 query heads and 8 key-value heads. It integrates Rotary Positional Embeddings (RoPE), specifically enhanced with ABF-RoPE, to maintain positional information accuracy across its extended context length. Further architectural refinements include the implementation of qk layernorm and RMSNorm with pre-normalization for stable training. Qwen3-1.7B demonstrates robust multilingual support, processing over 100 languages and dialects, and features advanced agent capabilities for tool integration.

关于 Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

其他 Qwen 3 模型

评估基准

排名适用于本地LLM。

没有可用的 Qwen3-1.7B 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

16k

32k

所需显存:

资源

官方文档发布说明阅读论文下载权重

Qwen3-1.7B

技术规格

系统要求

Qwen3-1.7B

关于 Qwen 3

其他 Qwen 3 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源