
Qwen3-1.7B

Parameters: 1.7B
Context Length: 32,768 tokens
Modality: Text
Architecture: Dense
License: Apache 2.0
Release Date: 29 Apr 2025
Training Data Cutoff: Dec 2024

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 2048

Number of Layers: 28
Attention Heads: 16
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: RMSNorm
Positional Embeddings: RoPE

Qwen3-1.7B

Qwen3-1.7B is a dense causal language model engineered by the Alibaba Qwen team as a high-efficiency solution for general-purpose language processing and reasoning. Introduced as part of the Qwen3 series on April 29, 2025, the model is designed to operate effectively across diverse hardware environments, including mobile devices and edge computing platforms. It supports a native context length of 32,768 tokens, which can be further extended using YaRN-based rotary embedding scaling techniques, enabling the processing of extensive documents and prolonged multi-turn interactions.
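The YaRN-based extension mentioned above can be sketched as a rotary-scaling configuration. This is a hedged illustration, not a value from this card: the scaling factor of 4.0 is an assumed example, and the final comment shows (without executing) how such a dict is typically passed to the Hugging Face `transformers` loader.

```python
# Illustrative YaRN rope-scaling configuration for extending the native
# 32,768-token context window. The factor of 4.0 is an assumption chosen
# for illustration; larger factors trade accuracy for reach.
native_ctx = 32_768

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": native_ctx,
}

# Effective window after scaling: native length times the YaRN factor.
extended_ctx = int(native_ctx * rope_scaling["factor"])
print(extended_ctx)  # 131072

# Such a dict would typically be handed to the model loader, e.g.:
# AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", rope_scaling=rope_scaling)
```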

Technically, the model is built on a transformer architecture comprising 28 layers with a hidden dimension of 2048. It utilizes Grouped Query Attention (GQA) with 16 query heads and 8 key-value heads to reduce memory overhead during inference while maintaining high performance. The architecture incorporates advanced stabilization and optimization techniques, including RMSNorm with pre-normalization, SwiGLU activation functions, and the introduction of QK-Norm to enhance attention layer stability in long-context scenarios. Positional information is managed through Rotary Positional Embeddings (RoPE), specifically utilizing an Adjusted Base Frequency (ABF) approach to maintain accuracy over the model's large context window.
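The memory saving from Grouped Query Attention comes from storing keys and values for only 8 heads while 16 query heads attend to them, each pair of query heads sharing one KV head. A minimal NumPy sketch of that sharing, using the head counts from this card (sequence length and random inputs are arbitrary toy values):

```python
import numpy as np

# Toy shapes mirroring Qwen3-1.7B's attention config:
# 16 query heads, 8 key-value heads, head dimension 128.
n_q, n_kv, head_dim, seq = 16, 8, 128, 4
rng = np.random.default_rng(0)
q = rng.standard_normal((n_q, seq, head_dim))   # one projection per query head
k = rng.standard_normal((n_kv, seq, head_dim))  # only 8 KV heads are stored/cached

# Each group of n_q // n_kv = 2 query heads shares one KV head: expand the
# 8 stored heads to 16 views for the attention matmul.
k_expanded = np.repeat(k, n_q // n_kv, axis=0)  # (16, seq, head_dim)

scores = q @ k_expanded.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
print(weights.shape)  # (16, 4, 4)
```

Because the KV cache holds 8 heads instead of 16, per-token cache memory is halved relative to full multi-head attention at the same head dimension.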

A primary innovation of the Qwen3-1.7B model is its native dual-mode operational capability, which allows it to function in both Thinking and Non-Thinking modes within a single weight set. Thinking mode activates a step-by-step reasoning process, making the model suitable for complex logical deduction, mathematical problem-solving, and code generation. Non-Thinking mode provides direct, high-speed responses for standard conversational applications. This hybrid system supports dynamic switching via user directives or API parameters, allowing developers to allocate a computational thinking budget that balances output quality with inference latency.
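The per-turn switching can be sketched with the `/think` and `/no_think` soft-switch directives the Qwen3 series documents for toggling reasoning inside a conversation. The helper below is a hypothetical convenience, not part of any official API; at the template level, the same toggle is typically exposed as an `enable_thinking` flag on the tokenizer's chat template.

```python
def with_mode(user_text: str, thinking: bool) -> list[dict]:
    """Build a single chat turn, appending Qwen3's documented soft-switch
    directive (/think or /no_think) to toggle reasoning for that turn.
    This helper itself is illustrative, not an official API."""
    directive = " /think" if thinking else " /no_think"
    return [{"role": "user", "content": user_text + directive}]

msgs = with_mode("Solve 37 * 43 step by step.", thinking=True)
print(msgs[0]["content"])  # ends with " /think"

# The hard switch is usually set when rendering the chat template, e.g.:
# tokenizer.apply_chat_template(msgs, add_generation_prompt=True,
#                               enable_thinking=False)
```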

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.


Evaluation Benchmarks

No evaluation benchmarks are available for Qwen3-1.7B.

