Qwen3.5-4B

Open Source

Open Weights

Parameters

Context Length

262K

Modality

Multimodal

Architecture

Dense

License

Apache 2.0

Release Date

24 Feb 2026

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

Context size: 1,024 tokens (10.04 GB VRAM)

Consumer: 1x RTX 4090 (24GB VRAM)

Datacenter: 1x NVIDIA A100 (80GB VRAM)

Apple Silicon: 1x Apple M3 Max (128GB VRAM)

Context size: 262,144 tokens (45.98 GB VRAM)

Consumer: 3x RTX 4090 (24GB VRAM)

Datacenter: 1x NVIDIA A100 (80GB VRAM)

Apple Silicon: 1x Apple M3 Max (128GB VRAM)
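
For context sizes between the two published figures, a rough estimate can be sketched in Python. This assumes VRAM grows linearly with context length between 1,024 and 262,144 tokens, which the page does not state; the function name `estimate_vram_gb` is illustrative only.

```python
# Two (context tokens, VRAM in GB) points copied from the figures above.
POINTS = [(1_024, 10.04), (262_144, 45.98)]


def estimate_vram_gb(context_tokens: int) -> float:
    """Linearly interpolate VRAM between the two published points.

    Linear growth with context length is an assumption of this sketch,
    not something the source confirms.
    """
    (t0, v0), (t1, v1) = POINTS
    slope = (v1 - v0) / (t1 - t0)  # GB per additional token of context
    return v0 + slope * (context_tokens - t0)


if __name__ == "__main__":
    for tokens in (1_024, 32_768, 131_072, 262_144):
        print(f"{tokens:>7} tokens -> ~{estimate_vram_gb(tokens):.2f} GB")
```

Treat the intermediate values as ballpark numbers; real usage depends on quantization, batch size, and runtime overhead.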

Evaluation Benchmarks

No evaluation benchmarks are available for Qwen3.5-4B.

About Qwen3.5-4B

Qwen3.5-4B is Alibaba Cloud's compact multimodal foundation model with 4B parameters, released in February 2026. It uses a hybrid architecture that combines Gated Delta Networks and Gated Attention in an 8×(3×DeltaNet→FFN→1×Attention→FFN) pattern. Reported scores include MMLU-Pro (79.1%), GPQA Diamond (76.2%), and HMMT (74%/77%), alongside strong vision-language results. The model offers unified vision-language capabilities, a 262K native context window (extensible to 1M), and multi-token prediction training, and it targets reasoning, coding, multimodal understanding, and multilingual tasks across 201 languages.
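
The layer pattern described above can be written out as a short Python sketch. It assumes each of the 8 blocks holds 3 Gated DeltaNet layers followed by 1 Gated Attention layer, each paired with its own FFN; the function name and the layer labels are illustrative and are not taken from the model's code.

```python
def build_layer_pattern(num_blocks: int = 8,
                        linear_per_block: int = 3,
                        full_per_block: int = 1) -> list[str]:
    """Return the token-mixer type of every layer, in order.

    Each entry stands for one (mixer -> FFN) layer; the FFN is implied.
    """
    pattern: list[str] = []
    for _ in range(num_blocks):
        pattern += ["deltanet"] * linear_per_block   # linear-attention layers
        pattern += ["attention"] * full_per_block    # full gated-attention layer
    return pattern


layers = build_layer_pattern()
linear_ratio = layers.count("deltanet") / len(layers)

if __name__ == "__main__":
    print(len(layers), "layers;", f"linear attention ratio = {linear_ratio:.1%}")
```

Under these assumptions the layout has 32 layers, 24 of them linear attention, which agrees with the 75.0% linear attention ratio listed in the technical specifications.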

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000,000

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Yes

Linear Attention Ratio

75.0%
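
To make the RoPE Theta and head dimension listed above more concrete, the following Python sketch computes rotary frequencies. It assumes the standard RoPE formulation applied across the full 256-dimensional head; the model may rotate only part of each head, so this is illustrative rather than a description of the actual implementation.

```python
import math

HEAD_DIM = 256            # attention head dimension from the spec list
ROPE_THETA = 10_000_000.0  # RoPE base from the spec list


def rope_inv_freq(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def wavelength_tokens(inv_freq: float) -> float:
    """Number of positions needed for one full rotation at this frequency."""
    return 2 * math.pi / inv_freq


if __name__ == "__main__":
    freqs = rope_inv_freq()
    print("fastest pair wavelength:", round(wavelength_tokens(freqs[0]), 2), "tokens")
    print("slowest pair wavelength:", round(wavelength_tokens(freqs[-1])), "tokens")
```

With a base this large, the slowest-rotating pair takes far more than 262,144 positions to complete a rotation, which is the usual motivation for a large theta in long-context models.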

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

2,560

Number of Layers

FFN Intermediate Size (Dense)

9,216

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

248,320
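
The dimensions above allow some back-of-the-envelope parameter arithmetic, sketched here in Python. It assumes a conventional SwiGLU feed-forward block with three bias-free projection matrices (gate, up, down) and a single untied embedding table; neither detail is confirmed by this page, and the helper names are illustrative.

```python
HIDDEN = 2_560            # hidden dimension size
FFN_INTERMEDIATE = 9_216  # FFN intermediate size (dense)
VOCAB = 248_320           # vocabulary size


def swiglu_ffn_params(hidden: int, intermediate: int) -> int:
    """Weights in one SwiGLU FFN: gate and up (hidden x intermediate each)
    plus down (intermediate x hidden), assuming no bias terms."""
    return 3 * hidden * intermediate


def embedding_params(vocab: int, hidden: int) -> int:
    """Weights in one token-embedding table."""
    return vocab * hidden


if __name__ == "__main__":
    print("per-layer FFN params:", f"{swiglu_ffn_params(HIDDEN, FFN_INTERMEDIATE):,}")
    print("embedding params:   ", f"{embedding_params(VOCAB, HIDDEN):,}")
```

These figures cover only the FFN and embedding weights; attention, DeltaNet, normalization, and vision-encoder parameters are not included.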

Model Integrity

Total Score

65 / 100

Upstream

20.0 / 30

Model

21.0 / 40

Downstream

24.0 / 30

Qwen3.5-4B Model Integrity Report

Total Score

65 / 100

Audit Note

Qwen3.5-4B exhibits strong transparency in its architectural specifications and licensing, providing clear technical details on its hybrid attention mechanism and permissive open-source terms. However, it suffers from significant opacity regarding its training data composition and compute resources, which remain largely proprietary. While benchmark performance is high, the lack of reproducible evaluation artifacts and known data contamination issues necessitate a skeptical approach to its reported scores.

Upstream

20.0 / 30

Architectural Provenance

8.0 / 10

The model architecture is extensively documented on its official Hugging Face page and GitHub repository. The documentation specifies a hybrid layout of 8 blocks, each containing 3 Gated DeltaNet layers followed by 1 Gated Attention layer, with detailed dimensions for hidden layers (2560), heads, and intermediate FFN (9216). The training methodology (multi-token prediction and early fusion) is described, but no formal peer-reviewed paper for the 3.5 series is linked yet; the documentation instead references the Qwen3 technical report (arXiv:2505.09388) for foundational methods.

Dataset Composition

3.0 / 10

Transparency regarding the training data is low. While the provider mentions a 'trillions of tokens' multimodal corpus including web, code, and books, and specifies support for 201 languages, there is no public breakdown of dataset proportions, specific sources, or detailed filtering/cleaning methodologies. The documentation vaguely refers to 'high-quality data' and 'curated' sets without providing verifiable composition metrics.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available via the Hugging Face 'transformers' library and is well-documented. It uses a Byte Pair Encoding (BPE) approach with a large, padded vocabulary size of 248,320 tokens. The documentation explicitly lists control tokens for chat, vision, and tool use, and the vocabulary's efficiency across 201 languages is verifiable through the provided configuration files.

Model

21.0 / 40

Parameter Density

7.0 / 10

The model card clearly states the total parameter count as 4.0 billion. As a dense variant within the Qwen 3.5 family, the model avoids the ambiguity of active vs. total parameters found in its MoE counterparts. However, the primary model card does not break down parameter allocation between the vision encoder and the language backbone, though some layer-wise dimensions are provided.

Training Compute

1.0 / 10

There is virtually no public information regarding the compute resources used to train the 4B variant. No GPU/TPU hours, hardware cluster specifications, or carbon footprint data are disclosed. The documentation only mentions a 'Next-Generation Training Infrastructure' in marketing terms without providing verifiable technical metrics.

Benchmark Reproducibility

4.0 / 10

While the model card lists scores across standard benchmarks (MMLU-Pro: 79.1%, GPQA Diamond: 76.2%), it does not provide public evaluation code or the exact prompts and few-shot examples used to obtain them. The use of 'Thinking mode' for certain benchmarks is mentioned but not documented well enough for independent reproduction. Automatic penalties were applied because of documented concerns about benchmark contamination in the Qwen series (e.g., RandomCalculation and MATH-500 studies).

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying its version (Qwen 3.5) and its multimodal capabilities in official documentation and API responses. It clearly distinguishes itself from previous generations (Qwen 3) and other family variants (MoE vs. Dense).

Downstream

24.0 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. The terms are clearly stated on Hugging Face and GitHub, explicitly allowing for commercial use, modification, and distribution without conflicting proprietary restrictions.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented for various deployment scenarios. Official and third-party documentation provide VRAM estimates for FP16 (~10.6GB) and quantized versions (e.g., 4-bit requiring ~2-4GB). The documentation also gives guidance on how memory scales with context length, noting native support for 262K tokens and the impact of RoPE scaling.
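
As a rough sanity check on these estimates, the weight-only memory for a given precision can be computed directly. This Python sketch assumes exactly 4.0 billion parameters and counts weights only; activations, KV cache, and runtime overhead are excluded, which is why the result is lower than the FP16 figure quoted above. The function name is illustrative.

```python
NUM_PARAMS = 4_000_000_000  # 4.0 billion, as stated in the model card


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("FP16", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label:>5}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB for weights")
```

The gap between these weight-only numbers and the quoted totals gives a feel for how much memory goes to everything other than the weights.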

Versioning Drift

6.0 / 10

The model follows a clear semantic versioning path (Qwen3.5-4B) and maintains a basic changelog on GitHub. However, the documentation of 'silent' updates or behavioral drift is limited, and while previous versions are accessible on Hugging Face, the detailed delta between minor iterations is not always transparently documented.
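
The category and total scores in this report appear to be plain sums of the sub-scores. The Python sketch below checks that arithmetic using the values listed in the report; unweighted summation is an assumption inferred from the numbers, not a documented scoring rule.

```python
# Sub-scores copied from the integrity report (each out of 10).
SCORES = {
    "Upstream": {
        "Architectural Provenance": 8.0,
        "Dataset Composition": 3.0,
        "Tokenizer Integrity": 9.0,
    },
    "Model": {
        "Parameter Density": 7.0,
        "Training Compute": 1.0,
        "Benchmark Reproducibility": 4.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 10.0,
        "Hardware Footprint": 8.0,
        "Versioning Drift": 6.0,
    },
}

# Assumed aggregation: each category is the sum of its sub-scores,
# and the total is the sum of the categories.
category_totals = {name: sum(items.values()) for name, items in SCORES.items()}
total_score = sum(category_totals.values())

if __name__ == "__main__":
    for name, value in category_totals.items():
        print(f"{name}: {value} / {10 * len(SCORES[name])}")
    print(f"Total: {total_score} / 100")
```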

Resources

Official Documentation

Download Weights

About Qwen 3.5

Qwen 3.5 is Alibaba Cloud's latest-generation foundation model family, released in February 2026. It integrates advances in multimodal learning (a unified vision-language foundation), efficient hybrid architecture (Gated Delta Networks with sparse Mixture-of-Experts), scalable reinforcement learning across million-agent environments, and linguistic coverage spanning 201 languages. The family is available under the Apache 2.0 license with open weights.
