Parameters
800M
Context Length
262,144 tokens
Modality
Multimodal
Architecture
Dense
License
Apache 2.0
Release Date
24 Feb 2026
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
1024
Number of Layers
24
Attention Heads
8
Key-Value Heads
2
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE (Rotary Position Embedding)
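The attention figures in the table (hidden size 1024, 8 attention heads, 2 key-value heads) imply a head dimension of 128 and a GQA group size of 4, i.e. every four query heads read from one shared KV head. A minimal sketch of that head-to-group mapping, derived from the table values rather than from released code:

```python
# Grouped-Query Attention layout implied by the spec table:
# hidden 1024, 8 query heads, 2 key-value heads.
HIDDEN = 1024
N_Q_HEADS = 8
N_KV_HEADS = 2

head_dim = HIDDEN // N_Q_HEADS          # 128 dimensions per head
group_size = N_Q_HEADS // N_KV_HEADS    # 4 query heads share each KV head

def kv_head_for(q_head: int) -> int:
    """Index of the KV head that query head `q_head` attends with."""
    return q_head // group_size

# Query heads 0-3 share KV head 0; heads 4-7 share KV head 1,
# shrinking the KV cache by a factor of group_size (4x) versus MHA.
mapping = {q: kv_head_for(q) for q in range(N_Q_HEADS)}
```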
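The listed activation, SwiGLU, gates an "up" projection with a SiLU (Swish) branch: FFN(x) = W_down(SiLU(W_gate·x) ⊙ W_up·x). A scalar sketch of just the gating nonlinearity, with the projections omitted for brevity:

```python
import math

def silu(x: float) -> float:
    """SiLU / Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu(gate: float, up: float) -> float:
    """SwiGLU gating: the SiLU of the gate branch scales the up branch."""
    return silu(gate) * up

# Strongly negative gate pre-activations suppress the up branch almost
# entirely; strongly positive ones pass it through nearly unchanged.
suppressed = swiglu(-10.0, 3.0)   # close to 0
passed = swiglu(10.0, 3.0)        # close to 3.0 * 10 scaled by silu(10) ~ 30
```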
Qwen3.5-0.8B is Alibaba Cloud's ultra-compact multimodal foundation model with 0.8B parameters, released February 2026. It uses a hybrid architecture combining Gated Delta Networks and Gated Attention in a 6×(3×DeltaNet→FFN→1×Attention→FFN) pattern. In thinking mode it scores 66.5% on MMLU-Pro, 51.6% on GPQA Diamond, and 11.9% on GPQA. The model offers unified vision-language capabilities, a 262K native context window, multi-token prediction training, and both thinking and non-thinking modes, and is designed for prototyping, fine-tuning, and research across 201 languages.
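The 6×(3×DeltaNet→FFN→1×Attention→FFN) pattern described above works out to exactly the 24 layers listed in the spec table: six blocks, each stacking three Gated DeltaNet mixers and one Gated Attention mixer, every mixer followed by an FFN. A sketch of that layer schedule (layer names are illustrative, not taken from released code):

```python
# Hybrid layer schedule implied by the 6x(3xDeltaNet->FFN->1xAttention->FFN)
# pattern: 6 repeating blocks, each with 3 DeltaNet mixers and 1 full-attention
# mixer, and an FFN after every mixer.
N_BLOCKS = 6
BLOCK_PATTERN = ["deltanet", "deltanet", "deltanet", "attention"]

layers = []
for _ in range(N_BLOCKS):
    for mixer in BLOCK_PATTERN:
        layers.append((mixer, "ffn"))  # each token mixer is paired with an FFN

n_deltanet = sum(1 for mixer, _ in layers if mixer == "deltanet")
n_attention = sum(1 for mixer, _ in layers if mixer == "attention")
# 24 layers total: 18 linear-attention (DeltaNet) and 6 full-attention,
# matching the "Number of Layers" entry in the spec table.
```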
Qwen 3.5 is Alibaba Cloud's latest-generation foundation model family, released February 2026. It represents a significant leap forward, integrating breakthroughs in multimodal learning (unified vision-language foundation), efficient hybrid architecture (Gated Delta Networks with sparse Mixture-of-Experts), scalable reinforcement learning across million-agent environments, and global linguistic coverage spanning 201 languages. Available under Apache 2.0 license with open weights.