Parameters: 27B
Context Length: 262K
Modality: Multimodal
Architecture: Dense
License: Apache 2.0
Release Date: 24 Feb 2026
Knowledge Cutoff: -
VRAM requirements for different quantization methods and context sizes
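
| Context Size | Platform | Hardware | Device VRAM |
|---|---|---|---|
| 1,024 tokens | Consumer | 3x RTX 4090 | 24GB |
| 1,024 tokens | Datacenter | 1x NVIDIA A100 | 80GB |
| 1,024 tokens | Apple Silicon | 1x Apple M3 Max | 128GB |
| 262,144 tokens | Consumer | 7x RTX 4090 | 24GB |
| 262,144 tokens | Datacenter | 2x NVIDIA A100 | 80GB |
| 262,144 tokens | Apple Silicon | 2x Apple M3 Max | 128GB |
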
Rank: #53

| Category | Benchmark | Score | Rank |
|---|---|---|---|
| General Text | Text Arena | 1409 | 50 |
| Web Development | WebDev Arena | 1357 | 56 |

Overall Rank: #53
Coding Rank: #65
Qwen3.5-27B is Alibaba Cloud's dense multimodal foundation model with 27B parameters, released in February 2026. Unlike the MoE variants in the family, it uses a dense architecture that combines Gated Delta Networks with feed-forward networks. Reported benchmark scores are 86.1% on MMLU-Pro, 85.5% on GPQA Diamond, 72.4% on SWE-bench Verified, and 41.6% on Terminal-Bench 2.0. The model offers unified vision-language capabilities and a native 262K-token context window (extensible to 1M), and it covers reasoning, coding, multimodal understanding, and multilingual tasks spanning 201 languages.
Attention
Attention Structure: Grouped-Query Attention
Attention Heads: 24
Key-Value Heads: 4
Attention Head Dimension: 256
Position Embedding: RoPE
RoPE Theta: 10,000,000
Sliding Window Attention: No
Sliding Window Size: -
Normalization: RMS Normalization
Activation Function: SwiGLU
Dimensions
Hidden Dimension Size: 5,120
Number of Layers: 64
FFN Intermediate Size (Dense): 17,408
Multi-Token Prediction Heads: 1
Tokenizer
Vocabulary Size: 248,320
Total Score: 69 / 100
Qwen3.5-27B exhibits strong transparency in its architectural design and licensing, providing deep technical details on its hybrid attention mechanism and a permissive Apache 2.0 license. However, the model suffers from significant opacity regarding its training data composition and the total compute resources utilized during development. While hardware requirements and identity consistency are well-handled, the lack of a reproducible evaluation suite and granular dataset disclosure limits its overall transparency profile.
Architectural Provenance
The model architecture is extensively documented in official Hugging Face model cards and technical blog posts. It utilizes a novel hybrid design consisting of 64 layers organized into 16 groups, where each group contains three Gated DeltaNet (linear attention) layers and one Gated Attention layer. Detailed specifications including hidden dimensions (5120), head dimensions (128 for GDN, 256 for Gated Attention), and intermediate FFN dimensions (17408) are publicly available. While the high-level methodology of 'early-fusion' multimodal training is described, the specific pre-training recipe and architectural modifications for the vision encoder integration are less detailed than the language backbone.
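The grouping described above can be checked with a few lines of Python. This is a minimal sketch, not the model's actual configuration code: the function name `build_layer_layout` and the block labels are hypothetical, and placing the Gated Attention layer last in each group is an assumption, since the text only states the three-to-one ratio per group.

```python
def build_layer_layout(num_layers: int = 64, group_size: int = 4) -> list[str]:
    """Return one block label per layer: three Gated DeltaNet layers, then one Gated Attention layer."""
    if num_layers % group_size != 0:
        raise ValueError("num_layers must be a multiple of group_size")
    layout = []
    for index in range(num_layers):
        # Assumption: the full-attention layer sits at the end of each group of four.
        is_full_attention = (index + 1) % group_size == 0
        layout.append("gated_attention" if is_full_attention else "gated_deltanet")
    return layout


layout = build_layer_layout()
num_groups = len(layout) // 4
print(num_groups, layout.count("gated_deltanet"), layout.count("gated_attention"))
```

With the documented 64 layers this gives 16 groups, 48 Gated DeltaNet layers, and 16 Gated Attention layers.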
Dataset Composition
Information regarding the training data remains largely high-level and lacks granular transparency. Official sources mention 'trillions of multimodal tokens' and a 'multilingual data annotation system' labeling over 30T tokens across 201 languages. However, there is no specific breakdown of dataset proportions (e.g., % web, % code, % books) or a comprehensive list of data sources. While some evaluation datasets like HLE-Verified are open-sourced, the primary pre-training corpus composition and specific filtering/cleaning methodologies are not disclosed in detail, relying on vague descriptors like 'carefully curated'.
Tokenizer Integrity
The tokenizer is publicly accessible via the Hugging Face repository and is well-documented. It uses a Byte-level Byte Pair Encoding (BPE) approach with a vocabulary size of 248,320 (padded), significantly expanded from previous generations to support 201 languages. Vocabulary size and tokenization logic are verified through both official documentation and third-party implementations (e.g., .NET ports). The alignment between claimed language support and tokenizer efficiency is documented, though some internal token normalization details are proprietary.
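To give a sense of what the expanded vocabulary costs in parameters, the following sketch multiplies the documented vocabulary size by the hidden dimension. It counts a single embedding matrix only; whether the output projection shares these weights is not stated in the source, so that is left out as an open assumption. The function name is hypothetical.

```python
VOCAB_SIZE = 248_320      # padded vocabulary size from the spec
HIDDEN_SIZE = 5_120       # hidden dimension from the spec
TOTAL_PARAMETERS = 27.0e9


def embedding_parameters(vocab_size: int = VOCAB_SIZE, hidden_size: int = HIDDEN_SIZE) -> int:
    """Parameter count of one vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


params = embedding_parameters()
share = params / TOTAL_PARAMETERS
print(f"{params:,} embedding parameters, about {share:.1%} of 27.0B")
```

Under these figures the input embedding alone accounts for roughly 1.27B parameters.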
Parameter Density
As a dense model, parameter density is straightforward and clearly stated at 27.0B total parameters. Unlike the MoE variants in the Qwen 3.5 family, all parameters are active during inference. The architectural breakdown is highly detailed, specifying the exact number of layers (64), attention heads (24 Q, 4 KV), and the specific layout of Gated DeltaNet vs. Gated Attention blocks. This level of detail allows for precise calculation of computational requirements and memory overhead.
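As an illustration of the kind of calculation these figures enable, the sketch below estimates the feed-forward parameter count and the KV-cache size at full context. It rests on several assumptions that the source does not confirm: every one of the 64 layers has a bias-free SwiGLU block with three projection matrices, only the 16 Gated Attention layers keep a per-token KV cache (the Gated DeltaNet layers are treated as holding a fixed-size state that is ignored here), and the cache is stored in 16-bit precision.

```python
HIDDEN_SIZE = 5_120
FFN_INTERMEDIATE = 17_408
NUM_LAYERS = 64
FULL_ATTENTION_LAYERS = 16   # one Gated Attention layer per group of four
KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2          # assumption: 16-bit KV cache


def ffn_parameters() -> int:
    """SwiGLU uses gate, up, and down projections (assumed bias-free) in every layer."""
    return 3 * HIDDEN_SIZE * FFN_INTERMEDIATE * NUM_LAYERS


def kv_cache_bytes(tokens: int) -> int:
    """Keys plus values for each full-attention layer, per token."""
    per_token = 2 * FULL_ATTENTION_LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


print(f"FFN parameters: {ffn_parameters() / 1e9:.2f}B")
print(f"KV cache at 262,144 tokens: {kv_cache_bytes(262_144) / 2**30:.1f} GiB")
```

Under these assumptions the feed-forward blocks hold about 17.1B of the 27.0B parameters, and the KV cache grows by 64 KiB per token, reaching 16 GiB at the full 262,144-token context.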
Training Compute
Transparency regarding training compute is extremely low. While the use of 'Next-Generation Training Infrastructure' and 'asynchronous RL frameworks' is mentioned in marketing materials, there is no public disclosure of the total GPU/TPU hours consumed, the specific hardware clusters used to train the 27B variant, or the associated carbon footprint. The company cites efficiency gains but provides no verifiable metrics to back these claims, so the model scores poorly on environmental and resource transparency.
Benchmark Reproducibility
The provider reports scores for standard benchmarks (MMLU-Pro: 86.1%, GPQA Diamond: 85.5%, SWE-bench Verified: 72.4%), which can be cross-checked against third-party leaderboards such as Artificial Analysis and OpenRouter. However, the exact evaluation code, prompts, and few-shot examples used for official reporting are not fully public in a centralized repository. While some third-party audits exist, the lack of a comprehensive, reproducible evaluation suite from the provider prevents a higher score.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as Qwen 3.5 and maintaining version awareness across different deployment frameworks (vLLM, SGLang, Ollama). It is transparent about its multimodal capabilities and its position within the broader Qwen 3.5 ecosystem. There are no documented instances of the model claiming to be a competitor's product or misrepresenting its dense architecture as an MoE variant.
License Clarity
The model is released under the Apache 2.0 license, which is a standard, highly permissive open-source license. The license terms are clearly stated in the Hugging Face repository and official announcements, explicitly allowing for commercial use, modification, and redistribution. There are no conflicting custom terms or restrictive 'acceptable use' policies that override the base license, providing exemplary legal transparency.
Hardware Footprint
Hardware requirements are well-documented by both the provider and the community. VRAM requirements for various quantization levels (Q4, Q8, FP16) are publicly available, with specific guidance for single-GPU deployment (e.g., ~16-18GB for Q4 GGUF). Memory scaling for the 262k context window is documented, and third-party tools like Unsloth provide detailed VRAM calculators. However, official documentation on the specific accuracy-performance tradeoffs of the new 'Gated DeltaNet' layers under heavy quantization is still emerging.
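A rough lower bound on weight memory at each quantization level follows from the parameter count alone. The sketch below is an estimate, not a provider figure: the effective bits per weight (8.5 for Q8 and 4.5 for Q4, which include per-block scale overhead) are assumptions about which GGUF variants are meant, and the result excludes the KV cache, activations, and runtime overhead.

```python
PARAMETERS = 27.0e9

# Assumed effective bits per weight, including per-block scales for the quantized formats.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4": 4.5}


def weight_gigabytes(bits_per_weight: float, parameters: float = PARAMETERS) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


estimates = {name: round(weight_gigabytes(bits), 1) for name, bits in BITS_PER_WEIGHT.items()}
for name, gigabytes in estimates.items():
    print(f"{name}: ~{gigabytes} GB for weights only")
```

The Q4 estimate of roughly 15 GB sits just below the ~16-18GB quoted above, which is plausible once cache and runtime overhead are added on top of the weights.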
Versioning Drift
The model uses a clear naming convention (Qwen3.5-27B), but the changelog and version history are somewhat fragmented across blog posts and GitHub commits. While major releases are announced, minor weight updates or 'silent' optimizations (such as the March 5 GGUF update) are often communicated through third-party partners rather than a centralized, formal versioning system. There is no clear public roadmap or deprecation policy for previous versions.
Qwen 3.5 is Alibaba Cloud's latest-generation foundation model family, released in February 2026. It integrates a unified vision-language foundation for multimodal learning, an efficient hybrid architecture (Gated Delta Networks with sparse Mixture-of-Experts), scalable reinforcement learning across million-agent environments, and linguistic coverage spanning 201 languages. The models are available with open weights under the Apache 2.0 license.