Total Parameters
35B
Context Length
262,144 tokens
Modality
Multimodal
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
15 Apr 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
16
Key-Value Heads
2
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
10,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
2,048
Number of Layers
40
FFN Intermediate Size (Dense)
512
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
248,320
Mixture of Experts
Active Parameters (per token)
3.0B
Number of Experts
256
Active Experts
9 (8 routed + 1 shared)
Shared Experts
1
FFN Intermediate Size (per Expert)
512
Dense Layers Before MoE
-
Qwen3.6-35B-A3B is Alibaba's open-source hybrid MoE model with 35B total parameters, of which only 3B are active per token. Its architecture combines Gated DeltaNet linear attention with standard Gated Attention and a sparse MoE (256 experts; 8 routed + 1 shared active per token), and it delivers agentic coding performance that rivals much larger dense models. Reported scores are 73.4% on SWE-bench Verified, 51.5% on Terminal-Bench 2.0, and 92.6% on AIME 2026. The model is natively multimodal (text, image, video), supports a 262K-token context natively (up to 1M with YaRN), includes thinking preservation for agentic tasks, and is trained with Multi-Token Prediction. It is available through the Alibaba Cloud Model Studio API as qwen3.6-flash and was released on April 15, 2026 under Apache 2.0.
Qwen 3.6 is Alibaba's latest generation of hybrid sparse Mixture-of-Experts (MoE) models featuring a novel architecture that combines Gated DeltaNet linear attention layers with standard Gated Attention layers and MoE feed-forward networks. The family delivers substantial improvements in agentic coding, multimodal perception, and reasoning, with native support for thinking and non-thinking modes, thinking preservation across turns, and a 262K native context window.
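The relationship between the 262K native window and the 1M YaRN figure can be made concrete with a little arithmetic. The sketch below is illustrative only: the dictionary keys follow the `rope_scaling` convention used by Hugging Face Transformers for YaRN, but whether this model's configuration uses exactly these keys and values is an assumption, as is the choice to round the factor up to a whole number.

```python
import math

NATIVE_CONTEXT = 262_144  # native context length from the spec sheet


def yarn_factor(target_context: int, native_context: int = NATIVE_CONTEXT) -> float:
    """Return the ratio by which the native window must be stretched."""
    if target_context <= native_context:
        return 1.0  # no scaling needed inside the native window
    return target_context / native_context


def rope_scaling_config(target_context: int) -> dict:
    """Build a hypothetical rope_scaling dict (key names are assumptions)."""
    factor = math.ceil(yarn_factor(target_context))  # assumed: round up to a whole factor
    return {
        "rope_type": "yarn",
        "factor": float(factor),
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }


print(yarn_factor(1_000_000))
print(rope_scaling_config(1_000_000))
```

Under these assumptions, a target of roughly one million tokens corresponds to a scaling factor of about 4 over the native window.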
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Reasoning | LiveBench Reasoning | 0.76 | 23 |
Overall Rank
#43
Coding Rank
-
Total Score
70 / 100
Qwen3.6-35B-A3B demonstrates strong transparency in its architectural design and licensing, providing clear distinctions between total and active parameters. While it offers detailed hardware guidance for local deployment, it remains significantly opaque regarding its specific training data sources and the total compute resources consumed during development. The model's identity and versioning are well-maintained, though benchmark reproducibility is limited by the lack of a fully public evaluation suite.
Architectural Provenance
The model's architecture is extensively documented in official release blogs and model cards. It uses a hybrid design that combines Gated DeltaNet (linear attention) with standard Gated Attention and a sparse Mixture-of-Experts (MoE) framework. Specific details such as the number of layers (40), hidden dimension (2048), and the expert routing mechanism (256 experts, 8 routed + 1 shared) are publicly available. The pre-training methodology is described as a multi-stage process (General, Reasoning, Long-context), and the architectural modifications relative to the base Transformer are well-defined.
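The "8 routed + 1 shared" routing described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the model's actual implementation: the gating rule (softmax over router logits, then renormalising the top-8 weights) and the ungated shared expert are assumptions, and the scalar "experts" are hypothetical stand-ins for real feed-forward networks.

```python
import math
import random

NUM_EXPERTS = 256  # routed expert pool, from the spec sheet
TOP_K = 8          # routed experts selected per token
# One shared expert is assumed to run for every token (the "+1").


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts and renormalise their gate weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_output(x, router_logits, routed_experts, shared_expert):
    """Add the gated routed-expert outputs to the always-on shared expert."""
    out = shared_expert(x)
    for idx, gate in route(router_logits):
        out += gate * routed_experts[idx](x)
    return out


# Toy usage: expert i simply multiplies its scalar input by i.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, i=i: i * x for i in range(NUM_EXPERTS)]
selected = route(logits)
y = moe_output(1.0, logits, experts, shared_expert=lambda x: x)
print(len(selected), sum(gate for _, gate in selected))
```

Only the 8 selected experts and the shared expert are evaluated for a given token, which is why the active parameter count is far below the total.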
Dataset Composition
Information regarding the training data is limited to high-level descriptions. The provider mentions a 36-trillion token corpus for the Qwen3 series, including web data, books, and synthetic code/math, but lacks a detailed percentage breakdown or specific source list for the 3.6-35B-A3B variant. While filtering and cleaning processes are mentioned generally, the lack of granular composition data or public access to the training set limits transparency.
Tokenizer Integrity
The tokenizer is publicly accessible via the Hugging Face repository and is based on the Qwen tokenizer (BBPE) with a stated vocabulary size of 151,669 (padded to 248,320 in newer versions). It supports 201 languages, and its implementation is verifiable through standard libraries like Transformers and vLLM. The alignment between the tokenizer's training data and its claimed language support is well-documented.
Parameter Density
Transparency regarding parameter density is exemplary. The provider explicitly distinguishes between the 35.0B total parameters and the 3.0B active parameters per token. Detailed architectural breakdowns, including the number of experts (256) and the specific routing logic (8+1), are provided, preventing the common MoE pitfall of misleading parameter claims.
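As a quick check on the total-versus-active distinction, the quoted figures can be turned into ratios. The sketch below uses only numbers stated in this document; the variable names are illustrative.

```python
TOTAL_PARAMS = 35.0e9   # total parameters, as quoted
ACTIVE_PARAMS = 3.0e9   # active parameters per token, as quoted
NUM_EXPERTS = 256       # routed expert pool
ROUTED_ACTIVE = 8       # routed experts used per token
SHARED_ACTIVE = 1       # shared expert, always active

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
routed_fraction = ROUTED_ACTIVE / NUM_EXPERTS
experts_per_token = ROUTED_ACTIVE + SHARED_ACTIVE

print(f"active parameter share per token: {active_fraction:.1%}")
print(f"share of routed experts used per token: {routed_fraction:.1%}")
print(f"experts evaluated per token: {experts_per_token}")
```

Roughly 9% of the weights participate in any single forward step, so per-token compute scales with the 3B figure while memory for the weights still scales with the full 35B.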
Training Compute
There is almost no verifiable information regarding the specific compute resources used to train this model. While the hardware type (GPUs/TPUs) can be inferred from the provider's scale, the actual GPU hours, carbon footprint, and total training cost are not disclosed. The documentation relies on vague statements about 'significant resources' and 'powerful infrastructure'.
Benchmark Reproducibility
While the provider reports scores for standard benchmarks (SWE-bench, AIME, MMLU-Pro), the full evaluation code and exact prompts behind the reported results are not consistently public. Third-party verification is available through community leaderboards, but the lack of a comprehensive, reproducible evaluation suite directly from the provider prevents a higher score.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as a Qwen model and providing version-specific information (3.6-35B-A3B). It is transparent about its capabilities as a multimodal MoE model and its limitations regarding context window and 'thinking' modes. No significant instances of identity confusion or misrepresentation were found in official documentation.
License Clarity
The model is released under the Apache 2.0 license, which is a clear, permissive, and industry-standard open-source license. Commercial use, modification, and distribution are explicitly permitted without conflicting terms or hidden restrictions in the model card or repository.
Hardware Footprint
Hardware requirements are well-documented for various quantization levels (FP16, Q8, Q4). Specific VRAM estimates (e.g., ~20GB for Q4_K_M) and recommended hardware (RTX 3090/4090) are provided. The impact of the hybrid architecture on KV-cache efficiency and context scaling is also detailed, offering clear guidance for local deployment.
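The VRAM figures above can be approximated from the spec sheet. The following is a rough sketch under stated assumptions: the bits-per-weight values for Q8 and Q4_K_M are illustrative averages rather than published numbers, and `attn_layers` (how many of the 40 layers keep a per-token KV cache) is a hypothetical parameter, since the split between Gated DeltaNet and standard attention layers is not given here.

```python
GB = 1e9  # decimal gigabytes, matching the "~20GB" style of estimate

TOTAL_PARAMS = 35.0e9  # total parameters, as quoted
N_KV_HEADS = 2         # key-value heads, from the spec sheet
HEAD_DIM = 256         # attention head dimension, from the spec sheet
N_LAYERS = 40          # total layers, from the spec sheet

# Approximate average bits per weight. FP16 is exact; the Q8 and Q4_K_M
# values are assumptions for illustration.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4_K_M": 4.5}


def weight_gb(bits_per_weight, n_params=TOTAL_PARAMS):
    """Memory for the weights alone, ignoring activations and runtime overhead."""
    return n_params * bits_per_weight / 8 / GB


def kv_cache_bytes(context_tokens, attn_layers, bytes_per_value=2):
    """KV-cache size for the layers that keep a per-token key/value cache."""
    # Factor 2 covers keys and values; bytes_per_value=2 assumes FP16 cache entries.
    per_token = 2 * N_KV_HEADS * HEAD_DIM * attn_layers * bytes_per_value
    return context_tokens * per_token


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:7s} weights: {weight_gb(bits):6.1f} GB")

# Upper bound (all 40 layers cache K/V) versus a hypothetical 10 caching layers.
for layers in (N_LAYERS, 10):
    size = kv_cache_bytes(262_144, layers) / GB
    print(f"KV cache at 262,144 tokens, {layers} caching layers: {size:.1f} GB")
```

With the assumed 4.5 bits per weight, the weights come to just under 20 GB, consistent with the ~20GB Q4_K_M estimate. Linear-attention layers hold a fixed-size state instead of a cache that grows with context, so the fewer layers that use standard attention, the smaller the long-context overhead.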
Versioning Drift
The model follows a clear versioning scheme (3.5 to 3.6) with documented changelogs in blog posts. However, the frequency of silent updates to the hosted API version (qwen3.6-flash) and the lack of a formal deprecation path for older weights in the open-source repository suggest moderate transparency in tracking long-term drift.