
Qwen3.6 35B A3B

Total Parameters

35B (3B active)

Context Length

262,144 tokens

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

15 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

2

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

2,048

Number of Layers

40

FFN Intermediate Size (Dense)

512

Multi-Token Prediction Heads

1

Tokenizer

Vocabulary Size

248,320

Mixture of Experts

Active Parameters (per token)

3.0B

Number of Experts

256

Active Experts

9 (8 routed + 1 shared)

Shared Experts

1

FFN Intermediate Size (per Expert)

512

Dense Layers Before MoE

-
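The attention figures above (16 query heads, 2 KV heads, head dimension 256, 40 layers) are enough for a rough estimate of how much key-value (KV) cache a long context needs. The sketch below is back-of-envelope arithmetic, not a measurement. It assumes a BF16 cache (2 bytes per element) and that only the full-attention layers keep a KV cache, while the Gated DeltaNet layers hold a fixed-size state. The 3:1 DeltaNet-to-attention layer layout is an assumption carried over from the Qwen3.5 family's published layout, not something this page states, and `layer_layout` / `kv_cache_bytes` are hypothetical helper names.

```python
# Back-of-envelope KV-cache estimate for the attention spec above.
# ASSUMPTIONS (not stated on this page): BF16 cache (2 bytes/element), and only
# full-attention layers cache K/V; Gated DeltaNet layers keep a fixed-size state.

NUM_LAYERS = 40
KV_HEADS = 2
HEAD_DIM = 256
BYTES_PER_ELEM = 2  # BF16


def layer_layout(num_layers=NUM_LAYERS, period=4):
    """Hypothetical hybrid layout: every `period`-th layer is full attention,
    the rest are Gated DeltaNet (a 3:1 ratio when period=4)."""
    return ["attention" if (i + 1) % period == 0 else "deltanet"
            for i in range(num_layers)]


def kv_cache_bytes(tokens, attn_layers, kv_heads=KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_elem=BYTES_PER_ELEM):
    """Bytes needed to cache K and V (the leading factor 2) for `tokens` positions."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem * tokens


if __name__ == "__main__":
    attn_layers = layer_layout().count("attention")  # 10 under the assumed layout
    ctx = 262_144
    hybrid = kv_cache_bytes(ctx, attn_layers)
    all_attention = kv_cache_bytes(ctx, NUM_LAYERS)
    print(f"attention layers: {attn_layers}")
    print(f"hybrid KV cache at {ctx:,} tokens: {hybrid / 2**30:.1f} GiB")
    print(f"if all {NUM_LAYERS} layers used attention: {all_attention / 2**30:.1f} GiB")
```

Under these assumptions the cache at the full native context is about 5 GiB, versus about 20 GiB if every layer were a full-attention layer. Grouped-query attention contributes separately: with 16 query heads sharing 2 KV heads, each cached layer is 8× smaller than it would be with one KV head per query head.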

Architecture Diagram

Input Tokens → Token Embedding (Position: RoPE · Hidden: 2k · Context: 262.1k · Vocab: 248.3k)
× 40 layers:
  RMSNorm (pre-attention) → Grouped-Query Attention (16 Q / 2 KV heads, head dim 256) → residual add (+)
  RMSNorm (pre-FFN) → Sparse MoE FFN (9/256 experts, SwiGLU, intermediate 512) → residual add (+)
Final RMSNorm → Output Logits
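The large RoPE theta (10,000,000) in the spec is what lets rotary position encoding stay unambiguous over a 262K-token window. The sketch below compares the longest rotation wavelength for the classic theta of 10,000 and for this model's theta. It assumes the standard RoPE schedule theta^(-2i/d) applied across the full 256-dimensional head; the real rotary dimension may be smaller, which would change the numbers. It illustrates the idea only and does not describe Qwen's exact implementation.

```python
import math

ROPE_THETA = 10_000_000
ROTARY_DIM = 256  # assumption: rotary applied over the full 256-dim head


def inv_frequencies(theta=ROPE_THETA, dim=ROTARY_DIM):
    """Standard RoPE: one angular frequency per pair of dimensions, theta^(-2i/dim)."""
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]


def wavelengths(theta=ROPE_THETA, dim=ROTARY_DIM):
    """Number of tokens each dimension pair needs to complete one full rotation."""
    return [2 * math.pi / f for f in inv_frequencies(theta, dim)]


if __name__ == "__main__":
    for theta in (10_000, ROPE_THETA):
        longest = max(wavelengths(theta))
        print(f"theta={theta:>10,}: longest wavelength ~{longest:,.0f} tokens")
```

With theta = 10,000 the slowest dimension pair completes a full rotation after roughly 58K tokens, which is well inside a 262K window. With theta = 10,000,000 the longest wavelength grows to tens of millions of tokens, so distant positions do not alias onto each other.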

Qwen3.6 35B A3B

Qwen3.6-35B-A3B is Alibaba's open-source hybrid MoE model with 35B total parameters, of which only 3B are active per token. It is built on a novel architecture that combines Gated DeltaNet linear attention with standard Gated Attention and a sparse MoE (256 experts; 8 routed + 1 shared active per token). Its agentic coding performance rivals much larger dense models: it scores 73.4% on SWE-bench Verified, 51.5% on Terminal-Bench 2.0, and 92.6% on AIME 2026. The model is natively multimodal (text, image, video), supports a 262K context natively (up to 1M with YaRN), preserves thinking content across turns for agentic tasks, and was trained with Multi-Token Prediction. It is available through the Alibaba Cloud Model Studio API as qwen3.6-flash and was released on April 15, 2026 under Apache 2.0.
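The routing described above picks 8 of 256 routed experts for each token and adds one always-on shared expert. It can be sketched in plain Python. This is a simplified illustration, not Qwen's implementation. The softmax-then-top-k order, the renormalization of the selected weights, and the absence of any gate on the shared expert are all assumptions, and `route` / `moe_forward` are hypothetical names.

```python
import math

NUM_EXPERTS = 256  # routed experts (from the spec table)
TOP_K = 8          # routed experts chosen per token; the shared expert is extra


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts by softmax probability and renormalize
    their weights to sum to 1 (renormalization is an assumption here)."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, shared_expert, top_k=TOP_K):
    """Shared-expert output plus the weighted sum of the routed experts' outputs.
    `x` is a list of floats; each expert is a callable mapping list -> list."""
    out = list(shared_expert(x))
    for idx, weight in route(router_logits, top_k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    logits = [math.sin(i) for i in range(NUM_EXPERTS)]  # stand-in router scores
    chosen = route(logits)
    print("routed experts:", [i for i, _ in chosen])
    print("active experts per token:", len(chosen) + 1)  # 8 routed + 1 shared
```

Only the 9 selected expert FFNs run for a given token. This is why per-token compute tracks the ~3B active parameters even though all 35B must be stored.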

About Qwen 3.6

Qwen 3.6 is Alibaba's latest generation of hybrid sparse Mixture-of-Experts (MoE) models featuring a novel architecture that combines Gated DeltaNet linear attention layers with standard Gated Attention layers and MoE feed-forward networks. The family delivers substantial improvements in agentic coding, multimodal perception, and reasoning, with native support for thinking and non-thinking modes, thinking preservation across turns, and a 262K native context window.
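A Gated DeltaNet layer is "linear attention" because it replaces the growing KV cache with a fixed-size state matrix. That state is updated by the gated delta rule, S_t = S_{t-1}(α_t(I − β_t k_t k_tᵀ)) + β_t v_t k_tᵀ, with output o_t = S_t q_t, following the formulation in the Gated DeltaNet paper. The sketch below implements one recurrent step for a single head in plain Python. Qwen's production layers are assumed to include further pieces that are omitted here, such as q/k normalization, output gating, and chunked parallel kernels. `gated_delta_step` is a hypothetical name.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of the gated delta rule for a single head.

    S: state matrix of shape d_v x d_k (list of lists)
    q, k: vectors of length d_k; v: vector of length d_v
    alpha: decay gate in (0, 1]; beta: write strength in [0, 1]
    Computes S_new = alpha * S (I - beta k k^T) + beta v k^T and o = S_new q.
    """
    d_v, d_k = len(S), len(k)
    # S k: the value the state currently associates with key k
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S (I - beta k k^T) = S - beta (S k) k^T; decay by alpha, then write beta v k^T
    S_new = [[alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
              for j in range(d_k)] for i in range(d_v)]
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]  # d_v = d_k = 2; the state never grows with sequence length
    key = [1.0, 0.0]
    S, o = gated_delta_step(S, key, key, [1.0, 2.0], alpha=1.0, beta=1.0)
    print(o)  # [1.0, 2.0]
    S, o = gated_delta_step(S, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
    print(o)  # [3.0, 4.0]: the delta rule overwrites the value stored under `key`
```

The second step shows the property that separates the delta rule from plain additive linear attention. Writing a new value under the same key replaces the old association instead of summing with it. Meanwhile, memory stays constant regardless of how many tokens have been processed.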



Evaluation Benchmarks

Rank

#43

Benchmark | Score | Rank
- | 0.76 | 23

Rankings

Overall Rank

#43

Coding Rank

-

Model Integrity

Total Score

B+

70 / 100

Qwen3.6 35B A3B Model Integrity Report


Audit Note

Qwen3.6-35B-A3B demonstrates strong transparency in its architectural design and licensing, providing clear distinctions between total and active parameters. While it offers detailed hardware guidance for local deployment, it remains significantly opaque regarding its specific training data sources and the total compute resources consumed during development. The model's identity and versioning are well-maintained, though benchmark reproducibility is limited by the lack of a fully public evaluation suite.

Upstream

20.0 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in official release blogs and model cards. It uses a hybrid design that combines Gated DeltaNet (linear attention) with standard Gated Attention and a sparse Mixture-of-Experts (MoE) framework. Specific details such as the number of layers (40), the hidden dimension (2,048), and the expert routing scheme (256 experts; 8 routed + 1 shared active per token) are publicly available. The pre-training methodology is described only at a high level, as a multi-stage process (General, Reasoning, Long-context), but the architectural modifications relative to a standard Transformer are well defined.

Dataset Composition

3.0 / 10

Information regarding the training data is limited to high-level descriptions. The provider mentions a 36-trillion-token corpus for the Qwen3 series, including web data, books, and synthetic code/math, but gives no percentage breakdown or specific source list for the 3.6-35B-A3B variant. Filtering and cleaning processes are mentioned only in general terms, and the lack of granular composition data or public access to the training set limits transparency.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly accessible via the Hugging Face repository and is a byte-level BPE (BBPE) tokenizer in the Qwen lineage. Earlier Qwen3 tokenizers had a stated vocabulary of 151,669 tokens; this model's configuration lists a vocabulary size of 248,320. It supports 201 languages, and its implementation is verifiable through standard libraries such as Transformers and vLLM. The alignment between the tokenizer's training data and its claimed language support is well documented.

Model

25.5 / 40

Parameter Density

9.5 / 10

Transparency regarding parameter density is exemplary. The provider explicitly distinguishes between the 35.0B total parameters and the 3.0B active parameters per token. Detailed architectural breakdowns, including the number of experts (256) and the specific routing logic (8+1), are provided, preventing the common MoE pitfall of misleading parameter claims.
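A rough parameter count shows how the expert spec adds up to the 35B-total / 3B-active split. This is back-of-envelope arithmetic under stated assumptions. Each expert is assumed to be a SwiGLU FFN with three bias-free 2,048 × 512 projections (gate, up, down). The shared expert is assumed to be the same size as a routed expert. Attention, DeltaNet, router, and output-head weights are not counted. The constant and helper names are hypothetical.

```python
HIDDEN = 2_048
EXPERT_FFN = 512
NUM_LAYERS = 40
ROUTED_EXPERTS = 256
ROUTED_ACTIVE = 8
SHARED_EXPERTS = 1   # assumption: same size as a routed expert
VOCAB = 248_320


def swiglu_params(hidden, intermediate):
    """Gate, up, and down projections of a bias-free SwiGLU FFN."""
    return 3 * hidden * intermediate


per_expert = swiglu_params(HIDDEN, EXPERT_FFN)  # one expert in one layer
routed_total = per_expert * ROUTED_EXPERTS * NUM_LAYERS
active_experts = per_expert * (ROUTED_ACTIVE + SHARED_EXPERTS) * NUM_LAYERS
embedding = VOCAB * HIDDEN                      # input embedding table only

if __name__ == "__main__":
    print(f"per expert:           {per_expert:,}")
    print(f"all routed experts:   {routed_total / 1e9:.1f}B")
    print(f"experts used/token:   {active_experts / 1e9:.2f}B")
    print(f"embedding table:      {embedding / 1e9:.2f}B")
```

Under these assumptions the routed experts alone account for roughly 32B of the 35B total, while the nine experts used per token contribute only about 1.1B. The remainder of the ~3B active parameters would come from the attention and DeltaNet layers, the embeddings, and the output head, which this sketch does not count.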

Training Compute

2.0 / 10

There is almost no verifiable information regarding the specific compute resources used to train this model. While the hardware type (GPUs/TPUs) can be inferred from the provider's scale, the actual GPU hours, carbon footprint, and total training cost are not disclosed. The documentation relies on vague statements about 'significant resources' and 'powerful infrastructure'.

Benchmark Reproducibility

5.0 / 10

While the model provides scores for standard benchmarks (SWE-bench, AIME, MMLU-Pro), the full evaluation code and exact prompts used for all reported results are not consistently public. Third-party verification is available through community leaderboards, but the lack of a comprehensive, reproducible evaluation suite directly from the provider prevents a higher score.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as a Qwen model and providing version-specific information (3.6-35B-A3B). It is transparent about its capabilities as a multimodal MoE model and its limitations regarding context window and 'thinking' modes. No significant instances of identity confusion or misrepresentation were found in official documentation.

Downstream

24.5 / 30

License Clarity

10.0 / 10

The model is released under the Apache 2.0 license, which is a clear, permissive, and industry-standard open-source license. Commercial use, modification, and distribution are explicitly permitted without conflicting terms or hidden restrictions in the model card or repository.

Hardware Footprint

8.5 / 10

Hardware requirements are well-documented for various quantization levels (FP16, Q8, Q4). Specific VRAM estimates (e.g., ~20GB for Q4_K_M) and recommended hardware (RTX 3090/4090) are provided. The impact of the hybrid architecture on KV-cache efficiency and context scaling is also detailed, offering clear guidance for local deployment.
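The weight-memory portion of those VRAM estimates follows from parameter count × bits per weight. The sketch below uses approximate average bit widths for each format: about 8.5 bits for Q8_0 and about 4.85 for Q4_K_M, both of which store block scales alongside the quantized values. These bit widths are assumptions, and real files also keep some tensors unquantized. KV cache, activations, and runtime overhead come on top, so treat the output as a lower bound. `weight_gb` is a hypothetical helper.

```python
TOTAL_PARAMS = 35e9  # every expert must be resident in memory, not just the ~3B active

# Approximate average bits per weight (assumed; exact values depend on the file).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}


def weight_gb(params, bits_per_weight):
    """Decimal gigabytes needed to hold the weights alone."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        print(f"{fmt:7s} ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the Q4_K_M weights come to roughly 21 GB, consistent with the ~20 GB figure above and within the 24 GB of an RTX 3090/4090 before the KV cache is added. An MoE model's memory footprint is set by its total parameter count, even though its per-token compute scales with the active parameters.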

Versioning Drift

6.0 / 10

The model follows a clear versioning scheme (3.5 to 3.6) with documented changelogs in blog posts. However, the frequency of silent updates to the hosted API version (qwen3.6-flash) and the lack of a formal deprecation path for older weights in the open-source repository suggest moderate transparency in tracking long-term drift.
