Total Parameters
196.81B
Context Length
256K
Modality
Multimodal
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
11 Feb 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
64
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
RoPE
RoPE Theta
-
Sliding Window Attention
Yes
Sliding Window Size
512
Normalization
RMS Normalization
Activation Function
GELU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
45
FFN Intermediate Size (Dense)
1,280
Multi-Token Prediction Heads
3
Tokenizer
Vocabulary Size
128,896
Mixture of Experts
Active Parameters (per Token)
11.0B
Number of Experts
288
Active Experts
8
Shared Experts
1
FFN Intermediate Size (per Expert)
1,280
Dense Layers Before MoE
3
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts architecture with 196B total parameters but only 11B active per token, it delivers frontier reasoning and agentic capabilities with exceptional efficiency. It features a 256K context window, supports text and image input, and achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0. It is optimized for local deployment on consumer hardware, including the Mac Studio M4 Max and high-end GPUs. 3-way Multi-Token Prediction (MTP-3) enables generation throughput of 100-350 tok/s.
Step 3.5 is StepFun's flagship frontier reasoning model family. Built on a sparse Mixture-of-Experts (MoE) architecture, Step 3.5 models deliver frontier-level intelligence for agentic, reasoning, and coding tasks. The Flash variant activates only 11B of its 196B parameters per token, achieving the reasoning depth of top-tier proprietary models while maintaining exceptional efficiency. It features a 256K context window, native function calling, and Multi-Token Prediction for high-throughput inference. It is released under the Apache 2.0 license.
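To illustrate why Multi-Token Prediction can raise throughput, here is a minimal sketch of the expected number of tokens emitted per forward pass when 3 extra tokens are drafted. The acceptance model (each drafted token is accepted independently with probability accept_prob, and a draft only counts if all earlier drafts were accepted) is an assumption made for illustration, not a description of StepFun's implementation, and the function name is hypothetical.

```python
def expected_tokens_per_step(accept_prob: float, draft_tokens: int = 3) -> float:
    """Expected tokens emitted per forward pass under a simple acceptance model.

    Assumption (not from the source): each drafted token is accepted
    independently with probability `accept_prob`, and drafted token i is only
    usable if drafts 1..i-1 were all accepted.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    # One token is always produced; drafted token i contributes accept_prob**i.
    return sum(accept_prob ** i for i in range(draft_tokens + 1))


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.8, 1.0):
        print(f"accept_prob={p:.1f} -> {expected_tokens_per_step(p):.3f} tokens/step")
```

Under this model the per-step yield ranges from 1 token (no drafts accepted) to 4 tokens (all three drafts accepted), which bounds the speedup MTP-3 alone could contribute.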
Rank
#74
No evaluation benchmarks for Step 3.5 Flash available.
Overall Rank
#74
Coding Rank
-
Total Score
69 / 100
Step 3.5 Flash exhibits high transparency in its architectural design and hardware requirements, providing clear distinctions between total and active parameters. While it excels with a permissive Apache 2.0 license and detailed technical specifications, it remains opaque regarding its specific training data composition and total compute resources. The model is a strong example of 'open weights' transparency, though it falls short of 'open science' standards due to undisclosed data sources.
Architectural Provenance
Step 3.5 Flash provides high-quality architectural documentation via a technical report and an official GitHub repository. It details a 45-layer sparse Mixture-of-Experts (MoE) backbone (3 dense layers, 42 MoE layers) with 288 routed experts and 1 shared expert. Notable technical innovations such as 3-way Multi-Token Prediction (MTP-3) and a 3:1 ratio of Sliding Window Attention (SWA) layers to full-attention layers are explicitly documented. While the base model lineage is clear within the StepFun family, the pre-training data volume and the exact initialization weights from previous versions are not disclosed in detail.
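The routing step implied by this configuration can be sketched as follows: score all 288 routed experts for a token, keep the 8 highest-scoring ones (the "Active Experts" figure in the spec), and normalize their weights. This is a generic top-k router written in plain Python; the softmax over only the selected logits and the function name route_token are assumptions, and the shared expert, which would be applied to every token regardless of routing, is left out.

```python
import math
import random

NUM_ROUTED_EXPERTS = 288  # routed experts, from the spec
TOP_K = 8                 # active routed experts per token, from the spec


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only. The
    normalization used by the real model is not stated in the source.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for one token (random values for illustration).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selected = route_token(logits)

if __name__ == "__main__":
    for expert, weight in selected:
        print(f"expert {expert:3d}  weight {weight:.3f}")
```

Only the 8 selected expert FFNs run for this token, which is why the active parameter count stays far below the total.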
Dataset Composition
Data transparency is a significant weakness. While the technical report mentions general categories (knowledge data, code, math) and synthetic data generation pipelines (arithmetic, coding test cases), it lacks a specific percentage breakdown of the training corpus. Official documentation from partners like NVIDIA explicitly lists the 'Training Data Collection' as 'Undisclosed'. The model card mentions 'Proprietary dataset' for the bulk of pre-training, which fails to meet high transparency standards for source verification.
Tokenizer Integrity
The tokenizer is highly transparent, with its vocabulary size (128,896 tokens) and snapshots publicly available on Hugging Face and ModelScope. The technical report details the tokenizer's alignment with the model's multilingual and agentic requirements. It is integrated into standard libraries such as Transformers and vLLM, allowing direct public verification of tokenization behavior and efficiency.
Parameter Density
The model is exemplary in its disclosure of parameter density. It clearly distinguishes between total parameters (196.81B) and active parameters (~11B per token). The breakdown includes the backbone (196B) and the MTP head (0.81B). The MoE configuration (288 routed + 1 shared expert, top-8 routing) is fully documented, preventing the common 'parameter inflation' confusion associated with sparse architectures.
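The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this section; the variable names are illustrative.

```python
BACKBONE_PARAMS_B = 196.0  # backbone parameters in billions (from the text)
MTP_HEAD_PARAMS_B = 0.81   # MTP head parameters in billions (from the text)
ACTIVE_PARAMS_B = 11.0     # approximate active parameters per token (from the text)
ROUTED_EXPERTS = 288
ACTIVE_EXPERTS = 8

total_b = BACKBONE_PARAMS_B + MTP_HEAD_PARAMS_B
active_fraction = ACTIVE_PARAMS_B / total_b
expert_fraction = ACTIVE_EXPERTS / ROUTED_EXPERTS

print(f"total parameters: {total_b:.2f}B")
print(f"active share of all parameters: {active_fraction:.1%}")
print(f"routed experts used per token: {expert_fraction:.1%}")
```

Roughly 5.6% of all parameters are active per token, while only about 2.8% of routed experts are selected. A plausible reading of the gap is that attention, embeddings, the dense layers, and the shared expert run for every token, though the source does not give that breakdown.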
Training Compute
Compute transparency is minimal. While the technical report mentions a 'Compute Cluster' and optimized deployment on hardware like NVIDIA DGX Spark, it does not disclose the total GPU/TPU hours, training duration, or the specific carbon footprint of the training run. This lack of environmental and resource disclosure is a standard industry gap but results in a low score under these strict guidelines.
Benchmark Reproducibility
The model provides scores for a wide array of modern benchmarks (SWE-bench Verified, Terminal-Bench 2.0, LiveCodeBench-v6) and includes some evaluation details in the technical report. However, although the training codebase (SteptronOss) is open-sourced, the full evaluation suite, with exact prompts and reproduction scripts for all claimed scores, has not yet been fully released, and the scores have not been verified by independent third parties on public leaderboards such as the Open LLM Leaderboard.
Identity Consistency
The model demonstrates strong identity consistency, correctly identifying itself as Step 3.5 Flash in official documentation and API responses. It maintains clear versioning (v1.0) and is transparent about its nature as an AI model and its specific focus on agentic and reasoning tasks. There are no documented cases of the model claiming to be a competitor's product.
License Clarity
The model uses the highly permissive Apache 2.0 license for both its code and weights, which is explicitly stated on GitHub, Hugging Face, and in the technical report. This allows for both commercial and non-commercial use with minimal restrictions. The clarity of the licensing terms is excellent, with no conflicting proprietary overrides found in the primary documentation.
Hardware Footprint
Hardware requirements are exceptionally well-documented. Official guides specify VRAM needs for different quantization levels (e.g., 111.5 GB for Int4 GGUF) and provide minimum/recommended hardware targets (Mac Studio M4 Max, NVIDIA DGX Spark). It also documents the impact of its hybrid SWA/Full attention on memory scaling, providing users with clear expectations for local deployment.
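A rough back-of-the-envelope estimate shows how these figures relate to the spec. The sketch below computes weight memory at a given bit width, the average bits per weight implied by the quoted 111.5 GB file, and a KV-cache estimate in which sliding-window layers cache at most 512 tokens. Several inputs are assumptions: GB is taken as 10^9 bytes, the cache is stored in 16-bit values, 256K is taken as 262,144 tokens, and the count of 11 full-attention layers is a hypothetical split consistent with a roughly 3:1 ratio over 45 layers, not a documented value.

```python
TOTAL_PARAMS = 196.81e9  # total parameters (from the text)
NUM_LAYERS = 45          # from the spec
KV_HEADS = 8             # from the spec
HEAD_DIM = 128           # from the spec
SLIDING_WINDOW = 512     # from the spec


def weight_gb(bits_per_weight: float) -> float:
    """Weight memory in GB (10^9 bytes) at a uniform bit width."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(file_gb: float) -> float:
    """Average bits per parameter implied by a checkpoint of file_gb GB."""
    return file_gb * 1e9 * 8 / TOTAL_PARAMS


def kv_cache_gb(context_tokens: int, full_layers: int,
                bytes_per_value: int = 2) -> float:
    """KV-cache size in GB for one sequence.

    `full_layers` is a hypothetical count of full-attention layers; the
    remaining layers are treated as sliding-window layers that cache at most
    SLIDING_WINDOW tokens.
    """
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * bytes_per_value  # K and V
    swa_layers = NUM_LAYERS - full_layers
    cached_tokens = (full_layers * context_tokens
                     + swa_layers * min(context_tokens, SLIDING_WINDOW))
    return cached_tokens * per_token_per_layer / 1e9


if __name__ == "__main__":
    context = 262_144  # assumed interpretation of "256K"
    print(f"weights at 4.0 bits: {weight_gb(4.0):.1f} GB")
    print(f"bits/weight implied by 111.5 GB: {implied_bits_per_weight(111.5):.2f}")
    print(f"KV cache, hybrid (11 full layers): {kv_cache_gb(context, 11):.1f} GB")
    print(f"KV cache, all layers full:         {kv_cache_gb(context, NUM_LAYERS):.1f} GB")
```

The 111.5 GB figure is larger than a pure 4-bit estimate, which is consistent with quantized formats keeping scales and some tensors at higher precision. The hybrid-attention estimate also shows why long contexts cost far less cache memory than they would with full attention in every layer.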
Versioning Drift
The model uses semantic versioning (v1.0) and maintains a basic changelog on GitHub. However, as a relatively new release (Feb 2026), there is limited historical data on how StepFun manages long-term model drift or silent updates. The 'WIP' status of parts of the open-source training codebase suggests that versioning infrastructure for downstream users is still maturing.