
Step 3.5 Flash

Total Parameters

196.81B

Context Length

256K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

11 Feb 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

64

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

RoPE

RoPE Theta

-

Sliding Window Attention

Yes

Sliding Window Size

512

Normalization

RMS Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

45

FFN Intermediate Size (Dense)

1,280

Multi-Token Prediction Heads

3

Tokenizer

Vocabulary Size

128,896

Mixture of Experts

Active Parameters (per Token)

11.0B

Number of Experts

288

Active Experts

8

Shared Experts

1

FFN Intermediate Size (per Expert)

1,280

Dense Layers Before MoE

-

Architecture Diagram

Input tokens → token embedding (RoPE positions; hidden size 4.1k, context 256k, vocabulary 128.9k) → 45 × [RMSNorm (pre-attention) → grouped-query attention (64 Q / 8 KV heads, sliding window 512, head dim 128) + residual → RMSNorm (pre-FFN) → sparse MoE FFN (8 of 288 experts, GELU, intermediate size 1.3k) + residual] → final RMSNorm → output logits
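The attention settings in the specifications (64 query heads sharing 8 key-value heads, and a 512-token sliding window on SWA layers) can be illustrated with a small sketch. It shows which KV head each query head reads under grouped-query attention and how many key positions a query can see in a sliding-window layer versus a full-attention layer. The head counts and window size come from the spec table; the masking convention (the window includes the current token) is an assumption, not StepFun's documented implementation.

```python
# Sketch of grouped-query attention (GQA) head sharing and causal
# sliding-window attention (SWA) visibility.
# ASSUMPTION: the 512-token window includes the current token.

NUM_Q_HEADS = 64
NUM_KV_HEADS = 8
WINDOW = 512


def kv_head_for(q_head: int) -> int:
    """Return the KV head that a given query head shares under GQA."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 8 query heads per KV head
    return q_head // group_size


def visible_keys(query_pos: int, sliding: bool) -> range:
    """Key positions a query at `query_pos` may attend to (causal mask)."""
    start = max(0, query_pos - WINDOW + 1) if sliding else 0
    return range(start, query_pos + 1)


if __name__ == "__main__":
    print(kv_head_for(0), kv_head_for(7), kv_head_for(8), kv_head_for(63))
    # SWA layer: at most WINDOW keys; full-attention layer: every earlier key.
    print(len(visible_keys(10_000, sliding=True)), len(visible_keys(10_000, sliding=False)))
```

Because an SWA layer never looks back more than 512 tokens, its attention cost and cache size stay constant as the context grows, while full-attention layers grow linearly with context.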

Step 3.5 Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model. It is built on a sparse Mixture-of-Experts architecture with 196B total parameters, of which only 11B are active per token, which lets it deliver frontier-level reasoning and agentic capabilities at a fraction of the inference cost of a dense model of the same size. It has a 256K-token context window, accepts text and image input, and scores 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0. It is optimized for local deployment on consumer hardware such as the Mac Studio M4 Max, as well as on high-end GPUs. Three-way Multi-Token Prediction (MTP-3) enables generation throughput of 100 to 350 tok/s.
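One way to see why multi-token prediction raises throughput is the standard speculative-decoding estimate. The sketch below assumes, hypothetically, that the three MTP heads act as drafters whose tokens are verified in a single main-model forward pass, that each drafted token is accepted independently with probability `accept_rate`, and that drafting stops at the first rejection. These are modeling assumptions; StepFun's actual decoding scheme and acceptance rates are not given here.

```python
def expected_tokens_per_step(accept_rate: float, num_draft: int = 3) -> float:
    """Expected tokens emitted per main-model forward pass.

    ASSUMPTIONS: `num_draft` drafted tokens, each accepted independently with
    probability `accept_rate`; acceptance stops at the first rejection; the
    verifying pass always contributes one token of its own.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    # Geometric series 1 + a + a^2 + ... + a^num_draft
    return sum(accept_rate ** i for i in range(num_draft + 1))


if __name__ == "__main__":
    for rate in (0.5, 0.7, 0.9):  # hypothetical acceptance rates
        print(f"accept_rate={rate}: {expected_tokens_per_step(rate):.3f} tokens/pass")
```

At a hypothetical 70% acceptance rate this gives about 2.53 tokens per forward pass, compared with 1 for ordinary decoding; the real speedup is lower because drafting and verification add their own cost.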

About Step 3.5

Step 3.5 is StepFun's flagship frontier reasoning model family. Built on a sparse Mixture-of-Experts (MoE) architecture, Step 3.5 models target frontier-level performance on agentic, reasoning, and coding tasks. The Flash variant activates only 11B of its 196B parameters per token, aiming for the reasoning depth of top-tier proprietary models while keeping inference efficient. It offers a 256K context window, native function calling, and Multi-Token Prediction for high-throughput inference, and it is released under the Apache 2.0 license.



Evaluation Benchmarks

Rank

#74

No evaluation benchmark results are available for Step 3.5 Flash.

Rankings

Overall Rank

#74

Coding Rank

-

Model Integrity

Total Score

B

69 / 100

Step 3.5 Flash Model Integrity Report


Audit Note

Step 3.5 Flash exhibits high transparency in its architectural design and hardware requirements, providing clear distinctions between total and active parameters. While it excels with a permissive Apache 2.0 license and detailed technical specifications, it remains opaque regarding its specific training data composition and total compute resources. The model is a strong example of 'open weights' transparency, though it falls short of 'open science' standards due to undisclosed data sources.

Upstream

20.5 / 30

Architectural Provenance

8.0 / 10

Step 3.5 Flash provides high-quality architectural documentation through a technical report and an official GitHub repository. These describe a 45-layer sparse Mixture-of-Experts (MoE) backbone (3 dense layers and 42 MoE layers) with 288 routed experts and 1 shared expert. Notable technical features, such as 3-way Multi-Token Prediction (MTP-3) and a 3:1 ratio of Sliding Window Attention (SWA) layers to full-attention layers, are explicitly documented. The base model's lineage within the StepFun family is clear, but the pre-training data volume and the exact initialization weights carried over from previous versions are disclosed in less detail.
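The layer counts and the 3:1 SWA-to-full ratio can be turned into a concrete per-layer layout. The ordering used here is a hypothetical illustration: the report summary gives the counts (3 dense, 42 MoE) and the ratio, but this sketch assumes the dense layers come first and that attention repeats as three SWA layers followed by one full-attention layer. The real placement may differ.

```python
from collections import Counter

NUM_LAYERS = 45
NUM_DENSE = 3      # 3 dense + 42 MoE layers, per the documentation summary
SWA_PER_FULL = 3   # 3:1 ratio of SWA layers to full-attention layers


def layer_layout(num_layers=NUM_LAYERS, num_dense=NUM_DENSE, swa_per_full=SWA_PER_FULL):
    """Return (attention_type, ffn_type) for each layer.

    ASSUMED ordering (hypothetical): dense FFN layers first, and a repeating
    [swa, swa, swa, full] attention pattern.
    """
    layout = []
    for i in range(num_layers):
        attn = "full" if (i + 1) % (swa_per_full + 1) == 0 else "swa"
        ffn = "dense" if i < num_dense else "moe"
        layout.append((attn, ffn))
    return layout


if __name__ == "__main__":
    layout = layer_layout()
    print(Counter(attn for attn, _ in layout))
    print(Counter(ffn for _, ffn in layout))
```

Because 45 is not a multiple of 4, this pattern yields 34 SWA and 11 full-attention layers, close to but not exactly 3:1; the documented layout may resolve the remainder differently.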

Dataset Composition

3.5 / 10

Data transparency is a significant weakness. While the technical report mentions general categories (knowledge data, code, math) and synthetic data generation pipelines (arithmetic, coding test cases), it lacks a specific percentage breakdown of the training corpus. Official documentation from partners like NVIDIA explicitly lists the 'Training Data Collection' as 'Undisclosed'. The model card mentions 'Proprietary dataset' for the bulk of pre-training, which fails to meet high transparency standards for source verification.

Tokenizer Integrity

9.0 / 10

The tokenizer is highly transparent: its vocabulary size (128,896 tokens) and snapshots are publicly available on Hugging Face and ModelScope. The technical report explains how the tokenizer is aligned with the model's multilingual and agentic requirements. It is integrated into standard libraries such as Transformers and vLLM, so tokenization behavior and efficiency can be verified directly.
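To put the vocabulary size in perspective, a quick calculation gives the size of one token-embedding matrix from the published vocabulary (128,896) and hidden size (4,096). Whether the output projection shares these weights is not stated here, so the sketch counts a single matrix and assumes 2-byte (bf16) storage.

```python
VOCAB_SIZE = 128_896
HIDDEN_SIZE = 4_096
BYTES_PER_PARAM = 2  # ASSUMPTION: bf16 storage


def embedding_params(vocab: int = VOCAB_SIZE, hidden: int = HIDDEN_SIZE) -> int:
    """Parameters in one vocab x hidden embedding matrix."""
    return vocab * hidden


if __name__ == "__main__":
    params = embedding_params()
    gib = params * BYTES_PER_PARAM / 2**30
    print(f"{params:,} parameters, {gib:.2f} GiB in bf16")
```

That is roughly 0.53B parameters (about 0.98 GiB in bf16), a small share of the 196.81B total.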

Model

25.5 / 40

Parameter Density

8.5 / 10

The model is exemplary in its disclosure of parameter density. It clearly distinguishes between total parameters (196.81B) and active parameters (~11B per token). The breakdown includes the backbone (196B) and the MTP head (0.81B). The MoE configuration (288 routed + 1 shared expert, top-8 routing) is fully documented, preventing the common 'parameter inflation' confusion associated with sparse architectures.
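The routing configuration described above (288 routed experts, top-8 routing, plus 1 shared expert) can be sketched for a single token as follows. The gating function is an assumption: this uses a softmax over the selected top-8 logits, a common choice, but StepFun's actual gating and normalization are not specified here. Experts are toy callables on a scalar input purely for illustration.

```python
import math
import random

NUM_ROUTED = 288
TOP_K = 8


def route(router_logits, top_k=TOP_K):
    """Select the top-k routed experts for one token and normalize their gates.

    ASSUMPTION: softmax over the selected logits (actual gating may differ).
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    chosen = sorted(range(len(router_logits)),
                    key=lambda e: router_logits[e], reverse=True)[:top_k]
    m = max(router_logits[e] for e in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[e] - m) for e in chosen]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(chosen, exps)]


def moe_output(x, router_logits, routed_experts, shared_expert):
    """Shared expert (applied to every token) plus gate-weighted top-k experts."""
    out = shared_expert(x)
    for e, weight in route(router_logits):
        out += weight * routed_experts[e](x)
    return out


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
    experts = [lambda x, k=k: float(k) * x for k in range(NUM_ROUTED)]  # toy experts
    print(route(logits))
    print(moe_output(1.0, logits, experts, shared_expert=lambda x: x))
```

Only 8 of the 288 routed experts (about 2.8%) run for each token, which is why the active parameter count stays near 11B even though the total is 196.81B.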

Training Compute

2.0 / 10

Compute transparency is minimal. While the technical report mentions a 'Compute Cluster' and optimized deployment on hardware like NVIDIA DGX Spark, it does not disclose the total GPU/TPU hours, training duration, or the specific carbon footprint of the training run. This lack of environmental and resource disclosure is a standard industry gap but results in a low score under these strict guidelines.

Benchmark Reproducibility

6.0 / 10

The model reports scores on a wide range of modern benchmarks (SWE-bench Verified, Terminal-Bench 2.0, LiveCodeBench-v6), and the technical report includes some evaluation details. However, although the training codebase (SteptronOss) is open source, the full evaluation suite, with exact prompts and reproduction scripts for every claimed score, is not yet fully integrated, and the claimed scores have not been verified by independent third parties on public leaderboards such as the Open LLM Leaderboard.

Identity Consistency

9.0 / 10

The model demonstrates strong identity consistency, correctly identifying itself as Step 3.5 Flash in official documentation and API responses. It maintains clear versioning (v1.0) and is transparent about its nature as an AI model and its specific focus on agentic and reasoning tasks. There are no documented cases of the model claiming to be a competitor's product.

Downstream

23.0 / 30

License Clarity

9.5 / 10

The model uses the highly permissive Apache 2.0 license for both its code and weights, which is explicitly stated on GitHub, Hugging Face, and in the technical report. This allows for both commercial and non-commercial use with minimal restrictions. The clarity of the licensing terms is excellent, with no conflicting proprietary overrides found in the primary documentation.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. Official guides specify VRAM needs for different quantization levels (e.g., 111.5 GB for Int4 GGUF) and provide minimum/recommended hardware targets (Mac Studio M4 Max, NVIDIA DGX Spark). It also documents the impact of its hybrid SWA/Full attention on memory scaling, providing users with clear expectations for local deployment.
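A naive weight-memory estimate helps interpret the quoted 111.5 GB Int4 GGUF figure. The sketch multiplies the 196.81B total parameters by bits per weight and reports decimal GB; it ignores the KV cache and runtime overhead. It also back-computes the effective bits per weight implied by 111.5 GB. Block-quantized GGUF formats store per-block scale factors and may keep some tensors at higher precision, so the effective figure is expected to sit somewhat above 4 bits.

```python
TOTAL_PARAMS = 196.81e9  # backbone (196B) + MTP head (0.81B)


def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Naive weight storage in decimal GB (no KV cache, no runtime overhead)."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Effective bits per weight implied by a quoted file or VRAM size."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for label, bits in [("BF16", 16), ("Int8", 8), ("Int4", 4)]:
        print(f"{label}: {weight_gb(bits):.1f} GB")
    print(f"111.5 GB implies {implied_bits(111.5):.2f} bits/weight")
```

A pure 4-bit encoding would need about 98.4 GB, while the quoted 111.5 GB corresponds to roughly 4.53 bits per weight.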

Versioning Drift

5.0 / 10

The model uses semantic versioning (v1.0) and maintains a basic changelog on GitHub. However, as a relatively new release (Feb 2026), there is limited historical data on how StepFun manages long-term model drift or silent updates. The 'WIP' status of parts of the open-source training codebase suggests that versioning infrastructure for downstream users is still maturing.
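The section scores above add up to the 69 / 100 total. The sketch below checks that, assuming the total is an unweighted sum of the ten subscores (each out of 10); the mapping from score to letter grade is not described on this page, so it is not modeled.

```python
# Subscores as listed in the integrity report above.
SCORES = {
    "Upstream": {
        "Architectural Provenance": 8.0,
        "Dataset Composition": 3.5,
        "Tokenizer Integrity": 9.0,
    },
    "Model": {
        "Parameter Density": 8.5,
        "Training Compute": 2.0,
        "Benchmark Reproducibility": 6.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 9.5,
        "Hardware Footprint": 8.5,
        "Versioning Drift": 5.0,
    },
}


def section_totals(scores):
    """Return {section: (points, max_points)}, with each subscore out of 10."""
    return {name: (sum(items.values()), 10 * len(items)) for name, items in scores.items()}


if __name__ == "__main__":
    totals = section_totals(SCORES)
    for name, (points, maximum) in totals.items():
        print(f"{name}: {points} / {maximum}")
    print("Total:", sum(points for points, _ in totals.values()), "/ 100")
```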

GPU Requirements

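Beyond the weights, VRAM use grows with context length through the KV cache. The sketch below estimates per-sequence KV cache size from the published attention settings (45 layers, 8 KV heads, head dimension 128, 512-token window). Several inputs are assumptions: a 2-byte (bf16/fp16) cache, 11 full-attention layers as a hypothetical split of 45 layers under the 3:1 SWA ratio, SWA layers caching only the last 512 tokens, and the MTP head ignored.

```python
NUM_LAYERS = 45
NUM_KV_HEADS = 8
HEAD_DIM = 128
WINDOW = 512
BYTES_PER_VALUE = 2  # ASSUMPTION: bf16/fp16 KV cache


def kv_cache_bytes(context_tokens: int, num_full_layers: int) -> int:
    """KV cache for one sequence: full-attention layers keep every token,
    SWA layers keep at most WINDOW tokens."""
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
    num_swa_layers = NUM_LAYERS - num_full_layers
    full = num_full_layers * context_tokens * per_token_per_layer
    swa = num_swa_layers * min(context_tokens, WINDOW) * per_token_per_layer
    return full + swa


if __name__ == "__main__":
    hybrid_full = NUM_LAYERS // 4  # ASSUMPTION: 11 full-attention layers
    for ctx in (1_024, 131_072, 262_144):
        hybrid = kv_cache_bytes(ctx, hybrid_full) / 2**30
        all_full = kv_cache_bytes(ctx, NUM_LAYERS) / 2**30
        print(f"{ctx:>7} tokens: hybrid {hybrid:.2f} GiB vs all-full {all_full:.2f} GiB")
```

Under these assumptions a full 256K-token (262,144) context needs about 11.1 GiB of KV cache with the hybrid layout, versus 45 GiB if every layer used full attention; grouped-query attention (8 KV heads instead of 64) already makes both figures 8 times smaller than standard multi-head attention would.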