Parameters
52B
Context Length
32K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
-
Release Date
16 Jul 2025
Knowledge Cutoff
Dec 2024
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
64
Key-Value Heads
8
Attention Head Dimension
-
Position Embedding
Absolute Position Embedding
RoPE Theta
-
Sliding Window Attention
-
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
5,120
Number of Layers
128
FFN Intermediate Size (Dense)
-
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
-
Tencent Hunyuan-TurboS is a large language model designed to balance computational efficiency against complex reasoning. Using an adaptive long-short Chain-of-Thought (CoT) mechanism, the model adjusts how much reasoning it performs per query: a rapid "fast-thinking" mode handles simple queries, and a more rigorous analytical mode handles intricate tasks. This dual-path approach lets the model respond quickly in general interactions while retaining the logical depth required for STEM, coding, and mathematical problem-solving.
Architecturally, Hunyuan-TurboS uses a hybrid Transformer-Mamba2 Mixture of Experts (MoE) framework. The structure consists of 128 layers organized in an interleaved AMF (Attention-Mamba2-FFN) and MF (Mamba2-FFN) block pattern. The Mamba2 layers scale linearly with sequence length, while Grouped-Query Attention (GQA) reduces the KV-cache memory footprint. The model's Feed-Forward Networks (FFN) employ an MoE design with 32 experts, where each token activates one shared expert and two specialized experts, keeping capacity high while limiting per-token compute.
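The routing rule described above (one always-on shared expert plus two routed experts out of 32) can be illustrated with a minimal sketch. The gating logits, the softmax renormalization over the selected experts, and all function names are assumptions made for illustration; the description does not specify how the router is computed, or whether the shared expert is counted among the 32.

```python
import math
import random

NUM_ROUTED_EXPERTS = 32  # from the description; whether this includes the shared expert is not stated
TOP_K = 2                # specialized experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top-k routed experts.

    Weights are a softmax over the selected logits only. This normalization
    is an assumption; real routers differ in this detail.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    max_logit = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - max_logit) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, shared_expert, routed_experts, router_logits):
    """Add the shared expert's output to the weighted top-k expert outputs."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](x)
    return out


# Toy usage: scalar functions stand in for FFN experts (hypothetical).
rng = random.Random(0)
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_ROUTED_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
y = moe_forward(1.0, lambda x: x, experts, logits)
```

Only three expert functions run per token here, which is the property that lets an MoE model hold many more parameters than it uses for any single token.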
Built for enterprise-grade scalability, the model supports an ultra-long context window of 256,000 tokens and was pre-trained on a massive corpus of 16 trillion high-quality tokens. Its post-training regime includes supervised fine-tuning on 3 million instructions and a multi-stage reinforcement learning process focused on STEM accuracy and general instruction following. These characteristics make Hunyuan-TurboS well-suited for high-throughput applications such as real-time conversational agents, large-scale document analysis, and sophisticated reasoning tasks where latency and cost-efficiency are paramount.
Hunyuan-TurboS belongs to the Tencent Hunyuan family of large language models.
Rank
#31
| Benchmark | Score | Rank |
|---|---|---|
| Web Development (WebDev Arena) | 1383 | 31 |
Overall Rank
#31
Coding Rank
#43
Total Score
66 / 100
Hunyuan-TurboS is technically transparent about its hybrid architecture and parameter distribution, providing more detail than many proprietary peers. However, it remains opaque concerning training compute resources and specific dataset proportions. The restrictive, geographically limited license and the lack of full evaluation code for its primary benchmarks are significant barriers to independent verification and global adoption.
Architectural Provenance
Tencent provides a high level of architectural detail in the official technical report (arXiv:2505.23076). The model is explicitly described as a hybrid Transformer-Mamba2 Mixture of Experts (MoE) model. It specifies a 128-layer structure with a precise interleaved pattern: 57 Mamba2 layers, 7 Attention layers (using Grouped-Query Attention), and 64 FFN layers. The report details the 'AMF' (Attention-Mamba2-FFN) and 'MF' (Mamba2-FFN) block patterns, providing a level of transparency rarely seen in proprietary models.
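As a rough illustration of why the small number of GQA attention layers matters, the sketch below checks that the reported layer counts sum to 128 and compares per-token KV-cache size for 8 key-value heads against a hypothetical 64-head multi-head baseline. The head dimension is not disclosed in the spec list, so `HEAD_DIM = 128` is purely an assumption; the reduction ratio does not depend on it. Mamba2 layers are left out because they keep a fixed-size state rather than a per-token cache.

```python
LAYER_COUNTS = {"mamba2": 57, "attention": 7, "ffn": 64}  # as reported above
TOTAL_LAYERS = 128
QUERY_HEADS = 64  # "Attention Heads" in the spec list
KV_HEADS = 8      # "Key-Value Heads" in the spec list


def kv_cache_bytes_per_token(attn_layers, kv_heads, head_dim, bytes_per_value=2):
    """Per-token KV-cache size: one key and one value vector per KV head per attention layer."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_value


# The three layer types account for every one of the 128 layers.
assert sum(LAYER_COUNTS.values()) == TOTAL_LAYERS

HEAD_DIM = 128  # ASSUMPTION: not disclosed in the source

gqa_bytes = kv_cache_bytes_per_token(LAYER_COUNTS["attention"], KV_HEADS, HEAD_DIM)
mha_bytes = kv_cache_bytes_per_token(LAYER_COUNTS["attention"], QUERY_HEADS, HEAD_DIM)
reduction = mha_bytes / gqa_bytes  # equals QUERY_HEADS / KV_HEADS for any head_dim
```

Under these assumptions, sharing 8 key-value heads across 64 query heads cuts the cache by a factor of 8, and only 7 of the 128 layers contribute to it at all.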
Dataset Composition
The model was pre-trained on a corpus of 16 trillion tokens. While the total token count and the use of 3 million instructions for supervised fine-tuning are disclosed, the specific breakdown of the 16T tokens (e.g., percentage of web, code, books) is not provided in detail. The documentation mentions 'high-quality tokens' and 'STEM-specific data' but provides neither a granular composition table nor public access to data samples; such omissions are common for large-scale industrial models.
Tokenizer Integrity
The tokenizer is well-documented and consistent with the Hunyuan-Large model. It features a vocabulary of 128K tokens, consisting of 100K tokens from the tiktoken (OpenAI) base and 28K additional tokens specifically optimized for Chinese language support. Technical metrics such as compression rates (3.13 characters per token) are publicly disclosed, and the tokenizer is accessible via the official GitHub repository for the Hunyuan family.
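The disclosed figures allow a back-of-the-envelope token estimate. The sketch below is an approximation only: it treats "100K" and "28K" as round numbers and applies the 3.13 characters-per-token average uniformly, although the real ratio varies with language and content. The function name is hypothetical and does not call the actual tokenizer.

```python
import math

BASE_VOCAB = 100_000    # tiktoken-derived portion, treated as a round figure
EXTRA_VOCAB = 28_000    # additional tokens for Chinese language support
CHARS_PER_TOKEN = 3.13  # disclosed average compression rate


def estimate_tokens(text):
    """Estimate the token count of `text` from the average compression rate."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


vocab_size = BASE_VOCAB + EXTRA_VOCAB
sample_estimate = estimate_tokens("x" * 1000)  # a 1,000-character input
```

For exact counts, the tokenizer from the Hunyuan repository should be used instead of this heuristic.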
Parameter Density
Tencent is transparent about the model's scale, disclosing both total and active parameters. Hunyuan-TurboS has 560 billion total parameters with 56 billion active parameters per token. The MoE structure is detailed as having 32 experts, with a routing strategy that activates 1 shared expert and 2 specialized experts per token. Stating total and active parameter counts separately avoids the common 'parameter inflation' marketing trap.
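The relationship between the two disclosed counts is simple arithmetic, sketched below. The function and variable names are hypothetical, and the routed-expert ratio assumes the 32 experts are all routed experts with the shared expert counted separately, which the text does not confirm.

```python
TOTAL_PARAMS = 560e9   # disclosed total parameters
ACTIVE_PARAMS = 56e9   # disclosed active parameters per token
ROUTED_EXPERTS = 32    # ASSUMPTION: shared expert not included in this count
ROUTED_ACTIVE = 2      # specialized experts activated per token


def active_fraction(total, active):
    """Share of the model's parameters used for a single token."""
    return active / total


param_share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)  # about 0.10
routed_share = ROUTED_ACTIVE / ROUTED_EXPERTS               # 0.0625
```

The parameter share (about 10%) is higher than the routed-expert share (6.25%) because attention, Mamba2, embedding, and shared-expert weights are used for every token.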
Training Compute
Information regarding training compute is limited to high-level infrastructure descriptions. While Tencent mentions its 'Xingmai' high-performance network and the ability to support clusters of over 100,000 GPUs, it does not disclose the specific GPU hours, hardware type (e.g., H100 vs. H800), or the carbon footprint associated with training Hunyuan-TurboS specifically. This lack of specific resource disclosure is a significant gap.
Benchmark Reproducibility
The model's performance is documented across 23 automated benchmarks with an average score of 77.9%. While the technical report lists scores for MMLU, GSM8K, and HumanEval, and the model is ranked on the LMSYS Chatbot Arena (#8 globally), the exact evaluation code and full prompt sets used for internal benchmarking are not fully public. However, the release of the 'C3-Bench' and 'ArtifactsBench' by the same team provides some reproducible evaluation frameworks for the broader community.
Identity Consistency
The model maintains a consistent identity as part of the Tencent Hunyuan family. It correctly identifies its version (e.g., Hunyuan-TurboS-20250416) and its specific 'fast-thinking' vs. 'slow-thinking' (Hunyuan-T1) capabilities. There are no documented cases of the model claiming to be a competitor's product or misrepresenting its origin during standard interactions.
License Clarity
The licensing situation is complex and restrictive. While some components are released under the 'Tencent Hunyuan Community License,' it contains significant geographic restrictions (e.g., not applicable in the EU, UK, or South Korea) and requires explicit permission for entities with over 100 million monthly active users. This is not a standard open-source license and creates ambiguity for global commercial use.
Hardware Footprint
Basic hardware requirements are available through Tencent Cloud documentation and community guides. The model's 56B active parameters correspond to roughly 112 GB of FP16 weights used per token, but because routing can select any expert, serving generally requires all 560B parameters (about 1.12 TB in FP16) to be resident across the deployment. The use of GQA and Mamba2 layers is explicitly noted as a strategy to reduce KV-cache memory footprint. However, official documentation lacks a comprehensive quantization-to-accuracy tradeoff table for consumer-grade hardware.
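The ~112 GB estimate follows from 2 bytes per FP16 parameter. A minimal sketch of that arithmetic, extended to other precisions, is shown below. It counts weights only (no activations, KV cache, or Mamba2 state), uses 1 GB = 10^9 bytes, and the INT8 and INT4 rows are generic bytes-per-parameter assumptions, not formats Tencent documents for this model.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # generic assumptions


def weight_memory_gb(num_params, dtype):
    """Weight storage in GB (1 GB = 1e9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


ACTIVE_PARAMS = 56e9   # active parameters per token
TOTAL_PARAMS = 560e9   # total parameters across all experts

active_fp16 = weight_memory_gb(ACTIVE_PARAMS, "fp16")  # the ~112 GB figure
total_fp16 = weight_memory_gb(TOTAL_PARAMS, "fp16")    # all experts resident
total_int4 = weight_memory_gb(TOTAL_PARAMS, "int4")
```

Because any expert may be selected for a given token, the total-parameter figures are usually the more relevant ones when sizing a deployment.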
Versioning Drift
Tencent uses date-based versioning (e.g., 20250416) and provides high-level changelogs during major updates (e.g., May 2025 upgrade). However, as a primarily API-driven model, silent updates and behavioral drift are difficult for users to track independently. There is no public, granular version history that allows users to pin specific weights for long-term consistency.
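On the client side, one partial mitigation is to request only dated version identifiers and record which one served each call. The sketch below parses the date suffix from an identifier such as `Hunyuan-TurboS-20250416`. The helper names and the undated `latest` alias are hypothetical, and the sketch makes no claim about how the Tencent API resolves model names.

```python
from datetime import date, datetime


def parse_version_date(version_id):
    """Extract the release date from an ID like 'Hunyuan-TurboS-20250416'.

    Raises ValueError if the suffix is not a YYYYMMDD date.
    """
    suffix = version_id.rsplit("-", 1)[-1]
    return datetime.strptime(suffix, "%Y%m%d").date()


def is_date_pinned(version_id):
    """Return True if the identifier ends in a valid YYYYMMDD date."""
    try:
        parse_version_date(version_id)
        return True
    except ValueError:
        return False


release = parse_version_date("Hunyuan-TurboS-20250416")
pinned = is_date_pinned("Hunyuan-TurboS-20250416")
floating = is_date_pinned("Hunyuan-TurboS-latest")  # hypothetical undated alias
```

Logging the parsed date alongside each response makes later behavioral changes easier to correlate with version updates, though it cannot detect silent changes behind an unchanged identifier.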