Parameters: 52B
Context Length: 256K
Modality: Text
Architecture: Hybrid Transformer-Mamba MoE
License: -
Release Date: 16 Jul 2025
Knowledge Cutoff: Dec 2024
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: -
Number of Layers: 128
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
Tencent Hunyuan TurboS represents a significant advancement in large language models, engineered to deliver both rapid response times and robust reasoning capabilities. The model adopts a dual cognitive approach, analogous to human "fast thinking" and "slow thinking": it returns near-instantaneous replies for the broad run of queries while reserving deeper, step-by-step reasoning for harder ones. Its design prioritizes efficiency and responsiveness, making it suitable for applications that demand quick, high-quality interactions, and it balances that speed with the capacity to address complex informational and analytical tasks.
Architecturally, Hunyuan TurboS is a novel hybrid Transformer-Mamba Mixture of Experts (MoE) model. This fusion combines the strengths of Mamba2 layers, which process long sequences efficiently with a reduced KV-cache memory footprint, with the Transformer's established capacity for deep contextual understanding. The model comprises 128 layers in total: 57 Mamba2 layers, 7 Attention layers, and 64 Feed-Forward Network (FFN) layers. The FFN layers use an MoE structure with 32 experts, of which each token activates 1 shared and 2 specialized experts, improving computational efficiency. The model also employs Grouped-Query Attention (GQA) to reduce memory usage and computational overhead during inference.
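As a rough illustration of the layer budget and expert-activation pattern just described, here is a minimal Python sketch. It is not Tencent's implementation; the dataclass and all identifiers are hypothetical, and only the numbers stated above are taken from the description.

```python
# Minimal sketch of the published layer counts and MoE activation pattern.
# Purely illustrative; all identifiers are hypothetical.
from dataclasses import dataclass

@dataclass
class TurboSLayout:
    total_layers: int = 128
    mamba2_layers: int = 57          # efficient long-sequence layers
    attention_layers: int = 7        # Transformer attention (GQA) layers
    ffn_moe_layers: int = 64         # feed-forward layers with MoE routing
    experts_per_ffn: int = 32        # experts available in each MoE FFN layer
    shared_experts_active: int = 1   # always-on expert per token
    routed_experts_active: int = 2   # specialized experts selected per token

    def active_experts_per_token(self) -> int:
        # Each token passes through 1 shared + 2 routed experts per MoE layer.
        return self.shared_experts_active + self.routed_experts_active

layout = TurboSLayout()
assert (layout.mamba2_layers + layout.attention_layers
        + layout.ffn_moe_layers) == layout.total_layers
print(f"{layout.active_experts_per_token()} of {layout.experts_per_ffn} "
      "experts active per token in each MoE FFN layer")
```

The check simply confirms that the three layer types account for all 128 layers, and that only 3 of the 32 experts do work for any given token in an MoE FFN layer.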
Hunyuan TurboS is designed to handle extensive information, supporting an ultra-long context length of 256,000 tokens. This capability allows the model to maintain performance across lengthy documents and extended dialogues. Its post-training strategy includes supervised fine-tuning and adaptive long-short Chain-of-Thought (CoT) fusion, enabling dynamic switching between rapid responses for simple queries and more analytical, step-by-step processing for intricate problems. The model is deployed for various applications requiring efficient, high-performance AI, such as advanced conversational agents, content generation, and sophisticated analytical systems.
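The long-short CoT fusion itself is learned during post-training, but its external behavior can be pictured with a toy dispatcher like the one below. The heuristic, the prompts, and the `generate` callable are invented for illustration and are not part of Hunyuan TurboS.

```python
# Toy illustration of adaptive long/short chain-of-thought routing.
# In the real model this switching is learned, not rule-based.

def answer(query: str, generate) -> str:
    """Route a query to a fast reply or to step-by-step reasoning.

    `generate` stands in for a call to the model: any callable that takes a
    prompt string and returns generated text (hypothetical interface).
    """
    reasoning_cues = ("prove", "derive", "step by step", "analyze", "compare")
    needs_long_cot = (len(query.split()) > 40
                      or any(cue in query.lower() for cue in reasoning_cues))

    if needs_long_cot:
        # Complex request: ask for explicit intermediate reasoning (long CoT).
        return generate(f"Think through this step by step, then answer:\n{query}")
    # Simple request: answer directly for minimal latency (short CoT).
    return generate(f"Answer concisely:\n{query}")

# Example usage with a stub in place of a real model call:
print(answer("What is the capital of France?", lambda p: f"[model reply to: {p!r}]"))
```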
Hunyuan TurboS belongs to Tencent's Hunyuan family of large language models. No evaluation benchmark results are available for Hunyuan TurboS yet, so no overall or coding rank is listed.