| Specification | Value |
|---|---|
| Parameters | 70B |
| Context Length | 32K |
| Modality | Text |
| Architecture | Dense |
| License | - |
| Release Date | 22 Aug 2025 |
| Knowledge Cutoff | Dec 2024 |
| Attention Structure | Multi-Head Attention |
| Hidden Dimension Size | - |
| Number of Layers | 128 |
| Attention Heads | - |
| Key-Value Heads | - |
| Activation Function | SwiGLU |
| Normalization | RMS Normalization |
| Position Embedding | Absolute Position Embedding |
Tencent Hunyuan T1 is a high-performance reasoning model engineered for deep analytical tasks, logical problem-solving, and advanced scientific inquiry. It serves as the primary 'slow-thinking' reasoning engine within the Hunyuan ecosystem, designed to compete with state-of-the-art models by prioritizing structured logic and long-form consistency. The model is built upon the TurboS base, which represents a significant architectural shift toward integrating state-space models into large-scale production environments for enhanced computational efficiency.
The technical foundation of Hunyuan T1 is a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture. This design incorporates Transformer blocks for global contextual awareness alongside Mamba-2 state-space layers, which provide linear scaling and superior memory efficiency for sequence modeling. The model uses a total of 16 experts, with dynamic routing that activates a subset of approximately 52 billion parameters per token. This hybrid approach is specifically engineered to mitigate the quadratic complexity of traditional attention, allowing the model to handle context lengths of up to 256,000 tokens while decoding approximately twice as fast as comparable dense Transformer models.
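The token-level expert routing described above can be sketched as follows. This is a minimal, illustrative top-k MoE layer in NumPy; all dimensions, the ReLU expert MLPs, and the softmax-over-selected-experts gating are toy assumptions for clarity, not Hunyuan T1's actual configuration.

```python
# Illustrative sketch of top-k expert routing in a Mixture-of-Experts layer.
# All sizes and design choices here are toy assumptions, NOT Hunyuan T1's real config.
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x:        (tokens, d_model) input activations
    gate_w:   (d_model, n_experts) router weights
    experts:  list of (w_in, w_out) weight pairs, one per expert
    """
    logits = x @ gate_w                                # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]      # indices of the top_k experts
    sel = np.take_along_axis(logits, top, axis=-1)     # their router logits
    weights = np.exp(sel - sel.max(-1, keepdims=True)) # softmax over selected experts
    weights /= weights.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                        # dispatch token by token
        for k in range(top_k):
            w_in, w_out = experts[top[t, k]]
            out[t] += weights[t, k] * (np.maximum(x[t] @ w_in, 0) @ w_out)
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, 16)), rng.normal(size=(16, d)))
           for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts)
print(y.shape)  # (4, 8): each token's output mixes only top_k of the 16 experts
```

Because only `top_k` of the 16 experts run per token, the activated parameter count is a fraction of the total, which is how a model can route to roughly 52B active parameters out of a much larger pool.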
Operationally, Hunyuan T1 is optimized through a post-training regimen that heavily emphasizes large-scale reinforcement learning, with over 96% of compute resources dedicated to this phase. It employs curriculum learning to incrementally scale reasoning complexity and uses Cross-Layer Attention (CLA) to further reduce memory overhead during inference. These innovations make it particularly well-suited for enterprise-level tasks such as complex code generation, mathematical theorem proving, and multi-step logical deduction where high precision and reduced context loss are paramount.
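The memory benefit of Cross-Layer Attention (CLA) comes from adjacent layers sharing a single KV cache instead of each keeping its own. A back-of-envelope sizing, using toy layer and head counts that are assumptions rather than Hunyuan T1's published configuration:

```python
# Back-of-envelope KV-cache sizing with and without Cross-Layer Attention (CLA).
# All numbers below are illustrative assumptions, not Hunyuan T1's published config.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2,
                   share_factor=1):
    """Bytes for K and V caches; share_factor=2 means two layers share one cache."""
    cached_layers = layers / share_factor
    return int(2 * cached_layers * kv_heads * head_dim * seq_len * bytes_per_val)

# Toy config: 32 attention layers, 8 KV heads, head_dim 128, 256K context, fp16.
base = kv_cache_bytes(32, 8, 128, 256_000)
cla  = kv_cache_bytes(32, 8, 128, 256_000, share_factor=2)
print(base // 2**30, "GiB without CLA")   # 31 GiB
print(cla // 2**30, "GiB with CLA")       # 15 GiB: adjacent layers share one cache
```

Halving the number of cached layers halves KV memory at long context, which is exactly where the 256K window would otherwise dominate inference cost.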
Hunyuan T1 is part of Tencent's Hunyuan family of large language models, which spans a range of capabilities.
| Benchmark | Score | Rank |
|---|---|---|
| WebDev Arena (Web Development) | 1387 | #21 |

Overall Rank: #24
Coding Rank: #30