
Hunyuan T1

Parameters

70B

Context Length

32K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

-

Release Date

22 Aug 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

-

Key-Value Heads

-

Attention Head Dimension

-

Position Embedding

Absolute Position Embedding

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

-

Number of Layers

128

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

-

Hunyuan T1

Tencent Hunyuan T1 is a high-performance reasoning model engineered for deep analytical tasks, logical problem-solving, and advanced scientific inquiry. It serves as the primary 'slow-thinking' reasoning engine within the Hunyuan ecosystem, designed to compete with state-of-the-art models by prioritizing structured logic and long-form consistency. The model is built upon the TurboS base, which represents a significant architectural shift toward integrating state-space models into large-scale production environments for enhanced computational efficiency.

The technical foundation of Hunyuan T1 is a Hybrid-Transformer-Mamba Mixture of Experts (MoE) architecture. This design combines Transformer blocks, which provide global contextual awareness, with Mamba-2 state-space layers, whose cost scales linearly with sequence length and which carry a fixed-size state, improving memory efficiency for sequence modeling. The model uses a total of 16 experts, with dynamic routing that activates approximately 52 billion parameters per token. This hybrid approach is engineered to mitigate the quadratic complexity of traditional attention, allowing the model to handle context lengths of up to 256,000 tokens while decoding approximately twice as fast as comparable dense Transformer models.
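To make the scaling contrast concrete, here is a minimal sketch in Python. It counts the query-key pairs one causal self-attention layer scores (quadratic in sequence length) and runs a toy linear recurrence whose state stays the same size however long the input is. The recurrence is a deliberately simplified, time-invariant scalar state-space update assumed for illustration; it is not Mamba-2's selective scan, and `causal_attention_pairs` and `linear_scan` are hypothetical helper names.

```python
def causal_attention_pairs(n: int) -> int:
    """Query-key pairs scored by one causal attention layer over n tokens.

    Token t attends to tokens 0..t, so the total is 1 + 2 + ... + n = n(n+1)/2,
    which grows quadratically with n.
    """
    return n * (n + 1) // 2


def linear_scan(xs, a: float = 0.9, b: float = 1.0, c: float = 1.0):
    """Toy scalar state-space recurrence (simplified; NOT Mamba-2 itself).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    One constant-cost update per token, so total work is linear in len(xs)
    and the carried state is a single number regardless of sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    for n in (1_000, 32_000, 256_000):
        print(f"n={n:>7}: attention pairs={causal_attention_pairs(n):,}  scan steps={n:,}")
    # Impulse response of the toy recurrence decays geometrically.
    print(linear_scan([1.0, 0.0, 0.0], a=0.5))
```

At 256,000 tokens a single causal attention layer scores about 3.3 × 10^10 pairs, while the scan performs 256,000 updates. A hybrid design keeps some attention blocks for global context and leaves the remaining sequence mixing to the linear-time layers.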

Operationally, Hunyuan T1 is optimized through a post-training regimen that heavily emphasizes large-scale reinforcement learning, with 96.7% of post-training compute dedicated to this phase. It employs curriculum learning to incrementally scale reasoning complexity and uses Cross-Layer Attention (CLA), which shares key-value caches across adjacent layers, to further reduce memory overhead during inference. These innovations make it particularly well-suited for enterprise-level tasks such as complex code generation, mathematical theorem proving, and multi-step logical deduction, where high precision and reduced context loss are paramount.
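Cross-Layer Attention saves memory by letting groups of adjacent attention layers reuse one set of cached keys and values instead of each layer storing its own. The sketch below estimates KV-cache size under that scheme. Every dimension in it (number of attention layers, KV heads, head size, sharing factor, bytes per element) is a hypothetical placeholder, because this page does not publish them. In a hybrid model only the attention layers hold a KV cache; the Mamba layers carry a fixed-size state instead.

```python
import math


def kv_cache_bytes(
    attn_layers: int,
    kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_elem: int = 2,  # e.g. BF16/FP16
    cla_share: int = 1,       # layers sharing one KV cache; 1 means no CLA
    batch: int = 1,
) -> int:
    """Approximate KV-cache size in bytes.

    Keys and values (factor 2) are stored per KV-producing layer. With CLA,
    only ceil(attn_layers / cla_share) layers produce a cache of their own.
    """
    kv_layers = math.ceil(attn_layers / cla_share)
    return 2 * kv_layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    cfg = dict(attn_layers=32, kv_heads=8, head_dim=128, seq_len=256_000)
    base = kv_cache_bytes(**cfg)
    shared = kv_cache_bytes(**cfg, cla_share=2)
    print(f"no CLA : {base / 1e9:.1f} GB")
    print(f"CLA x2 : {shared / 1e9:.1f} GB")
```

With a sharing factor of 2 the estimate halves; the trade-off is that the layers in a group attend over identical keys and values.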

About Hunyuan

Tencent Hunyuan is a family of large language models with various capabilities.



Evaluation Benchmarks

Benchmark | Category | Score | Rank
WebDev Arena | Web Development | 1387 | 27

Rankings

Overall Rank

#30

Coding Rank

#41

Model Integrity

Total Score

B-

62 / 100

Hunyuan T1 Model Integrity Report


Audit Note

Hunyuan T1 exhibits a strong technical foundation with significant transparency regarding its hybrid MoE-Mamba architecture and parameter density. However, the profile is weakened by a restrictive custom license and a lack of granular detail concerning training data composition and total compute resources. While benchmark performance is well-documented, the reliance on proprietary evaluation frameworks limits full independent reproducibility.

Upstream

20.0 / 30

Architectural Provenance

7.5 / 10

Tencent provides a detailed technical description of the 'TurboS' base architecture, identifying it as a Hybrid-Transformer-Mamba Mixture of Experts (MoE) model. Documentation specifies the integration of Mamba-2 state-space layers with Transformer blocks to achieve linear scaling for its 256,000-token context window. While the high-level design is well-documented in technical reports and blog posts, specific layer-by-layer configurations and the exact 'TurboS' pre-training recipe remain partially proprietary.

Dataset Composition

4.5 / 10

Disclosure is limited to general categories such as mathematics, logic, coding, and PhD-level scientific problems. While Tencent mentions a 'curriculum learning' approach and the use of 'ground-truth feedback' for reinforcement learning, it lacks a granular percentage breakdown of the training corpus. There is no public access to sample data or a detailed audit of the filtering and cleaning methodologies used for the T1 variant specifically.

Tokenizer Integrity

8.0 / 10

The model utilizes a tokenizer based on the tiktoken framework with a vocabulary of approximately 129,000 tokens (100K standard plus 29K Chinese-specific tokens). Technical documentation provides compression rate comparisons (3.13 characters/token) and confirms alignment with the model's multilingual capabilities. The tokenizer code is accessible via official GitHub repositories for the Hunyuan family, allowing for verification of its implementation.
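Characters per token, the compression metric quoted above, is the total number of characters in a text sample divided by the number of tokens the tokenizer produces for it; a higher value means fewer tokens for the same text. A minimal sketch follows, assuming any tokenizer exposed as a function from a string to a sequence of tokens. The whitespace splitter in the example is a stand-in chosen only so the snippet runs without dependencies; it is not the Hunyuan tokenizer, and `chars_per_token` is a hypothetical helper name.

```python
from typing import Callable, Iterable, Sequence


def chars_per_token(texts: Iterable[str], encode: Callable[[str], Sequence]) -> float:
    """Average compression rate: total characters / total tokens over a corpus."""
    total_chars = 0
    total_tokens = 0
    for text in texts:
        total_chars += len(text)
        total_tokens += len(encode(text))
    if total_tokens == 0:
        raise ValueError("tokenizer produced no tokens")
    return total_chars / total_tokens


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps over the lazy dog"]
    # Stand-in tokenizer for illustration only: split on whitespace.
    print(round(chars_per_token(sample, str.split), 2))
```

To compare real tokenizers, pass their encode function instead, for example `tiktoken.get_encoding("cl100k_base").encode` from the tiktoken package. The result depends heavily on the language mix of the sample, so figures such as 3.13 characters/token are only comparable when measured on the same corpus.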

Model

25.5 / 40

Parameter Density

7.0 / 10

Tencent is transparent about the model's sparse architecture, disclosing a total parameter count of approximately 389 billion with 52 billion active parameters per token. The use of 16 specialized experts and 1 shared expert is clearly documented. However, the '70B' designation in some marketing materials is confusing, since it matches neither the 389B total nor the 52B active parameter count, although the technical reports clarify the actual figures.
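The routing described above (16 specialized experts plus 1 always-on shared expert) can be sketched as follows. The shared expert processes every token, a router scores the specialized experts, and only the top-k of them run for that token. The choice of `TOP_K = 1`, the softmax gating with renormalization, and the toy scalar "experts" are assumptions for illustration; this page does not specify Hunyuan T1's exact gating function. The last helper reproduces the active-parameter share implied by the disclosed 52B-of-389B figures.

```python
import math
from typing import Callable, List, Sequence, Tuple

NUM_ROUTED_EXPERTS = 16  # specialized experts, per the report above
TOP_K = 1                # assumption: experts selected per token


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits: Sequence[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the top_k experts and renormalize their gate weights to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    routed_experts: Sequence[Callable[[float], float]],
    shared_expert: Callable[[float], float],
    top_k: int = TOP_K,
) -> float:
    """Shared expert always runs; only the selected routed experts are evaluated."""
    out = shared_expert(x)
    for idx, gate in route(router_logits, top_k):
        out += gate * routed_experts[idx](x)
    return out


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Toy experts: expert k multiplies its input by k.
    experts = [lambda v, k=k: k * v for k in range(NUM_ROUTED_EXPERTS)]
    logits = [0.0] * NUM_ROUTED_EXPERTS
    logits[3] = 5.0  # router strongly prefers expert 3
    print(moe_forward(2.0, logits, experts, shared_expert=lambda v: v))  # 2 + 3*2
    print(f"{active_fraction(52, 389):.1%} of parameters active per token")
```

Only the chosen experts are evaluated per token, which is why compute tracks the active count while memory must still hold all experts.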

Training Compute

3.5 / 10

Information is sparse regarding the total compute budget. While Tencent discloses that 96.7% of post-training compute was dedicated to reinforcement learning, it does not provide the absolute number of GPU/TPU hours, specific hardware cluster sizes, or the total carbon footprint. This lack of absolute metrics makes it impossible to independently verify the environmental impact or total resource investment.

Benchmark Reproducibility

6.0 / 10

Tencent provides scores for standard benchmarks (MMLU-Pro: 87.2, MATH-500: 96.2, GPQA-Diamond: 69.3) and has released 'AutoCodeBench' to the community. However, the exact prompts and full evaluation code for the T1 reasoning chains are not fully public, and some results rely on internal human evaluation datasets that cannot be independently verified.

Identity Consistency

9.0 / 10

The model demonstrates strong identity consistency, correctly identifying itself as part of the Hunyuan T1 family and distinguishing its 'slow-thinking' reasoning capabilities from the 'fast-thinking' TurboS base. There are no documented instances of the model claiming a competitor's identity or misrepresenting its origin during standard interactions.

Downstream

16.5 / 30

License Clarity

5.0 / 10

The model is governed by the 'Tencent Hunyuan Community License Agreement,' which is a custom license rather than a standard open-source license like Apache 2.0. It includes significant geographic restrictions (excluding the UK, EU, and South Korea) and specific terms regarding 'Model Derivatives' and commercial use. The lack of a standard OSI-approved license creates ambiguity for global developers.

Hardware Footprint

6.5 / 10

Tencent provides general guidance on VRAM requirements, noting that the model is optimized for efficiency, with decoding roughly 2x faster than comparable dense Transformers. Third-party documentation and community guides provide VRAM estimates for various quantization levels (e.g., FP8, INT4), but official, comprehensive hardware requirement tables for the full T1 reasoning deployment are not centrally documented in a single technical model card.
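A first-order way to read such VRAM estimates is to multiply the parameter count by the bytes each weight occupies at a given precision. The sketch below does that for the 389B total-parameter figure reported above. It is a weights-only lower bound: it ignores KV cache, activations, runtime overhead, and quantization metadata such as scales, and it uses decimal gigabytes (10^9 bytes). `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory in decimal GB: params * bits / 8 bytes per param.

    Ignores KV cache, activations, framework overhead, and quantization scales.
    """
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    total_params_b = 389  # total (not active) parameters, per the report above
    for name, bits in (("BF16", 16), ("FP8", 8), ("INT4", 4)):
        print(f"{name:>4}: ~{weight_memory_gb(total_params_b, bits):,.1f} GB")
```

In an MoE model every expert must be resident in memory even though only about 52B parameters are active per token, so the total count, not the active count, sets this memory floor.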

Versioning Drift

5.0 / 10

Tencent maintains a versioning history (e.g., T1-Preview to T1 Official), but the changelogs are primarily high-level marketing summaries rather than detailed technical diffs. There is no public mechanism to pin specific sub-versions or track subtle behavioral drift resulting from the continuous reinforcement learning updates mentioned in official communications.
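The category subtotals in this report add up to the headline score. The sketch below recomputes them from the individual criterion scores listed above as a consistency check; the dictionary layout is only a convenient representation, and because the mapping from the 62/100 total to the B- letter grade is not described here, it is not reproduced.

```python
from typing import Dict

# Criterion scores as listed in this report (each out of 10).
SCORES: Dict[str, Dict[str, float]] = {
    "Upstream": {
        "Architectural Provenance": 7.5,
        "Dataset Composition": 4.5,
        "Tokenizer Integrity": 8.0,
    },
    "Model": {
        "Parameter Density": 7.0,
        "Training Compute": 3.5,
        "Benchmark Reproducibility": 6.0,
        "Identity Consistency": 9.0,
    },
    "Downstream": {
        "License Clarity": 5.0,
        "Hardware Footprint": 6.5,
        "Versioning Drift": 5.0,
    },
}


def subtotals(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the criterion scores within each category."""
    return {group: sum(items.values()) for group, items in scores.items()}


def total(scores: Dict[str, Dict[str, float]]) -> float:
    """Sum the category subtotals into the overall score."""
    return sum(subtotals(scores).values())


if __name__ == "__main__":
    for group, value in subtotals(SCORES).items():
        max_points = 10 * len(SCORES[group])
        print(f"{group}: {value} / {max_points}")
    print(f"Total: {total(SCORES)} / 100")
```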