Hunyuan Standard

Active Parameters

52B

Context Length

30K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Tencent Hunyuan Community License Agreement

Release Date

10 Jun 2024

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

80

Key-Value Heads

8

Attention Head Dimension

-

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

-

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

6,400

Number of Layers

64

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

-

Mixture of Experts

Total Expert Parameters

389.0B

Number of Experts

17

Active Experts

2

Shared Experts

1

FFN Intermediate Size (per Expert)

-

Dense Layers Before MoE

-

Architecture Diagram

Input Tokens → Token Embedding (Position: RoPE · Hidden: 6.4k · Context: 30k) → 64 layers × [Pre-Attention Norm → Grouped-Query Attention (80 Q / 8 KV heads, head dim 80) → residual add → Pre-FFN Norm → Sparse MoE FFN (2/17 experts, SwiGLU) → residual add] → Final Norm → Output Logits

Hunyuan Standard

Tencent Hunyuan-Large, also identified as Hunyuan-MoE-A52B, is a large Transformer-based Mixture-of-Experts (MoE) model developed and open-sourced by Tencent. It addresses the compute cost of very large parameter counts through dynamic routing, so that only a subset of the parameters is active for each token. The model targets a broad range of natural language processing tasks and is intended for use in both AI research and deployed systems.

Hunyuan-Large has 389 billion total parameters, of which only 52 billion are active during inference, a consequence of its Mixture-of-Experts design. The model uses one shared expert and 16 specialized experts: the shared expert is always active, and one specialized expert is activated per token. Positional encoding uses Rotary Position Embedding (RoPE), and the activation function is SwiGLU. To improve inference efficiency, Hunyuan-Large combines Grouped-Query Attention (GQA) with Cross-Layer Attention (CLA), which substantially reduces KV cache memory consumption. Training also relies on high-quality synthetic data, expert-specific learning rate scaling, and Flash Attention for faster training.
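As a rough illustration of why GQA and CLA shrink the KV cache, the sketch below counts the bytes cached per token. The layer and head counts come from the specification above, and the head dimension of 80 is the value shown in the architecture diagram. The 16-bit cache entries and the assumption that CLA shares one KV cache between every two adjacent layers are illustrative choices, not figures confirmed on this page.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim,
                             bytes_per_value=2, layers_per_kv_group=1):
    """Bytes of KV cache stored for a single token.

    layers_per_kv_group > 1 models cross-layer sharing, where that many
    adjacent layers reuse one set of keys and values.
    """
    kv_layers = -(-num_layers // layers_per_kv_group)  # ceiling division
    # Factor of 2: one key tensor and one value tensor per KV layer.
    return 2 * kv_layers * num_kv_heads * head_dim * bytes_per_value


LAYERS, Q_HEADS, KV_HEADS = 64, 80, 8  # from the specification above
HEAD_DIM = 80                          # from the architecture diagram (assumption)

# Baseline: every query head keeps its own keys and values.
mha = kv_cache_bytes_per_token(LAYERS, Q_HEADS, HEAD_DIM)
# GQA: 80 query heads share 8 key-value heads.
gqa = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
# GQA + CLA: additionally share KV across pairs of layers (assumed group size 2).
gqa_cla = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM,
                                   layers_per_kv_group=2)

reduction = 1 - gqa_cla / mha
print(f"MHA baseline: {mha} bytes/token")
print(f"GQA         : {gqa} bytes/token")
print(f"GQA + CLA   : {gqa_cla} bytes/token")
print(f"Reduction vs. baseline: {reduction:.0%}")
```

The relative saving does not depend on the head dimension or the cache precision, since both cancel out in the ratio; only the head counts and the layer-sharing group size matter.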

The pre-trained variant of Hunyuan-Large supports a context window of up to 256,000 tokens, which allows it to process long inputs such as lengthy documents and large codebases. The model reports competitive results on English and Chinese benchmarks, including MMLU, MMLU-Pro, CMMLU, GSM8K, and MATH, frequently exceeding dense models and other MoE models with comparable active parameter counts. These capabilities make Hunyuan-Large suited to tasks that require advanced reasoning, long-form content generation, and understanding of long texts.

About Hunyuan

Hunyuan is Tencent's family of large language models with various capabilities.


Evaluation Benchmarks

Rank

#99

Benchmark | Score | Rank

Web Development

WebDev Arena | 1312 | 58

Rankings

Overall Rank

#99

Coding Rank

#71

Model Integrity

Total Score

B

66 / 100

Hunyuan Standard Model Integrity Report

Audit Note

Hunyuan-Large demonstrates strong transparency in its architectural design and parameter density, providing clear distinctions between total and active parameters in its MoE structure. However, it suffers from significant opacity regarding training compute resources and the specific origins of its multi-trillion token natural dataset. While the release of weights and code is a positive step, the restrictive community license and lack of environmental impact data limit its overall transparency profile.

Upstream

22.0 / 30

Architectural Provenance

8.0 / 10

The model architecture is extensively documented in a technical report (arXiv:2411.02265). It specifies a Transformer-based Mixture-of-Experts (MoE) design with 389B total and 52B active parameters. Key architectural details are disclosed, including the use of Rotary Position Embedding (RoPE), SwiGLU activation, Grouped-Query Attention (GQA), and a novel Cross-Layer Attention (CLA) mechanism. The report also details the expert configuration (1 shared expert and 16 specialized experts, with 1 activated per token) and the routing strategy. The training methodology is described, and the 'from scratch' vs. 'continued pre-training' lineage of the base architecture is clear, but the report gives little context on the architecture's development history beyond its being a standard Transformer.
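The expert configuration described above (one always-active shared expert plus one of 16 specialized experts chosen per token) can be sketched as follows. This is a toy illustration only: the experts are hypothetical scaling functions rather than real feed-forward networks, and weighting the routed output by its softmax gate probability is an assumption, since this page does not describe the actual routing strategy in detail.

```python
import math

NUM_ROUTED_EXPERTS = 16  # specialized experts, per the text above


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert FFN: scales every feature."""
    return lambda x: [scale * v for v in x]


shared_expert = make_expert(1.0)
routed_experts = [make_expert(0.1 * (i + 1)) for i in range(NUM_ROUTED_EXPERTS)]


def moe_layer(token, router_logits):
    """Apply the shared expert plus the single highest-scoring routed expert."""
    probs = softmax(router_logits)
    top = max(range(NUM_ROUTED_EXPERTS), key=lambda i: probs[i])  # top-1 routing
    shared_out = shared_expert(token)
    routed_out = routed_experts[top](token)
    # Assumed combination rule: shared output + gate-weighted routed output.
    combined = [s + probs[top] * r for s, r in zip(shared_out, routed_out)]
    return combined, top


token = [1.0, -2.0, 0.5]
logits = [0.0] * NUM_ROUTED_EXPERTS
logits[5] = 2.0  # make expert 5 the router's preferred choice
output, chosen = moe_layer(token, logits)
print(f"routed to expert {chosen}; 2 of 17 experts ran for this token")
print(output)
```

Only two expert functions execute per token regardless of how many specialized experts exist, which is why the active parameter count stays far below the total.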

Dataset Composition

5.0 / 10

Tencent discloses that the model was pre-trained on 7 trillion tokens, including a significant portion (1.5 trillion tokens) of synthetic data. The report mentions general categories such as web text, financial reports, legal documents, and academic papers. However, it lacks a precise percentage breakdown of the dataset composition (e.g., exact ratios of code vs. web vs. books) and does not name or link the sources of the 5.5 trillion tokens of natural data, which third-party analysis describes as 'unspecified sources'. The data cleaning and filtering pipeline is mentioned but not documented in reproducible detail.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available via the official GitHub and Hugging Face repositories. It uses a 128K vocabulary size, combining 100K tokens from the tiktoken (OpenAI) base with an additional 28K-29K tokens specifically trained on high-quality Chinese data to improve compression. The documentation provides specific compression rate comparisons (3.13 characters per token vs. Llama 3.1's 2.78), and the tokenizer is fully inspectable through the released code.
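To make the compression figures concrete, the sketch below estimates how many tokens each tokenizer would need for the same text. The two characters-per-token rates are the ones quoted above; the one-million-character document is a hypothetical input, and real ratios depend on the language and content of the text.

```python
import math


def estimated_tokens(num_chars, chars_per_token):
    """Approximate token count for a text at a given compression rate."""
    return math.ceil(num_chars / chars_per_token)


HUNYUAN_CHARS_PER_TOKEN = 3.13  # quoted above
LLAMA31_CHARS_PER_TOKEN = 2.78  # quoted above

doc_chars = 1_000_000  # hypothetical document length in characters

hunyuan_tokens = estimated_tokens(doc_chars, HUNYUAN_CHARS_PER_TOKEN)
llama_tokens = estimated_tokens(doc_chars, LLAMA31_CHARS_PER_TOKEN)
savings = 1 - hunyuan_tokens / llama_tokens

print(f"Hunyuan tokenizer  : {hunyuan_tokens} tokens")
print(f"Llama 3.1 tokenizer: {llama_tokens} tokens")
print(f"Fewer tokens needed: {savings:.1%}")
```

A higher characters-per-token rate means the same context window covers more text, so the difference compounds for long inputs.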

Model

26.0 / 40

Parameter Density

9.0 / 10

The documentation is exemplary in its transparency about parameter density. It clearly distinguishes between the 389 billion total parameters and the 52 billion active parameters used during inference. The MoE structure is well-defined, specifying the number of experts (16 specialized + 1 shared) and how many are activated per token. This avoids the common 'parameter inflation' marketing trap seen with other MoE models.

Training Compute

2.0 / 10

Information regarding training compute is extremely limited. While Tencent mentions using their 'Xingmai' high-performance network and NVIDIA H20 GPUs (due to export restrictions), they do not disclose the total GPU/TPU hours, the specific training duration, the total energy consumption, or the carbon footprint. Most compute-related information comes from general corporate announcements about their infrastructure rather than model-specific technical disclosures.

Benchmark Reproducibility

6.0 / 10

The technical report provides results for dozens of standard benchmarks (MMLU, GSM8K, MATH, etc.) and specifies the versions used (e.g., MMLU-Pro). Evaluation code for some specialized benchmarks like AutoCodeBench is public. However, the exact prompts and few-shot configurations for all standard benchmarks are not fully detailed in a single reproducible repository, and third-party verification is currently limited to leaderboard positions rather than independent reproduction of the full training-to-eval pipeline.

Identity Consistency

9.0 / 10

The model consistently identifies as 'Hunyuan-Large' or 'Hunyuan-MoE-A52B' across all official documentation, weights, and code repositories. There is no evidence of identity confusion or claims of being a competitor's model. Versioning between 'Pretrain' and 'Instruct' variants is clearly maintained.

Downstream

18.0 / 30

License Clarity

6.0 / 10

The model is released under the 'Tencent Hunyuan Community License Agreement.' It permits free use and distribution for most users but contains significant restrictions: it is not valid in the European Union, and it requires a separate license request for entities with over 100 million monthly active users. These geographic and scale-based restrictions mean it does not meet the criteria for a standard open-source license (like Apache 2.0), creating some legal complexity for global users.

Hardware Footprint

7.0 / 10

Hardware requirements are documented in the GitHub repository and technical report. The documentation gives GPU guidance for fine-tuning (e.g., 32 GPUs for full fine-tuning, 8 for LoRA) and highlights the memory-saving benefits of CLA and FP8 quantization (50% reduction). While it lacks a comprehensive VRAM-to-context-length scaling table for all quantization levels (Q4, Q8, etc.), the provided guidance for H20/H100 clusters is specific and verifiable.
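The 50% figure for FP8 follows from halving the bytes stored per weight relative to a 16-bit format. The sketch below works through that arithmetic for the 389B total parameters. It counts weights only, ignoring KV cache, activations, and framework overhead, and the 80 GB and 96 GB per-GPU capacities are assumptions for illustration rather than values taken from this page.

```python
import math

TOTAL_PARAMS = 389_000_000_000  # total parameters, per the text above
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(memory_gb, gpu_memory_gb):
    """Lower bound on GPU count from weight storage alone."""
    return math.ceil(memory_gb / gpu_memory_gb)


bf16_gb = weight_memory_gb(TOTAL_PARAMS, "bf16")
fp8_gb = weight_memory_gb(TOTAL_PARAMS, "fp8")

for name, gpu_gb in [("80 GB GPU", 80), ("96 GB GPU", 96)]:  # assumed capacities
    print(f"{name}: bf16 needs >= {min_gpus(bf16_gb, gpu_gb)} GPUs, "
          f"fp8 needs >= {min_gpus(fp8_gb, gpu_gb)} GPUs")

print(f"bf16 weights: {bf16_gb:.0f} GB, fp8 weights: {fp8_gb:.0f} GB")
```

All 389B parameters must be resident even though only 52B are active per token, so weight memory is driven by the total count, not the active count. Practical deployments need headroom beyond these lower bounds.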

Versioning Drift

5.0 / 10

The model uses basic versioning (e.g., Hunyuan-A52B-Instruct-FP8), and updates are posted on Hugging Face and GitHub. However, there is no formal semantic versioning system or a detailed public changelog documenting specific weight updates or behavioral drift over time. The project is relatively new (late 2024), so long-term tracking of silent degradation is not yet possible, but the current infrastructure for version history is minimal.

Hunyuan Standard: Specifications and GPU VRAM Requirements