Hunyuan Standard: Specifications and GPU VRAM Requirements

Hunyuan Standard

Open Source

Open Weights

Active Parameters

52B

Context Length

30K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Tencent Hunyuan Community License Agreement

Release Date

10 Jun 2024

Knowledge Cutoff

Technical Specifications

Total Expert Parameters

389.0B

Number of Experts

Active Experts

Attention Structure

Multi-Head Attention

Hidden Dimension Size

6400

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

SwigLU

Normalization

Position Embedding

Absolute Position Embedding

System Requirements

VRAM requirements for different quantization methods and context sizes

Hunyuan Standard

Tencent Hunyuan-Large, identified as Hunyuan-MoE-A52B, is a large Transformer-based Mixture-of-Experts (MoE) model developed and open-sourced by Tencent. This model addresses the computational challenges associated with extensive parameter counts in large language models by employing a dynamic routing strategy. It is engineered to deliver high performance across a spectrum of natural language processing tasks, while optimizing resource utilization through its sparse activation mechanism. The model's design facilitates its application in diverse intelligent systems, supporting advancements in AI research and deployment .

The technical architecture of Hunyuan-Large incorporates a total of 389 billion parameters, with only 52 billion parameters actively utilized during inference, a characteristic of its Mixture-of-Experts design . The model structure includes one shared expert and 16 specialized experts, with one specialized expert activated per token, in addition to the continuously active shared expert . Positional encoding is managed using Rotary Position Embedding (RoPE), and the activation function is SwiGLU . To enhance inference efficiency and mitigate the memory footprint of the KV cache, Hunyuan-Large integrates Grouped-Query Attention (GQA) and Cross-Layer Attention (CLA), leading to a substantial reduction in KV cache memory consumption . The training regimen also benefits from high-quality synthetic data, an expert-specific learning rate scaling methodology, and the integration of Flash Attention for accelerated training processes .

Hunyuan-Large supports an extensive context window of up to 256,000 tokens in its pre-trained variant, enabling the processing and comprehension of lengthy textual inputs for applications such as detailed document analysis and extensive codebases . The model has demonstrated competitive performance across various benchmarks in both English and Chinese, including MMLU, MMLU-Pro, CMMLU, GSM8K, and MATH datasets, frequently exceeding the performance of dense models and other MoE models with comparable active parameter sizes . These capabilities position Hunyuan-Large as a suitable solution for demanding tasks requiring advanced reasoning, comprehensive content generation, and sophisticated understanding of long-form text .

About Hunyuan

Tencent Hunyuan large language models with various capabilities.

Other Hunyuan Models

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Hunyuan Standard available.

Rankings

Overall Rank

Coding Rank

GPU Requirements

Full Calculator

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

15k

29k

VRAM Required:

Recommended GPUs

Resources

Official Documentation Release Notes Read the Paper Download Weights Source Code