
Active Parameters: 52B
Context Length: 30K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Tencent Hunyuan Community License Agreement
Release Date: 10 Jun 2024
Knowledge Cutoff: -
Total Expert Parameters: 389.0B
Number of Experts: 17
Active Experts: 2
Attention Structure: Grouped-Query Attention (GQA)
Hidden Dimension Size: 6400
Number of Layers: 64
Attention Heads: 80
Key-Value Heads: 8
Activation Function: SwiGLU
Normalization: -
Position Embedding: Rotary Position Embedding (RoPE)
VRAM requirements by quantization method and context size
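As a rough orientation for such estimates, the sketch below computes only the KV-cache portion of memory from the hyperparameters listed above (64 layers, 8 key-value heads, hidden size 6400, 80 attention heads). It assumes fp16/bf16 cache entries and a plain per-layer cache, ignoring the Cross-Layer Attention sharing described below, which reduces these numbers further; model weights and activation memory are not included.

```python
# Rough KV-cache size estimate from the hyperparameters listed above.
# Assumptions (not from this page): fp16/bf16 cache entries (2 bytes each),
# head_dim = hidden_size / attention_heads, one K and one V vector cached
# per layer per token, and no Cross-Layer Attention sharing.

def kv_cache_bytes(context_len, num_layers=64, num_kv_heads=8,
                   hidden_size=6400, num_attention_heads=80,
                   bytes_per_value=2):
    head_dim = hidden_size // num_attention_heads                         # 6400 / 80 = 80
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value   # K + V
    return context_len * num_layers * per_token_per_layer

for ctx in (30_000, 128_000, 256_000):
    print(f"{ctx:>7} tokens -> ~{kv_cache_bytes(ctx) / 1024**3:.1f} GiB KV cache per sequence")
```

With full multi-head attention (80 KV heads instead of 8), the same caches would be ten times larger; that is the saving Grouped-Query Attention provides before Cross-Layer Attention is applied on top.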
Tencent Hunyuan-Large, also identified as Hunyuan-MoE-A52B, is a large Transformer-based Mixture-of-Experts (MoE) model developed and open-sourced by Tencent. The model addresses the computational cost of very large parameter counts by employing a dynamic routing strategy, so that only a small subset of parameters is used for each token. It is engineered to deliver high performance across a spectrum of natural language processing tasks while optimizing resource utilization through this sparse activation mechanism, and its design supports deployment in diverse intelligent systems as well as further AI research.
The architecture of Hunyuan-Large comprises 389 billion total parameters, of which only 52 billion are active during inference, a characteristic of its Mixture-of-Experts design. The model includes one shared expert and 16 specialized experts; for each token, one specialized expert is activated in addition to the always-active shared expert. Positional encoding uses Rotary Position Embedding (RoPE), and the activation function is SwiGLU. To improve inference efficiency and reduce the memory footprint of the KV cache, Hunyuan-Large combines Grouped-Query Attention (GQA) with Cross-Layer Attention (CLA), yielding a substantial reduction in KV cache memory consumption. Training also benefits from high-quality synthetic data, expert-specific learning rate scaling, and Flash Attention for faster training.
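A minimal sketch of this routing pattern is shown below in plain NumPy with toy dimensions: every token passes through the shared expert, and a learned router selects one of the 16 specialized experts per token. All names, shapes, and the SwiGLU-style expert body are illustrative assumptions, not Tencent's implementation (the real hidden size is 6400).

```python
import numpy as np

# Toy illustration of the routing described above: one always-active shared
# expert plus top-1 routing over 16 specialized experts (so 2 experts per token).
# Dimensions are shrunk for readability; the real model uses hidden size 6400.
rng = np.random.default_rng(0)
HIDDEN, N_SPECIALIZED = 64, 16

def make_expert():
    """SwiGLU-style feed-forward block standing in for one expert."""
    w_gate, w_up, w_down = (rng.standard_normal((HIDDEN, HIDDEN)) * 0.05 for _ in range(3))
    silu = lambda x: x / (1.0 + np.exp(-x))
    return lambda x: (silu(x @ w_gate) * (x @ w_up)) @ w_down

shared_expert = make_expert()
specialized = [make_expert() for _ in range(N_SPECIALIZED)]
router_w = rng.standard_normal((HIDDEN, N_SPECIALIZED)) * 0.05

def moe_layer(tokens):                        # tokens: (seq_len, HIDDEN)
    logits = tokens @ router_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)            # top-1 specialized expert per token
    out = shared_expert(tokens)               # shared expert always contributes
    for i, tok in enumerate(tokens):
        out[i] += probs[i, choice[i]] * specialized[choice[i]](tok)
    return out

print(moe_layer(rng.standard_normal((4, HIDDEN))).shape)   # -> (4, 64)
```

Only the parameters of the shared expert and the single selected specialized expert participate in each token's forward pass, which is how 389B total parameters reduce to roughly 52B active parameters per token.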
Hunyuan-Large supports an extended context window of up to 256,000 tokens in its pre-trained variant, enabling the processing and comprehension of lengthy inputs such as detailed documents and extensive codebases. The model has demonstrated competitive performance across English and Chinese benchmarks, including MMLU, MMLU-Pro, CMMLU, GSM8K, and MATH, frequently exceeding dense models and other MoE models with comparable active parameter counts. These capabilities make Hunyuan-Large suitable for demanding tasks that require advanced reasoning, comprehensive content generation, and sophisticated understanding of long-form text.
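For orientation, a minimal loading sketch with Hugging Face transformers follows. The repository id, dtype, and generation settings are assumptions for illustration only; consult the official model card for the exact repository name, recommended quantization, and multi-GPU requirements, since the full bf16 checkpoint is far too large for a single GPU.

```python
# Minimal loading sketch using Hugging Face transformers. The repository id
# below is an assumed placeholder, not taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"  # assumed repo id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 389B total parameters: multi-GPU sharding required
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the key architectural features of a Mixture-of-Experts model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```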
Hunyuan-Large is part of the Tencent Hunyuan family of large language models, which spans a range of capabilities.
No evaluation benchmarks are currently available for Hunyuan Standard.