Hunyuan Standard: Specifications and GPU VRAM Requirements

Hunyuan Standard

Open source

Open weights

Active parameters

52B

Context length

30K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Tencent Hunyuan Community License Agreement

Release date

10 Jun 2024

Training data cutoff

Technical specifications

Total expert parameters

389.0B

Number of experts

Active experts

Attention structure

Grouped-Query Attention (GQA)

Hidden dimension size

6400

Number of layers

Attention heads

Key-value heads

Activation function

SwiGLU

Normalization

Position embedding

Rotary Position Embedding (RoPE)

System requirements

VRAM requirements for different quantization methods and context sizes
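
As a rough illustration of how quantization affects VRAM, the sketch below estimates the memory needed for the model weights alone from the 389.0B total parameter count listed above. It uses the total rather than the active count on the assumption that every expert stays resident in memory. The bit widths per format are illustrative assumptions, not values from this page, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Weight-only VRAM estimate. Ignores KV cache, activations, and runtime overhead.

TOTAL_PARAMS = 389e9  # total parameters, from the specification above

# Assumed bits per weight for some common formats (illustrative only).
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_vram_gib(num_params: float, bits: int) -> float:
    """Return the memory needed to hold num_params weights, in GiB."""
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_vram_gib(TOTAL_PARAMS, bits):.0f} GiB")
```

Halving the bits per weight halves the weight memory, so the choice of quantization method dominates the baseline VRAM requirement before any context is processed.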

Hunyuan Standard

Tencent Hunyuan-Large, also identified as Hunyuan-MoE-A52B, is a large Transformer-based Mixture-of-Experts (MoE) model developed and open-sourced by Tencent. It addresses the computational cost of very large parameter counts through dynamic routing: only a subset of the experts is activated for each token. This sparse activation lets the model deliver high performance across a range of natural language processing tasks while keeping resource use in check, which makes it suitable for a variety of AI research and deployment settings.

Hunyuan-Large has 389 billion total parameters, of which only 52 billion are active during inference, as is characteristic of a Mixture-of-Experts design. The model contains one shared expert and 16 specialized experts; for each token, the shared expert is always active and one specialized expert is selected. Positional encoding uses Rotary Position Embedding (RoPE), and the activation function is SwiGLU. To improve inference efficiency, Hunyuan-Large combines Grouped-Query Attention (GQA) with Cross-Layer Attention (CLA), which substantially reduces KV cache memory consumption. Training also relies on high-quality synthetic data, expert-specific learning rate scaling, and Flash Attention for faster training.
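
The routing pattern described above (one always-on shared expert plus one of 16 specialized experts per token) can be sketched in plain Python. This is a minimal illustration, not Tencent's implementation: the experts are hypothetical single linear maps, the helper names (`make_matrix`, `matvec`, `moe_forward`) are invented for this example, and scaling the routed output by its gate probability is an assumed, commonly used choice.

```python
import math
import random

NUM_SPECIALIZED = 16  # specialized experts, from the description above


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights: a random rows x cols matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def moe_forward(x, router, shared, experts):
    """Process one token: the shared expert always runs, plus the top-1 specialized expert."""
    probs = softmax(matvec(router, x))  # one probability per specialized expert
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    shared_out = matvec(shared, x)
    expert_out = matvec(experts[chosen], x)
    # Assumed combination rule: add the routed output scaled by its gate probability.
    out = [s + probs[chosen] * e for s, e in zip(shared_out, expert_out)]
    return out, chosen


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # toy hidden size for the demo, not the model's real dimension
    router = make_matrix(NUM_SPECIALIZED, dim, rng)
    shared = make_matrix(dim, dim, rng)
    experts = [make_matrix(dim, dim, rng) for _ in range(NUM_SPECIALIZED)]
    token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    _, chosen = moe_forward(token, router, shared, experts)
    print(f"token routed to specialized expert {chosen}")
```

Only two of the 17 expert networks run for any given token, which is why the active parameter count is far smaller than the total.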

The pre-trained variant of Hunyuan-Large supports a context window of up to 256,000 tokens, enabling it to process long inputs such as detailed documents and large codebases. The model has shown competitive performance on English and Chinese benchmarks, including MMLU, MMLU-Pro, CMMLU, GSM8K, and MATH, frequently exceeding dense models and other MoE models with comparable active parameter sizes. These capabilities make Hunyuan-Large suitable for demanding tasks that require advanced reasoning, content generation, and understanding of long-form text.
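
At long context lengths the KV cache becomes a major part of memory use, which is where GQA and CLA help. The sketch below uses the standard KV cache size formula to compare the three setups. The layer count, head counts, and head dimension are placeholder values chosen for illustration, since this page does not list them, and `cla_share` (the number of adjacent layers sharing one KV cache) is a hypothetical parameter name.

```python
import math


def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, cla_share=1):
    """Bytes of KV cache for a single sequence.

    cla_share is the number of adjacent layers that share one KV cache
    (1 means no cross-layer sharing).
    """
    kv_layers = math.ceil(num_layers / cla_share)
    # Factor 2 covers the separate key and value tensors.
    return 2 * kv_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Placeholder configuration for illustration only (not taken from this page).
    seq_len, layers, heads, kv_heads, head_dim = 256_000, 64, 80, 8, 128

    mha = kv_cache_bytes(seq_len, layers, heads, head_dim)
    gqa = kv_cache_bytes(seq_len, layers, kv_heads, head_dim)
    gqa_cla = kv_cache_bytes(seq_len, layers, kv_heads, head_dim, cla_share=2)

    for name, size in [("MHA", mha), ("GQA", gqa), ("GQA + CLA", gqa_cla)]:
        print(f"{name}: {size / 2**30:.2f} GiB")
```

Under these placeholder values, GQA shrinks the cache by the ratio of attention heads to key-value heads (10x), and sharing the cache across pairs of layers halves it again, for a 20x reduction overall.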

About Hunyuan

Tencent Hunyuan large language models with various capabilities.

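Other Hunyuan models

Evaluation benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Hunyuan Standard.

Rankings

Coding ranking

GPU requirements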

Resources

Official documentation · Release notes · Read paper · Download weights · Source code
