趋近智

Hunyuan Large

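Total Parameters: 389B
Context Length: 28K
Modality: Text
Architecture: Mixture of Experts (MoE)
License: Tencent Hunyuan Community License
Release Date: 5 Nov 2024
Knowledge Cutoff: Sep 2024

Technical Specifications

Active Parameters: 52.0B
Number of Experts: 32
Active Experts: 2
Attention Structure: Multi-Head Attention
Hidden Dimension Size: 4096
Number of Layers: 60
Attention Heads: 64
Key-Value Heads: 64
Activation Function: GELU
Normalization: Layer Normalization
Position Embedding: Absolute Position Embedding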

System Requirements

VRAM requirements for different quantization methods and context sizes
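
As a rough illustration of how VRAM scales with quantization and context size, the sketch below estimates weight memory plus KV-cache memory. It is a back-of-the-envelope calculation under stated assumptions, not the formula used by this page's calculator: it takes 389B as the total parameter count (all experts must be resident in memory), uses the layer and head counts from the specification list, assumes a head dimension of hidden size divided by attention heads, and ignores activations and framework overhead. The function names are hypothetical.

```python
# Rough VRAM estimate: weights + KV cache. Illustrative only; real usage
# also includes activations, fragmentation, and runtime overhead.

# Bytes stored per weight for a few common quantization levels (assumed set).
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Values taken from the specification list on this page.
TOTAL_PARAMS = 389e9          # all experts are held in memory
NUM_LAYERS = 60
NUM_KV_HEADS = 64
HEAD_DIM = 4096 // 64         # assumption: hidden size / attention heads


def weight_memory_gb(total_params, quant):
    """Memory for model weights in GB (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_WEIGHT[quant] / 1e9


def kv_cache_gb(context_tokens, bytes_per_value=2.0):
    """Memory for keys and values across all layers, for one sequence."""
    values = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_tokens
    return values * bytes_per_value / 1e9


def estimate_vram_gb(quant, context_tokens):
    """Weights plus a 16-bit KV cache for a single sequence."""
    return weight_memory_gb(TOTAL_PARAMS, quant) + kv_cache_gb(context_tokens)


if __name__ == "__main__":
    for quant in ("fp16", "int8", "int4"):
        for ctx in (1024, 14 * 1024, 27 * 1024):
            print(f"{quant:>5} @ {ctx:>6} tokens: {estimate_vram_gb(quant, ctx):8.1f} GB")
```

Under these assumptions the weights dominate: the KV cache for a single 1,024-token sequence is about 1 GB, while the weights alone need hundreds of GB even at 4-bit precision.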

Hunyuan Large

Hunyuan-DiT is a large-scale Mixture-of-Experts (MoE) diffusion transformer for high-fidelity image generation. Developed by Tencent, it applies a transformer architecture directly in the latent space of the image generation process. Its primary function is to synthesize diverse, high-quality images from text prompts for content creation and visual design. Its modular architecture is intended to allow efficient scaling and inference.

Hunyuan-DiT uses a diffusion transformer architecture with a Mixture-of-Experts (MoE) design. The model's parameters are partitioned into multiple "experts," and only a subset of them is activated for each input token during inference. This lets the model reach a total of approximately 389 billion parameters while keeping approximately 52 billion active per token, which lowers the compute per token compared with activating every parameter. The model has 60 transformer layers with 64 attention heads and uses GELU activation and Layer Normalization. It supports flexible image resolutions and combines absolute positional embeddings with Rotary Positional Encoding. Prompts are encoded with a combination of bilingual CLIP and multilingual T5 encoders.
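
The per-token expert selection described above can be sketched as a toy top-k router. This is a minimal illustration, assuming a softmax router whose top-k probabilities are renormalized, which is a common MoE scheme but is not confirmed here as this model's routing method. The expert count (32) and active experts (2) come from the specification list; the scalar "experts" and all function names are hypothetical stand-ins.

```python
import math
import random

NUM_EXPERTS = 32   # from the specification list
TOP_K = 2          # active experts per token, from the specification list


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are the router probabilities renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; other experts are skipped."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_top_k(logits, TOP_K)
    print("selected experts:", chosen)
    print("output:", moe_forward(1.0, experts, logits, TOP_K))
```

Only the two selected experts run for a given token, which is why the active parameter count is much smaller than the total even though every expert's weights must be stored.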

Hunyuan-DiT is designed to generate high-resolution, visually consistent images at resolutions up to 4096x4096. Its MoE architecture supports efficient scaling, which suits deployments that need both high quality and a bounded compute cost. Primary use cases are creative content generation, visual asset production, and text-to-image synthesis for advertising, digital art, and virtual environment design. It also supports multi-turn multimodal dialogue, so users can refine an image iteratively.

About Hunyuan

Tencent Hunyuan large language models with various capabilities.


Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Hunyuan Large.

Rank: -
Coding Rank: -
