趋近智
Total Parameters
389B
Context Length
28K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Tencent Hunyuan Community License
Release Date
5 Nov 2024
Knowledge Cutoff
Sep 2024
Active Parameters
52.0B
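Number of Experts
32
Active Experts
2
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
60
Attention Heads
64
Key-Value Heads
64
Activation Function
GELU
Normalization
Layer Normalization
Position Embedding
Absolute Position Embedding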
VRAM requirements for different quantization methods and context sizes
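The weight-only part of these requirements can be approximated with simple arithmetic. The sketch below is an illustration rather than the site's calculator: it assumes the 389B total parameter count quoted on this page, assumes 2, 1, and 0.5 bytes per parameter for FP16, INT8, and INT4, and ignores the KV cache, activations, and runtime overhead, all of which grow with context size. The function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-only memory estimate.
# Assumptions: bytes-per-parameter values below; no KV cache, activations, or overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

TOTAL_PARAMS = 389e9  # total parameter count quoted on this page


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(TOTAL_PARAMS, precision):.1f} GB")
```

In an MoE model every expert normally has to be resident in memory even though only a few run per token, so the estimate uses the total parameter count rather than the active one.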
Hunyuan-DiT is a large-scale Mixture-of-Experts (MoE) diffusion transformer from Tencent, designed for high-fidelity image generation. It applies a transformer architecture directly in the latent space of the diffusion process. Its primary function is to synthesize diverse, high-quality images from text prompts for content creation and visual design. Its modular, expert-based architecture is intended to allow efficient scaling and inference.
The Hunyuan-DiT model employs a diffusion transformer architecture with a Mixture-of-Experts (MoE) design. The model's parameters are partitioned into multiple "experts," and only a subset of these experts is activated for each input token during inference. As a result, the model has approximately 389 billion parameters in total but activates only about 52 billion of them per token, which reduces the compute needed per token. The model has 60 transformer layers with 64 attention heads and uses GELU activation and Layer Normalization. Its design supports flexible image resolutions and combines absolute positional embeddings with Rotary Positional Encoding. Text prompts are encoded with a combination of a bilingual CLIP encoder and a multilingual T5 encoder.
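The routing idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Tencent's implementation: it assumes a softmax router with top-k selection and renormalized gate weights, uses the 32 experts and 2 active experts listed in the specifications above, and replaces real expert networks with toy scalar functions. The names `route` and `moe_layer` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 32  # number of experts listed in the specifications
TOP_K = 2         # active experts per token listed in the specifications


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_logits):
    """Run only the selected experts and combine their outputs by gate weight."""
    return sum(w * experts[i](token) for i, w in route(router_logits))


# Toy experts (assumption): expert k multiplies a scalar "token" by k + 1.
experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
output = moe_layer(1.0, experts, logits)
print("selected experts and weights:", selected)
print("layer output:", output)
```

Only 2 of the 32 toy experts are evaluated for the token, which mirrors how the model can activate about 52 billion of its roughly 389 billion parameters (around 13%) per token.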
Hunyuan-DiT is engineered to generate high-resolution, visually consistent images at resolutions up to 4096x4096. Its MoE architecture keeps per-token compute low relative to total model size, which suits deployments that need both high output quality and constrained compute. Primary use cases are creative content generation, visual asset production, and other text-to-image applications such as advertising, digital art, and virtual environment design. It also supports multi-turn multimodal dialogue, so users can refine an image iteratively.
Tencent Hunyuan large language models with various capabilities.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Hunyuan Large.