Total Parameters
389B
Context Length
128K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Tencent Hunyuan Community License
Release Date
5 Nov 2024
Knowledge Cutoff
Sep 2024
Active Parameters
52.0B
Number of Experts
32
Active Experts
2
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
60
Attention Heads
64
Key-Value Heads
64
Activation Function
GELU
Normalization
Layer Normalization
Position Embedding
Rotary Position Embedding (RoPE)
VRAM requirements for different quantization methods and context sizes
Hunyuan-Large is a large-scale Mixture-of-Experts (MoE) transformer language model developed by Tencent. Its primary function is high-quality text generation from natural-language prompts, supporting tasks such as question answering, reasoning, coding assistance, and long-document understanding. The model is notable for its sparse MoE architecture, which allows an extremely large total parameter count while keeping per-token inference cost manageable.
Hunyuan-Large employs a decoder-only transformer architecture with a Mixture-of-Experts (MoE) design. The feed-forward parameters are partitioned into multiple "experts," and a learned router activates only a small subset of these experts for each input token during inference. This lets the model reach a total parameter count of approximately 389 billion while activating only about 52 billion parameters per token, substantially improving computational efficiency. The network comprises 60 transformer layers with 64 attention heads per layer, a hidden dimension of 4096, GELU activations, and Layer Normalization; rotary positional encoding supports long input contexts.
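A minimal sketch of the top-k expert routing described above. The names, shapes, and linear "experts" here are illustrative only, not Hunyuan's actual implementation; the point is that only k of the experts run per token, so compute scales with active rather than total parameters.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Route one token through a top-k Mixture-of-Experts layer.

    x        : (d,) token hidden state
    experts  : list of callables, one per expert feed-forward block
    router_w : (num_experts, d) router projection (hypothetical name)
    k        : number of experts activated per token
    """
    logits = router_w @ x                 # one routing score per expert
    top = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # softmax over the selected experts only
    # Only the k selected experts are evaluated; the rest are skipped entirely.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy usage: 32 experts with 2 active per token, matching the table above.
rng = np.random.default_rng(0)
d, n_experts = 8, 32
experts = [(lambda W: (lambda v: W @ v))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), experts, router_w, k=2)
print(y.shape)
```

In a real MoE layer the experts are full feed-forward networks and the router is trained jointly with them, usually with an auxiliary loss to balance load across experts.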
Hunyuan-Large is engineered for long-context language understanding and generation. Its MoE architecture contributes to efficient scaling, making it suitable for deployment in scenarios demanding both high output quality and computational prudence. Primary use cases include conversational assistants, code generation, summarization and analysis of long documents, and multilingual applications in both Chinese and English. It also supports multi-turn dialogue, enabling iterative refinement of responses based on user interaction.
Tencent Hunyuan is a family of large language models with various capabilities.
No evaluation benchmarks are available for Hunyuan-Large.
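The VRAM needed to serve a model of this size depends mainly on weight precision and context length. Below is a rough back-of-the-envelope sketch, not the site's exact calculator, using figures from the specification table above (389B total parameters, 60 layers, 64 KV heads, hidden dimension 4096 with 64 attention heads):

```python
def vram_gb(total_params_b=389, bits_per_weight=16,
            n_layers=60, n_kv_heads=64, hidden=4096, n_heads=64,
            context=1024):
    """Approximate serving VRAM in GB: quantized weights plus FP16 KV cache."""
    weights = total_params_b * 1e9 * bits_per_weight / 8      # bytes for weights
    head_dim = hidden // n_heads
    # KV cache: 2 tensors (K and V) per layer, FP16 (2 bytes per element)
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context * 2
    return (weights + kv_cache) / 1e9

for bits, name in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    print(f"{name}: ~{vram_gb(bits_per_weight=bits):.0f} GB")
```

Even at 4-bit quantization, all 389B weights must be resident for inference (only the compute is sparse in an MoE), which is why models of this scale require multi-GPU serving; the KV cache term only becomes significant at much longer context sizes.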