Hunyuan Large: Specifications and GPU VRAM Requirements

Hunyuan Large

Open Source

Open Weights

Total Parameters

389B

Context Length

28K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Tencent Hunyuan Community License

Release Date

5 Nov 2024

Knowledge Cutoff

Sep 2024

Technical Specifications

Active Parameters

52.0B

Number of Experts

Active Experts

Attention Structure

Grouped-Query Attention (GQA)

Hidden Dimension Size

6400

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

SwiGLU

Normalization

RMSNorm

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
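
As a rough guide to how quantization and context size drive VRAM, the following Python sketch estimates weight memory and KV-cache memory. It is an approximation under stated assumptions, not the calculator behind this page: the function names are hypothetical, the 389B figure is the total parameter count quoted in the model description, runtime overhead (activations, framework buffers) is ignored, and the layer and head counts in the KV-cache example are illustrative placeholders rather than Hunyuan Large's values.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(context_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """Memory for the key/value cache of one sequence, in GiB."""
    # Factor of 2: one key tensor and one value tensor per layer.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 2**30


# Total parameter count from the description (assumed to be exact here).
TOTAL_PARAMS = 389e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.0f} GiB for weights")

# Placeholder architecture values, NOT Hunyuan Large's real configuration.
print(f"KV cache at 1,024 tokens: {kv_cache_gib(1024, 32, 8, 128):.3f} GiB")
```

The weight estimate uses the total parameter count rather than the active count because every expert of an MoE model has to be resident in memory, even though only some of them run for a given token.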

Hunyuan Large

Hunyuan Large is a large-scale Mixture-of-Experts (MoE) Transformer language model developed by Tencent. It takes text prompts as input and generates text, and it is released with open weights under the Tencent Hunyuan Community License. The MoE design lets the model scale its total parameter count while keeping the compute per token tied to a much smaller set of active parameters.

Hunyuan Large uses a Transformer architecture with a Mixture-of-Experts (MoE) design. This architecture partitions the feed-forward parameters into multiple "experts," and only a subset of these experts is activated for each input token during inference. As a result, the model has a total parameter count of approximately 389 billion while keeping the active parameters per token at approximately 52 billion, which lowers the compute per token relative to a dense model of the same size. According to Tencent's technical report, the model has 64 Transformer layers with 80 attention heads and 8 key-value heads (grouped-query attention), uses SwiGLU activation and RMSNorm, and encodes position with rotary position embeddings (RoPE).
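
The per-token expert selection described above can be sketched in a few lines of Python. This is a minimal illustration of generic top-k MoE routing, not Tencent's implementation: the names `route_token` and `moe_layer` are hypothetical, the experts are toy functions, and the router logits are passed in directly instead of being computed by a learned gating layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_token(router_logits, top_k=1):
    """Return [(expert_index, weight), ...] for the top_k experts of one token."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(x, experts, router_logits, top_k=1):
    """Weighted sum of the outputs of only the selected experts."""
    out = [0.0] * len(x)
    for idx, weight in route_token(router_logits, top_k):
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Four toy experts that each scale the input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_layer([1.0, 2.0], experts, [0.1, 2.0, 0.3, -1.0], top_k=1))
```

With `top_k=1`, only the expert with the highest router score runs, so the cost per token depends on the size of the selected experts and not on the total number of experts.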

Hunyuan Large is engineered for text understanding and generation. Its MoE architecture contributes to efficient scaling, which makes it suitable for deployments that need both high output quality and restrained compute per token. Primary use cases include general text generation, question answering, reasoning, mathematics, coding, and multi-turn dialogue.

About Hunyuan

Hunyuan is Tencent's family of large language models with various capabilities.

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Hunyuan Large available.
