Parameters: 52B
Context Length: 32K
Modality: Text
Architecture: Dense
License: -
Release Date: 15 May 2024
Knowledge Cutoff: -
Attention Structure: Multi-Head Attention
Hidden Dimension Size: -
Number of Layers: -
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
Tencent Hunyuan Turbo S is a high-performance language model engineered for rapid text generation and efficient analytical reasoning. The model aims to deliver near-instantaneous responses, reducing first-word latency and increasing overall output speed. It is designed to serve as a foundation for advanced applications requiring sophisticated reasoning, extensive text processing, and robust code generation.
The architecture of Hunyuan Turbo S combines a hybrid Mamba-Transformer fusion with a Mixture of Experts (MoE) framework: the Mamba state-space model is integrated into a super-large MoE, balancing Mamba's efficiency on long sequences with the Transformer's strength in complex contextual understanding. A key architectural innovation is its "fast thinking" and "slow thinking" paradigm. "Fast thinking" delivers quick, intuitive responses to routine queries through higher word output speed and reduced latency. "Slow thinking," which draws knowledge from the Hunyuan T1 model, supports the deliberate analytical processing needed for intricate problem-solving in domains such as mathematics, logical deduction, and scientific inquiry. The model further improves computational efficiency and reduces KV-cache usage through Grouped Query Attention (GQA) and Cross-Layer Attention (CLA).
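The KV-cache savings from GQA and CLA follow from simple arithmetic: GQA caches keys and values for a reduced number of KV heads, and CLA lets groups of adjacent layers share one cache. A minimal sketch, using hypothetical shapes (Turbo S's layer count, head counts, and head dimension are not published):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, cla_share_factor=1):
    """Approximate KV-cache size: 2 tensors (K and V) per cached layer.

    With Cross-Layer Attention (CLA), groups of `cla_share_factor`
    adjacent layers share one K/V cache, dividing the cached layer count.
    """
    cached_layers = num_layers // cla_share_factor
    return 2 * cached_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical configuration: 64 layers, 64 query heads, head_dim 128,
# fp16 cache (2 bytes/element), 32K context.
mha = kv_cache_bytes(64, 64, 128, 32_768)  # baseline: kv_heads == query heads
gqa = kv_cache_bytes(64, 8, 128, 32_768)   # GQA: 8 KV heads shared by 64 query heads
gqa_cla = kv_cache_bytes(64, 8, 128, 32_768, cla_share_factor=2)  # + CLA pairs

print(f"MHA:     {mha / 2**30:.1f} GiB")      # 64.0 GiB under these assumptions
print(f"GQA:     {gqa / 2**30:.1f} GiB")      # 8x smaller
print(f"GQA+CLA: {gqa_cla / 2**30:.1f} GiB")  # a further 2x smaller
```

Under these assumed shapes, GQA alone cuts the 32K-context cache from 64 GiB to 8 GiB, and sharing caches across layer pairs with CLA halves it again.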
Hunyuan Turbo S is optimized for strong performance across a range of linguistic and analytical tasks, including knowledge acquisition, mathematical computations, and creative content generation. Its design emphasizes efficiency, leading to reduced computational complexity and lower inference costs. These performance characteristics render the model suitable for deployment in applications demanding swift and accurate text outputs, such as intelligent customer support systems, interactive chatbot interfaces, and various enterprise AI solutions where both response time and economic efficiency are critical operational considerations. The model is also capable of handling extended context lengths, which supports maintaining coherence and relevance in prolonged conversational or document-based interactions.
Hunyuan Turbo S belongs to Tencent's Hunyuan family of large language models.
No evaluation benchmarks are available for Hunyuan Turbo S.
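As a rough guide to VRAM requirements at different quantization levels, weight memory is simply parameter count times bits per weight. A back-of-envelope sketch, assuming the 52B parameter count from the spec table and ignoring KV-cache and activation overhead:

```python
def weight_vram_gib(num_params, bits_per_weight):
    """Approximate VRAM (GiB) for model weights alone at a given quantization."""
    return num_params * bits_per_weight / 8 / 2**30

PARAMS = 52e9  # 52B, from the spec table above
for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_vram_gib(PARAMS, bits):.0f} GiB")
```

This gives roughly 97 GiB at fp16, 48 GiB at 8-bit, and 24 GiB at 4-bit for the weights alone; real deployments need additional headroom for the KV cache, which grows with context size.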