Attention Structure: Multi-Head Attention
Hidden Dimension Size: -
Number of Layers: 128
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
Tencent Hunyuan TurboS represents a significant advancement in large language models, engineered to deliver both rapid response times and robust reasoning capabilities. The model integrates a dual cognitive approach analogous to human fast and slow thinking: near-instantaneous replies for a broad spectrum of queries, and deeper deliberation when a task requires it. Its design prioritizes efficiency and responsiveness, making it suitable for applications that demand quick, high-quality interactions, while retaining the capacity to address complex informational and analytical tasks.
Architecturally, Hunyuan TurboS is a novel hybrid Transformer-Mamba Mixture of Experts (MoE) model. This innovative fusion combines the strengths of Mamba2 layers, which excel at efficient processing of long sequences and reduced KV-Cache memory footprint, with the Transformer's established capacity for deep contextual understanding. The model incorporates 128 layers, comprising 57 Mamba2 layers, 7 Attention layers, and 64 Feed-Forward Network (FFN) layers. The FFN layers specifically utilize an MoE structure with 32 experts, where each token activates 1 shared and 2 specialized experts, enhancing computational efficiency. Furthermore, the model employs Grouped-Query Attention (GQA) to optimize memory usage and computational overhead during inference.
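The layer composition and expert activation pattern described above can be sketched as follows. Only the counts (128 layers = 57 Mamba2 + 7 Attention + 64 FFN, 32 experts, 1 shared + 2 specialized active) come from the description; the router function is a hypothetical top-k gating for illustration, not the model's actual routing logic.

```python
# Layer counts stated for Hunyuan TurboS.
NUM_LAYERS = 128
MAMBA2_LAYERS = 57
ATTENTION_LAYERS = 7
FFN_MOE_LAYERS = 64

# The three layer types account for all 128 layers.
assert MAMBA2_LAYERS + ATTENTION_LAYERS + FFN_MOE_LAYERS == NUM_LAYERS

NUM_EXPERTS = 32       # experts per MoE FFN layer
ROUTED_ACTIVE = 2      # specialized experts chosen per token
SHARED_ACTIVE = 1      # shared expert, always active

def active_experts(router_scores):
    """Hypothetical gating sketch: for one token, activate the shared
    expert plus the top-2 specialized experts by router score."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return {"shared": SHARED_ACTIVE, "routed": ranked[:ROUTED_ACTIVE]}

# Each token touches only 3 experts per MoE layer, out of 32.
choice = active_experts([0.1, 0.9, 0.3, 0.7])
print(f"active per token: {SHARED_ACTIVE + ROUTED_ACTIVE} of {NUM_EXPERTS}",
      f"(routed: {choice['routed']})")
```

The sparsity is the point of the MoE FFN layers: per-token compute scales with the 3 active experts, not the full 32.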
Hunyuan TurboS is designed to handle extensive information, supporting an ultra-long context length of 256,000 tokens. This capability allows the model to maintain performance across lengthy documents and extended dialogues. Its post-training strategy includes supervised fine-tuning and adaptive long-short Chain-of-Thought (CoT) fusion, enabling dynamic switching between rapid responses for simple queries and more analytical, step-by-step processing for intricate problems. The model is deployed for various applications requiring efficient, high-performance AI, such as advanced conversational agents, content generation, and sophisticated analytical systems.
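A back-of-envelope calculation shows why the hybrid design matters at the 256K context length: only 7 of the 128 layers keep a KV-Cache at all. The sequence length and layer counts come from the description; the model's key/value head count and head dimension are not published, so the values below are placeholders for illustration only.

```python
SEQ_LEN = 256_000     # stated ultra-long context length
KV_HEADS = 8          # hypothetical GQA key/value head count (assumption)
HEAD_DIM = 128        # hypothetical per-head dimension (assumption)
BYTES = 2             # fp16/bf16 element size

def kv_cache_gib(attention_layers: int) -> float:
    """KV-Cache size in GiB: 2x (keys and values) per attention layer."""
    total = 2 * attention_layers * KV_HEADS * HEAD_DIM * SEQ_LEN * BYTES
    return total / 2**30

hybrid = kv_cache_gib(7)     # Hunyuan TurboS: 7 attention layers
dense = kv_cache_gib(128)    # hypothetical all-attention model of equal depth
print(f"hybrid: {hybrid:.2f} GiB vs all-attention: {dense:.2f} GiB")
```

Under these placeholder dimensions the hybrid cache is 7/128 the size of a fully attention-based stack; Mamba2 layers carry a fixed-size recurrent state instead of a cache that grows with sequence length.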
Tencent Hunyuan is a family of large language models with various capabilities.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Hunyuan TurboS.