
Hunyuan TurboS

Parameters

52B

Context Length

256K

Modality

Text

Architecture

Hybrid Transformer-Mamba MoE

License

-

Release Date

16 Jul 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

-

Number of Layers

128

Attention Heads

-

Key-Value Heads

-

Activation Function

-

Normalization

-

Position Embedding

Absolute Position Embedding


Hunyuan TurboS

Tencent Hunyuan TurboS represents a significant advancement in large language models, engineered to deliver both rapid response times and robust reasoning. The model integrates a dual cognitive approach, pairing human-like "fast thinking" for near-instantaneous replies to routine queries with more deliberate reasoning for complex ones. Its design prioritizes efficiency and responsiveness, making it well suited to applications that demand quick, high-quality interactions, while retaining the capacity to address complex informational and analytical tasks.

Architecturally, Hunyuan TurboS is a hybrid Transformer-Mamba Mixture of Experts (MoE) model. This fusion combines the strengths of Mamba2 layers, which process long sequences efficiently with a reduced KV-cache memory footprint, with the Transformer's established capacity for deep contextual understanding. The model comprises 128 layers: 57 Mamba2 layers, 7 Attention layers, and 64 Feed-Forward Network (FFN) layers. The FFN layers use an MoE structure with 32 experts, where each token activates 1 shared and 2 specialized experts, improving computational efficiency. The model also employs Grouped-Query Attention (GQA) to reduce memory usage and computational overhead during inference.
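The routing scheme described above (each token activating 1 shared expert plus its top 2 of 32 specialized experts) can be sketched as follows. The hidden size, router, and expert weights here are toy assumptions for illustration only, not the model's actual configuration:

```python
import numpy as np

N_EXPERTS = 32   # specialized experts in the FFN MoE layer
TOP_K = 2        # specialized experts activated per token
D_MODEL = 8      # toy hidden size (illustrative, not the real dimension)

rng = np.random.default_rng(0)
shared_w = rng.standard_normal((D_MODEL, D_MODEL))             # always-active shared expert
expert_w = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL))  # specialized experts
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))           # token-to-expert router

def moe_ffn(x: np.ndarray) -> np.ndarray:
    """Route one token through the shared expert and its top-2 specialists."""
    logits = x @ router_w                     # router scores, shape (N_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]         # indices of the 2 highest-scoring experts
    gates = np.exp(logits[top])
    gates = gates / gates.sum()               # softmax over the selected experts
    out = x @ shared_w                        # shared expert contribution
    for g, e in zip(gates, top):
        out = out + g * (x @ expert_w[e])     # gated specialist contributions
    return out

token = rng.standard_normal(D_MODEL)
y = moe_ffn(token)
print(y.shape)
```

Only 3 of the 33 expert weight matrices touch any given token, which is how the MoE structure keeps per-token compute well below that of a dense FFN of equal total capacity.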

Hunyuan TurboS is designed to handle extensive information, supporting an ultra-long context length of 256,000 tokens. This capability allows the model to maintain performance across lengthy documents and extended dialogues. Its post-training strategy includes supervised fine-tuning and adaptive long-short Chain-of-Thought (CoT) fusion, enabling dynamic switching between rapid responses for simple queries and more analytical, step-by-step processing for intricate problems. The model is deployed for various applications requiring efficient, high-performance AI, such as advanced conversational agents, content generation, and sophisticated analytical systems.
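As a back-of-envelope illustration of why GQA and the small number of attention layers matter at a 256K context, the sketch below compares KV-cache sizes. The head counts and head dimension are assumed values for illustration, since the card does not publish them:

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: K and V tensors in fp16 (2 bytes/element)."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

SEQ = 256_000    # ultra-long context supported by the model
ATTN_LAYERS = 7  # only the 7 attention layers keep a KV cache
HEAD_DIM = 128   # assumed head dimension (hypothetical)

# Assumed head counts: 64 query heads under plain MHA vs. 8 KV heads under GQA.
mha = kv_cache_bytes(SEQ, ATTN_LAYERS, n_kv_heads=64, head_dim=HEAD_DIM)
gqa = kv_cache_bytes(SEQ, ATTN_LAYERS, n_kv_heads=8, head_dim=HEAD_DIM)
print(f"MHA-style cache: {mha / 2**30:.1f} GiB, GQA cache: {gqa / 2**30:.1f} GiB")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads (8x here), and confining the cache to 7 attention layers out of 128 compounds the saving that the Mamba2 layers already provide.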

About Hunyuan

A family of Tencent Hunyuan large language models with various capabilities.


Other Hunyuan Models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Hunyuan TurboS.

Rankings

Rank

-

Coding Rank

-
