Attention Structure: Multi-Head Attention
Hidden Dimension Size: -
Number of Layers: 128
Attention Heads: -
Key-Value Heads: -
Activation Function: -
Normalization: -
Position Embedding: Absolute Position Embedding
Tencent Hunyuan TurboS represents a significant advancement in large language models, engineered to deliver both rapid response times and robust reasoning capabilities. The model integrates a dual cognitive approach analogous to human fast and slow thinking: near-instantaneous replies for a broad spectrum of queries, and deeper deliberation when a task requires it. Its design prioritizes efficiency and responsiveness, making it suitable for applications that demand quick, high-quality interactions, while retaining the capacity to address complex informational and analytical tasks.
Architecturally, Hunyuan TurboS is a novel hybrid Transformer-Mamba Mixture of Experts (MoE) model. This innovative fusion combines the strengths of Mamba2 layers, which excel at efficient processing of long sequences and reduced KV-Cache memory footprint, with the Transformer's established capacity for deep contextual understanding. The model incorporates 128 layers, comprising 57 Mamba2 layers, 7 Attention layers, and 64 Feed-Forward Network (FFN) layers. The FFN layers specifically utilize an MoE structure with 32 experts, where each token activates 1 shared and 2 specialized experts, enhancing computational efficiency. Furthermore, the model employs Grouped-Query Attention (GQA) to optimize memory usage and computational overhead during inference.
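The layer composition and expert activation pattern described above can be sketched as follows. Only the counts (128 layers = 57 Mamba2 + 7 Attention + 64 FFN, 32 experts, 1 shared + 2 specialized active) come from the description; the router function is a hypothetical top-k gating for illustration, not the model's actual routing logic.

```python
# Layer counts stated for Hunyuan TurboS.
NUM_LAYERS = 128
MAMBA2_LAYERS = 57
ATTENTION_LAYERS = 7
FFN_MOE_LAYERS = 64

# The three layer types account for all 128 layers.
assert MAMBA2_LAYERS + ATTENTION_LAYERS + FFN_MOE_LAYERS == NUM_LAYERS

NUM_EXPERTS = 32       # experts per MoE FFN layer
ROUTED_ACTIVE = 2      # specialized experts chosen per token
SHARED_ACTIVE = 1      # shared expert, always active

def active_experts(router_scores):
    """Hypothetical gating sketch: for one token, activate the shared
    expert plus the top-2 specialized experts by router score."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return {"shared": SHARED_ACTIVE, "routed": ranked[:ROUTED_ACTIVE]}

# Each token touches only 3 experts per MoE layer, out of 32.
choice = active_experts([0.1, 0.9, 0.3, 0.7])
print(f"active per token: {SHARED_ACTIVE + ROUTED_ACTIVE} of {NUM_EXPERTS}",
      f"(routed: {choice['routed']})")
```

The sparsity is the point of the MoE FFN layers: per-token compute scales with the 3 active experts, not the full 32.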
Hunyuan TurboS is designed to handle extensive information, supporting an ultra-long context length of 256,000 tokens. This capability allows the model to maintain performance across lengthy documents and extended dialogues. Its post-training strategy includes supervised fine-tuning and adaptive long-short Chain-of-Thought (CoT) fusion, enabling dynamic switching between rapid responses for simple queries and more analytical, step-by-step processing for intricate problems. The model is deployed for various applications requiring efficient, high-performance AI, such as advanced conversational agents, content generation, and sophisticated analytical systems.
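A back-of-envelope calculation shows why the hybrid design matters at the 256K context length: only 7 of the 128 layers keep a KV-Cache at all. The sequence length and layer counts come from the description; the model's key/value head count and head dimension are not published, so the values below are placeholders for illustration only.

```python
SEQ_LEN = 256_000     # stated ultra-long context length
KV_HEADS = 8          # hypothetical GQA key/value head count (assumption)
HEAD_DIM = 128        # hypothetical per-head dimension (assumption)
BYTES = 2             # fp16/bf16 element size

def kv_cache_gib(attention_layers: int) -> float:
    """KV-Cache size in GiB: 2x (keys and values) per attention layer."""
    total = 2 * attention_layers * KV_HEADS * HEAD_DIM * SEQ_LEN * BYTES
    return total / 2**30

hybrid = kv_cache_gib(7)     # Hunyuan TurboS: 7 attention layers
dense = kv_cache_gib(128)    # hypothetical all-attention model of equal depth
print(f"hybrid: {hybrid:.2f} GiB vs all-attention: {dense:.2f} GiB")
```

Under these placeholder dimensions the hybrid cache is 7/128 the size of a fully attention-based stack; Mamba2 layers carry a fixed-size recurrent state instead of a cache that grows with sequence length.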
Tencent Hunyuan is a family of large language models with various capabilities.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Hunyuan TurboS.