Active Parameters
13B
Context Length
256K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
25 Jun 2025
Knowledge Cutoff
-
Total Parameters
80B
Number of Experts
65
Active Experts
8
Attention Structure
Grouped Query Attention (GQA)
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
-
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
Absolute Position Embedding
Tencent's Hunyuan A13B is a large language model built on a Mixture-of-Experts (MoE) architecture, with 80 billion total parameters of which 13 billion are active during inference. This sparse design trades full parameter activation for computational efficiency while preserving strong performance. Released as open source, the model targets researchers and developers who need advanced AI capabilities under real resource constraints: it scales model capacity without requiring every parameter to be activated for every task.
The core innovation of Hunyuan A13B lies in its sparse MoE architecture, which dynamically routes input through a subset of specialized "expert" neural networks. Specifically, the architecture comprises 32 layers and incorporates SwiGLU activation functions. It utilizes Grouped Query Attention (GQA) to enhance inference efficiency and reduce memory footprint during processing. A notable feature is its hybrid reasoning mode, enabling the model to adjust its processing depth dynamically between a "fast thinking" mode for rapid responses and a "slow thinking" mode for more intricate, multi-step problem-solving, depending on the complexity of the input. The model was trained on a substantial corpus exceeding 20 trillion tokens, including a significant emphasis on data from scientific, technological, engineering, and mathematical (STEM) domains.
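The routing behavior described above can be illustrated with a minimal sketch. This is not Hunyuan's actual router implementation; it is a generic top-k softmax gate over a pool of expert networks, with hypothetical shapes chosen for readability, showing how only a subset of experts runs per token:

```python
import numpy as np

def topk_moe_layer(x, expert_weights, gate_weights, k=8):
    """Illustrative sparse MoE forward pass for one token vector x.
    Shapes (hypothetical): x (d,), gate_weights (n_experts, d),
    expert_weights (n_experts, d, d)."""
    logits = gate_weights @ x                 # one routing score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k best-scoring experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                      # softmax renormalized over selected experts
    # Only the k selected experts compute anything; the rest stay inactive,
    # which is what keeps the active parameter count far below the total.
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 64
x = rng.standard_normal(d)
out = topk_moe_layer(x,
                     rng.standard_normal((n_experts, d, d)),
                     rng.standard_normal((n_experts, d)))
print(out.shape)  # (16,)
```

With 64 routed experts and k=8, each token touches only a fraction of the layer's weights, mirroring the 13B-active / 80B-total split in the spec above.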
Hunyuan A13B supports an ultra-long context window of up to 256,000 tokens, facilitating comprehensive understanding and generation of content from extensive documents or prolonged conversational sequences. The model has been optimized for agent-based tasks, demonstrating capabilities in areas such as mathematical reasoning, logical analysis, and complex instruction following. Its design emphasizes efficient inference, supporting various quantization formats including FP8 and INT4, which allows for deployment in environments with diverse hardware specifications. This makes it suitable for applications requiring both robust language processing capabilities and optimized computational resource utilization, even potentially on single mid-range GPUs.
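A back-of-the-envelope calculation shows why the quantization formats matter for deployment. The sketch below estimates weight memory only; it assumes all 80B parameters must be resident (even though only ~13B are active per token) and ignores KV cache, activations, and runtime overhead, so real requirements are higher:

```python
def weight_vram_gib(total_params_billion=80, bits_per_param=4):
    """Rough VRAM footprint of the model weights alone, in GiB.
    Assumptions: every parameter is resident in memory; KV cache
    and framework overhead are not counted."""
    total_bytes = total_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30

for name, bits in [("BF16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_vram_gib(bits_per_param=bits):.0f} GiB")
# BF16: ~149 GiB, FP8: ~75 GiB, INT4: ~37 GiB
```

The INT4 estimate is what makes single-GPU deployment plausible at the margin, while BF16 clearly requires a multi-GPU setup.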
Hunyuan A13B belongs to Tencent's Hunyuan family of large language models, which spans a range of sizes and capabilities.
No evaluation benchmark results are available for Hunyuan A13B.