Total Parameters
80B
Context Length
256K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Tencent Hunyuan Community License Agreement (listed as Apache 2.0 in some materials)
Release Date
25 Jun 2025
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention (GQA)
Attention Heads
32
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
10,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
32
FFN Intermediate Size (Dense)
3,072
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
128,167
Mixture of Experts
Active Parameters (per token)
13.0B
Number of Experts
64 routed (65 including the shared expert)
Active Experts
8
Shared Experts
1
FFN Intermediate Size (per Expert)
3,072
Dense Layers Before MoE
-
Tencent's Hunyuan A13B is a large language model built on a Mixture-of-Experts (MoE) architecture with 80 billion total parameters, of which 13 billion are active during inference. The design aims to cut compute cost per token while maintaining strong performance. The model is presented as an open-source resource for researchers and developers who need to deploy capable models under tight resource budgets. It addresses the scaling problem by offering large model capacity without activating every parameter for every token.
The core innovation of Hunyuan A13B lies in its sparse MoE architecture, which dynamically routes each input through a subset of specialized "expert" networks. The architecture comprises 32 layers with SwiGLU activation functions and uses Grouped Query Attention (GQA) to improve inference efficiency and reduce memory footprint. A notable feature is its hybrid reasoning mode, which lets the model switch between a "fast thinking" mode for rapid responses and a "slow thinking" mode for more intricate, multi-step problem-solving, depending on the complexity of the input. The model was trained on a corpus exceeding 20 trillion tokens, with significant emphasis on data from science, technology, engineering, and mathematics (STEM) domains.
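The routing idea described above can be illustrated with a minimal sketch. It assumes a softmax router, renormalized top-8 weights, and one always-on shared expert; these details, along with the names `route_token` and `moe_layer`, are illustrative assumptions and not Tencent's actual implementation.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64  # assumption: 64 routed experts plus 1 shared expert
TOP_K = 8                # routed experts active per token


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Weights are renormalized to sum to 1 (an assumption of this sketch).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, router_logits, routed_experts, shared_expert):
    """Add the shared expert's output to the weighted top-k routed outputs."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](x)
    return out


# Toy usage: scalar functions stand in for SwiGLU feed-forward experts.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_ROUTED_EXPERTS)]
selected = route_token(logits)
y = moe_layer(1.0, logits, experts, shared_expert=lambda x: x)
```

Only 8 of the 64 routed experts run for a given token, which is why the active parameter count is far smaller than the total.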
Hunyuan A13B supports a context window of up to 256,000 tokens, which allows it to process long documents and extended conversations. The model has been optimized for agent-based tasks and demonstrates capabilities in mathematical reasoning, logical analysis, and complex instruction following. Its design emphasizes efficient inference and supports quantization formats including FP8 and INT4, which allows deployment across diverse hardware. This makes it suitable for applications that need both robust language processing and economical use of compute, potentially even on a single mid-range GPU.
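As a rough guide to what the quantization formats mean for deployment, the sketch below estimates weight storage alone for 80 billion parameters. It assumes every parameter is stored at the stated precision (2, 1, and 0.5 bytes for FP16, FP8, and INT4) and ignores quantization scales, activations, and the KV cache; the helper name `weight_memory_gib` is hypothetical.

```python
TOTAL_PARAMS = 80e9  # total parameters, as stated in the text

# Assumed storage cost per parameter for each format.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


estimates = {
    fmt: weight_memory_gib(TOTAL_PARAMS, size)
    for fmt, size in BYTES_PER_PARAM.items()
}

for fmt, gib in estimates.items():
    print(f"{fmt}: ~{gib:.0f} GiB")
```

Under these assumptions the weights come to roughly 149 GiB in FP16, 75 GiB in FP8, and 37 GiB in INT4. In a typical MoE deployment all experts stay resident in memory even though only 13 billion parameters are active per token, so the total count, not the active count, drives this figure.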
Tencent Hunyuan large language models with various capabilities.
No evaluation benchmarks for Hunyuan A13B available.
Overall Rank
-
Coding Rank
-
Total Score
65 / 100
Hunyuan-A13B exhibits strong transparency in its architectural design and parameter density, providing clear technical details on its Mixture-of-Experts implementation. However, it suffers from significant opacity regarding training compute resources and employs a restrictive custom license that limits its use in several major global regions. While it provides helpful hardware guidance, the lack of granular dataset proportions remains a notable gap in its upstream transparency profile.
Architectural Provenance
The model's architecture is extensively documented in an official technical report and GitHub repository. It is a decoder-only Transformer utilizing a sparse Mixture-of-Experts (MoE) design with 64 non-shared experts and 1 shared expert. Key architectural modifications like Grouped Query Attention (GQA), SwiGLU activation, and a dual-mode 'fast/slow' reasoning framework are clearly described. The pretraining procedure, including a three-stage process (foundation, fast annealing, and long-context adaptation), is publicly detailed.
Dataset Composition
Tencent discloses that the model was trained on a 20-trillion-token corpus that includes a 250-billion-token STEM-focused subset. The report names general categories such as math textbooks, GitHub code, and scientific texts, but it gives no percentage breakdown of the full 20T corpus. The data cleaning and filtering methodology (e.g., a 'refined knowledge labeling system') is mentioned without the granular detail required for a higher score.
Tokenizer Integrity
The tokenizer is publicly available via the official Hugging Face repository and has a vocabulary of approximately 128,000 tokens (128,167 in the specification above). It is consistent with previous Hunyuan models and is documented to support multilingual text. The implementation can be verified through the provided `tokenizer_config.json` and its integration with standard libraries such as `transformers` and `vLLM`.
Parameter Density
The model provides exemplary transparency regarding its MoE parameters. It explicitly states a total of 80 billion parameters with 13 billion active parameters per token (1 shared expert + 8 routed experts). The architectural breakdown (32 layers, 64 experts) is clearly defined in the technical report and configuration files, leaving no ambiguity about dense vs. sparse counts.
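The stated totals can be sanity-checked from the listed dimensions. The back-of-the-envelope count below assumes three weight matrices per SwiGLU expert, bias-free attention projections, untied input and output embeddings, 64 routed experts plus 1 shared expert in every layer, and it ignores router and normalization weights. These are assumptions of the sketch, not figures taken from the technical report.

```python
HIDDEN = 4096            # hidden dimension size
LAYERS = 32              # number of layers
EXPERT_FFN = 3072        # FFN intermediate size per expert
ROUTED_EXPERTS = 64      # assumption: routed experts per layer
ACTIVE_ROUTED = 8        # routed experts active per token
SHARED_EXPERTS = 1       # always-active shared expert
Q_HEADS, KV_HEADS, HEAD_DIM = 32, 8, 128
VOCAB = 128_167

# SwiGLU expert: gate, up, and down projections.
expert_params = 3 * HIDDEN * EXPERT_FFN

# Attention: query and output projections plus the smaller key/value projections.
attn_params = 2 * HIDDEN * Q_HEADS * HEAD_DIM + 2 * HIDDEN * KV_HEADS * HEAD_DIM

# Assumption: separate input embedding and output projection matrices.
embed_params = 2 * VOCAB * HIDDEN


def model_params(routed_per_layer):
    """Parameter count when `routed_per_layer` routed experts are counted."""
    per_layer = attn_params + (routed_per_layer + SHARED_EXPERTS) * expert_params
    return LAYERS * per_layer + embed_params


total = model_params(ROUTED_EXPERTS)
active = model_params(ACTIVE_ROUTED)

print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
```

With these assumptions the estimate comes to about 80.9B total and 13.3B active parameters, in line with the stated 80B and 13B.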
Training Compute
The technical report describes the training stages and the scaling laws used, but it omits the hardware hours (GPU or TPU), the cluster specifications used for the 20T-token training run, and any carbon footprint or environmental impact data. This information appears to be withheld, presumably for proprietary or competitive reasons.
Benchmark Reproducibility
Tencent provides results for numerous standard benchmarks (MMLU, MATH, GSM8K) and has released two new evaluation datasets (ArtifactsBench and C3-Bench) to the community. However, the exact evaluation scripts and full prompt templates for the reported scores are not collected in one place, so third-party reproduction of the exact numbers would take significant effort.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as Hunyuan-A13B and maintaining version awareness. There are no documented instances of the model claiming to be a competitor's product (e.g., GPT-4). It is transparent about its MoE nature and its specific 'thinking' modes during interaction.
License Clarity
The licensing situation is complex and potentially misleading. While marketing materials and some repository files mention 'Apache 2.0', the primary weights are governed by the 'Tencent Hunyuan Community License Agreement'. This custom license includes significant restrictions, such as territorial limitations (excluding the EU, UK, and South Korea) and prohibitions on using the model to improve other AI models, which contradicts standard open-source definitions.
Hardware Footprint
Hardware requirements are well-documented for various deployment scenarios. The repository provides VRAM estimates for FP16, FP8, and INT4 quantization. It also includes specific guidance on memory scaling for the 256K context window and suggests configurations for consumer-grade hardware (e.g., RTX 4090) versus datacenter GPUs.
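To give a sense of how memory scales with context, the sketch below estimates the KV cache for a single sequence from the listed attention dimensions. It assumes 16-bit cache entries, treats 256K as 262,144 tokens, and caches every layer; it is an illustrative calculation, not the repository's own guidance.

```python
LAYERS = 32
Q_HEADS = 32
KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(tokens, kv_heads=KV_HEADS, bytes_per_value=2):
    """KV cache size for one sequence; the factor 2 covers keys and values."""
    return 2 * LAYERS * kv_heads * HEAD_DIM * bytes_per_value * tokens


context = 256 * 1024  # assumption: 256K means 262,144 tokens

gqa_gib = kv_cache_bytes(context) / 2**30
# For comparison: the cache if every query head had its own key/value head.
mha_gib = kv_cache_bytes(context, kv_heads=Q_HEADS) / 2**30

print(f"GQA (8 KV heads):  {gqa_gib:.0f} GiB")
print(f"MHA (32 KV heads): {mha_gib:.0f} GiB")
```

Under these assumptions a full 256K context needs about 32 GiB of cache with 8 key-value heads, compared with 128 GiB if all 32 heads kept their own keys and values.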
Versioning Drift
The model uses basic versioning, and a changelog is present in the GitHub repository. However, the history is relatively short, and there is limited information on long-term drift or a formal deprecation policy for older weight checkpoints. Updates appear to be released as new variants rather than a continuous semantic versioning stream.