
Hunyuan A13B

Total Parameters

80B

Context Length

256K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

25 Jun 2025

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped Query Attention (GQA)

Attention Heads

32

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

10,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

32

FFN Intermediate Size (Dense)

3,072

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

128,167

Mixture of Experts

Active Parameters (per Token)

13.0B

Number of Experts

65

Active Experts

8

Shared Experts

1

FFN Intermediate Size (per Expert)

3,072

Dense Layers Before MoE

-

Architecture Diagram

Input Tokens
→ Token Embedding (Position: RoPE · Hidden: 4.1k · Context: 256K · Vocab: 128.2k)
→ x 32 layers: RMSNorm (Pre-Attention) → Grouped Query Attention (32 Q / 8 KV heads, head dim: 128) → + → RMSNorm (Pre-FFN) → Sparse MoE FFN (8/65 experts, SwiGLU, intermediate: 3.1k) → +
→ Final RMSNorm
→ Output Logits

Hunyuan A13B

Tencent's Hunyuan A13B is a large language model built on a Mixture-of-Experts (MoE) architecture, with 80 billion total parameters of which 13 billion are active for any given token during inference. The design aims to keep inference cost low while preserving the capacity of a much larger model. The model is presented as an open-source resource for researchers and developers who need to deploy a capable model under tight resource constraints. It addresses the scaling problem by providing large model capacity without activating every parameter for every token.

The core of Hunyuan A13B is its sparse MoE architecture, which routes each token through a subset of specialized "expert" networks. The model has 32 layers and uses SwiGLU activation functions. It uses Grouped Query Attention (GQA) to speed up inference and reduce memory footprint. A notable feature is its hybrid reasoning mode, which switches between a "fast thinking" mode for rapid responses and a "slow thinking" mode for multi-step problem solving, depending on the complexity of the input. The model was trained on a corpus of more than 20 trillion tokens, with an emphasis on science, technology, engineering, and mathematics (STEM) data.
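The routing idea can be illustrated with a minimal sketch. It assumes softmax gating followed by top-8 selection and weight renormalization, which is a common MoE pattern but is not confirmed here as Hunyuan A13B's actual gating function. The helper names (`route_token`, `moe_output`) and the scalar toy experts are hypothetical stand-ins for the real SwiGLU feed-forward blocks.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64  # routed (non-shared) experts, per the spec above
TOP_K = 8                # routed experts active per token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Assumption: softmax gating with renormalized top-k weights.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_output(x, router_logits, routed_experts, shared_expert):
    """Add the always-on shared expert to the weighted routed experts."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](x)
    return out


# Toy usage: each "expert" is a scalar function instead of a feed-forward block.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
experts = [lambda x, s=i: x * (1 + s / 100) for i in range(NUM_ROUTED_EXPERTS)]
selection = route_token(logits)
y = moe_output(1.0, logits, experts, shared_expert=lambda x: x)
```

Only the 8 selected experts and the shared expert are evaluated for a token, which is why per-token compute tracks the active parameter count rather than the total.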

Hunyuan A13B supports a context window of up to 256,000 tokens, enough to process long documents or extended conversations in a single pass. The model has been optimized for agent-based tasks and shows capability in mathematical reasoning, logical analysis, and complex instruction following. Its design emphasizes efficient inference and supports several quantization formats, including FP8 and INT4, so it can be deployed across a range of hardware, potentially even on a single mid-range GPU.
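To give a sense of what a 256,000-token window costs in memory, the following sketch estimates the key-value cache size from the attention figures in the specification (32 layers, 8 KV heads, head dimension 128). It assumes 16-bit cache values, a single sequence, and that every layer caches keys and values; the function name `kv_cache_bytes` is illustrative, and real serving stacks may use paged or quantized caches that change the result.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV-cache size in bytes for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes a 16-bit cache (an assumption, not a spec value).
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
full = kv_cache_bytes(256_000)

print(per_token)     # 131072 bytes, i.e. 128 KiB per token
print(full / 2**30)  # 31.25 GiB at the full 256,000-token context
```

With 32 KV heads instead of 8 (no grouping), the same estimate would be four times larger, which is the memory saving GQA provides.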

About Hunyuan

Hunyuan is Tencent's family of large language models with various capabilities.



Evaluation Benchmarks

No evaluation benchmarks for Hunyuan A13B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

65 / 100

Hunyuan A13B Model Integrity Report


Audit Note

Hunyuan-A13B exhibits strong transparency in its architectural design and parameter density, providing clear technical details on its Mixture-of-Experts implementation. However, it suffers from significant opacity regarding training compute resources and employs a restrictive custom license that limits its use in several major global regions. While it provides helpful hardware guidance, the lack of granular dataset proportions remains a notable gap in its upstream transparency profile.

Upstream

22.0 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in an official technical report and GitHub repository. It is a decoder-only Transformer utilizing a sparse Mixture-of-Experts (MoE) design with 64 non-shared experts and 1 shared expert. Key architectural modifications like Grouped Query Attention (GQA), SwiGLU activation, and a dual-mode 'fast/slow' reasoning framework are clearly described. The pretraining procedure, including a three-stage process (foundation, fast annealing, and long-context adaptation), is publicly detailed.

Dataset Composition

5.5 / 10

Tencent discloses that the model was trained on a 20-trillion-token corpus that includes a 250-billion-token STEM-focused subset. While the report mentions general categories such as math textbooks, GitHub code, and scientific texts, it gives no percentage breakdown of the full 20T corpus. The data cleaning and filtering methodology (e.g., 'refined knowledge labeling system') is mentioned but lacks the granular detail required for a higher score.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly available via the official Hugging Face repository and has a vocabulary of approximately 128,000 tokens. It is consistent with previous Hunyuan models and is documented to support multilingual capabilities. The implementation is verifiable through the provided `tokenizer_config.json` and integration with standard libraries like `transformers` and `vLLM`.

Model

26.0 / 40

Parameter Density

9.0 / 10

The model provides exemplary transparency regarding its MoE parameters. It explicitly states a total of 80 billion parameters with 13 billion active parameters per token (1 shared expert + 8 routed experts). The architectural breakdown (32 layers, 64 experts) is clearly defined in the technical report and configuration files, leaving no ambiguity about dense vs. sparse counts.
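The sparsity figures quoted above reduce to simple arithmetic. This sketch uses only numbers stated in the report; the constant names are illustrative.

```python
TOTAL_PARAMS_B = 80.0   # total parameters, in billions
ACTIVE_PARAMS_B = 13.0  # active parameters per token, in billions
ROUTED_EXPERTS = 64
SHARED_EXPERTS = 1
ACTIVE_ROUTED = 8

# Share of all parameters used for a single token.
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of experts evaluated for a single token (8 routed + 1 shared of 65).
expert_fraction = (ACTIVE_ROUTED + SHARED_EXPERTS) / (ROUTED_EXPERTS + SHARED_EXPERTS)

print(f"{param_fraction:.2%}")   # 16.25%
print(f"{expert_fraction:.2%}")  # 13.85%
```

The parameter fraction is somewhat higher than the expert fraction, which is what one would expect if non-expert components such as attention and embeddings are active for every token; the report does not itemize those components, so this reading is an inference.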

Training Compute

2.0 / 10

While the technical report describes the training stages and scaling laws used, it gives no specific details on hardware hours (GPU/TPU hours), the exact cluster specifications used for the 20T token training, or the associated carbon footprint and environmental impact. This information appears to be withheld, presumably for proprietary or competitive reasons.

Benchmark Reproducibility

6.0 / 10

Tencent provides results for numerous standard benchmarks (MMLU, MATH, GSM8K) and has released two new evaluation datasets (ArtifactsBench and C3-Bench) to the community. However, the exact evaluation scripts and full prompt templates for the reported scores are not collected in one place, so 1:1 third-party reproduction requires significant effort.

Identity Consistency

9.0 / 10

The model demonstrates high identity consistency, correctly identifying itself as Hunyuan-A13B and maintaining version awareness. There are no documented instances of the model claiming to be a competitor's product (e.g., GPT-4). It is transparent about its MoE nature and its specific 'thinking' modes during interaction.

Downstream

16.5 / 30

License Clarity

4.0 / 10

The licensing situation is complex and potentially misleading. While marketing materials and some repository files mention 'Apache 2.0', the primary weights are governed by the 'Tencent Hunyuan Community License Agreement'. This custom license includes significant restrictions, such as territorial limitations (excluding the EU, UK, and South Korea) and prohibitions on using the model to improve other AI models, which contradicts standard open-source definitions.

Hardware Footprint

7.5 / 10

Hardware requirements are well-documented for various deployment scenarios. The repository provides VRAM estimates for FP16, FP8, and INT4 quantization. It also includes specific guidance on memory scaling for the 256K context window and suggests configurations for consumer-grade hardware (e.g., RTX 4090) versus datacenter GPUs.
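As a rough cross-check on such estimates, weight memory can be approximated from the total parameter count and the bytes stored per parameter. The sketch below is a back-of-the-envelope lower bound, not the repository's published figures: it assumes 2, 1, and 0.5 bytes per parameter for FP16, FP8, and INT4, and ignores quantization scales, activations, and the KV cache. The names `BYTES_PER_PARAM` and `weight_memory_gb` are illustrative.

```python
# Assumed storage cost per parameter for each format (excludes scale/zero-point overhead).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}

TOTAL_PARAMS = 80e9  # all experts must be stored, even though only 13B are active per token


def weight_memory_gb(total_params, fmt):
    """Approximate weight memory in decimal gigabytes for a given format."""
    return total_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(fmt, weight_memory_gb(TOTAL_PARAMS, fmt))
# FP16 160.0
# FP8 80.0
# INT4 40.0
```

Because an MoE model keeps every expert resident unless weights are offloaded, memory scales with the 80B total rather than the 13B active count; sparsity reduces compute per token, not storage.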

Versioning Drift

5.0 / 10

The model uses basic versioning, and a changelog is present in the GitHub repository. However, the history is relatively short, and there is limited information on long-term drift or a formal deprecation policy for older weight checkpoints. Updates appear to be released as new variants rather than a continuous semantic versioning stream.
