Hunyuan A13B: Specifications and GPU VRAM Requirements

Hunyuan A13B

Open Source

Open Weights

Active Parameters

13B

Context Length

256K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

25 Jun 2025

Knowledge Cutoff

Technical Specifications

Total Parameters

80B

Number of Experts

Active Experts

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

4096

Number of Layers

32

Attention Heads

Key-Value Heads

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

System Requirements

VRAM requirements for different quantization methods and context sizes
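The context-dependent share of VRAM is mostly the key-value (KV) cache, which grows linearly with the number of tokens held in context. The following is a minimal sketch of the usual estimate, 2 (keys and values) x layers x KV heads x head dimension x tokens x bytes per element. The layer count of 32 comes from this page; `kv_heads=8` and `head_dim=128` are hypothetical placeholders, since this page does not list those values, and real runtimes add their own overhead on top.

```python
def kv_cache_bytes(context_tokens, num_layers=32, kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Rough KV-cache size in bytes for a single sequence.

    num_layers=32 is taken from the page; kv_heads and head_dim are
    ASSUMED placeholder values, not published figures for this model.
    bytes_per_elem=2 corresponds to a 16-bit cache.
    """
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * kv_heads * head_dim * context_tokens * bytes_per_elem


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Compare a short prompt with progressively longer contexts.
for tokens in (1_024, 32_000, 256_000):
    print(f"{tokens:>7} tokens: {to_gib(kv_cache_bytes(tokens)):.2f} GiB")
```

Under these assumptions the cache costs a fixed number of bytes per token, so the total requirement is roughly the weight memory plus this per-token cost times the context size.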

Hunyuan A13B

Tencent's Hunyuan A13B is a large language model built on a Mixture-of-Experts (MoE) architecture with 80 billion total parameters, of which 13 billion are active during inference. The design aims to keep the capacity of a large model while paying the compute cost of a much smaller one. The model is released as an open-source resource for researchers and developers who need to deploy capable models under tight resource constraints. It addresses the challenge of scaling large language models by providing extensive capacity without activating every parameter for every task.

The core of Hunyuan A13B is its sparse MoE architecture, which dynamically routes input through a subset of specialized "expert" networks rather than through the full model. The architecture comprises 32 layers, uses SwiGLU activation functions, and employs Grouped Query Attention (GQA) to speed up inference and reduce the memory footprint of the key-value cache. A notable feature is its hybrid reasoning mode: depending on the complexity of the input, the model can switch between a "fast thinking" mode for rapid responses and a "slow thinking" mode for more intricate, multi-step problem-solving. The model was trained on a corpus exceeding 20 trillion tokens, with significant emphasis on science, technology, engineering, and mathematics (STEM) data.
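To make the idea of sparse routing concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not the model's actual router: the number of experts, the value of `k`, the gate logits, and the toy expert functions are all hypothetical, chosen only to show how a few experts are selected and their outputs blended.

```python
import math


def route_top_k(gate_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    # Numerically stable softmax over the gate logits.
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, gate_logits, experts, k):
    """Evaluate only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Toy setup (hypothetical): four "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]

print(route_top_k(logits, k=2))
print(moe_forward(1.0, logits, experts, k=2))
```

Only the selected experts are evaluated for a given input, which is why compute per token tracks the active parameter count rather than the total.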

Hunyuan A13B supports a context window of up to 256,000 tokens, enough to process long documents or extended conversations in a single pass. The model has been optimized for agent-based tasks and demonstrates capabilities in mathematical reasoning, logical analysis, and complex instruction following. Its design emphasizes efficient inference and supports quantization formats including FP8 and INT4, which allows deployment across a range of hardware. This makes it suitable for applications that need strong language processing on a limited compute budget, potentially even on a single mid-range GPU.
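As a rough illustration of what quantization buys, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes 2 bytes per parameter for FP16, 1 for FP8, and 0.5 for INT4, and it ignores quantization metadata (scales and zero points), activations, and the KV cache, so the figures are lower bounds rather than measured requirements. Note that in an MoE model all 80 billion parameters must be resident even though only 13 billion are used per token.

```python
# Nominal storage cost per parameter; real formats add small overheads.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}

TOTAL_PARAMS = 80e9   # all experts must be loaded
ACTIVE_PARAMS = 13e9  # parameters actually used per token


def weight_memory_gb(num_params, fmt):
    """Lower-bound weight memory in GB (1e9 bytes) for a given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    resident = weight_memory_gb(TOTAL_PARAMS, fmt)
    touched = weight_memory_gb(ACTIVE_PARAMS, fmt)
    print(f"{fmt}: {resident:.0f} GB resident, {touched:.1f} GB read per token")
```

The gap between the two columns is the MoE trade-off: memory capacity is set by the total parameter count, while per-token compute and memory traffic follow the active count.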

About Hunyuan

Hunyuan is Tencent's family of large language models with various capabilities.

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Hunyuan A13B available.
