Hunyuan Lite

Active Parameters

7B

Context Length

256K

Modality

Text

Architecture

Dense Transformer

License

Tencent Hunyuan Community License

Release Date

30 Oct 2024

Knowledge Cutoff

Aug 2024

Technical Specifications

Total Expert Parameters

-

Number of Experts

-

Active Experts

-

Attention Structure

Grouped Query Attention (GQA)

Hidden Dimension Size

4096

Number of Layers

32

Attention Heads

32

Key-Value Heads

8

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

Rotary Position Embedding (RoPE)

Hunyuan Lite

Hunyuan Lite is a text-only language model developed by Tencent that targets natural language understanding, code generation, and mathematical problem-solving within a compact computational footprint. Part of the broader Hunyuan family, it is designed for deployment on edge devices such as laptops, smartphones, and in-vehicle systems, where memory and power consumption are critical constraints. The design trades raw scale for lower latency and resource use, avoiding the overhead typically associated with large-scale models.

The 7B variant uses a dense Transformer architecture rather than the Mixture of Experts (MoE) design of larger counterparts such as Hunyuan-Large and Hunyuan-A13B. The series supports a context window of 256,000 tokens, enough to ingest extensive documents, complete books, or lengthy conversation histories. The model uses Grouped Query Attention (GQA), in which several query heads share each key-value head, to speed up inference and shrink the KV cache. It also offers dual-mode reasoning: users can switch between a "fast-thinking" mode for immediate responses and a "slow-thinking" mode that applies chain-of-thought processing to deeper analytical tasks.
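To make the KV-cache saving from GQA concrete, the following is a back-of-the-envelope sketch using the figures listed in the specifications above (32 layers, hidden size 4096, 32 attention heads, 8 key-value heads, 256,000-token context). Several inputs are assumptions not stated on this page: the head dimension is taken as hidden size divided by attention heads, the cache is stored in 16-bit precision, and a single sequence is cached. The function name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 covers the two cached tensors per layer: keys and values.
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


# Figures from the specification table.
HIDDEN = 4096
LAYERS = 32
QUERY_HEADS = 32
KV_HEADS = 8
CONTEXT = 256_000

# Assumption: head dimension = hidden size / number of attention heads.
HEAD_DIM = HIDDEN // QUERY_HEADS

GIB = 1024 ** 3

# GQA caches only the 8 shared key-value heads.
gqa_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical baseline: every query head keeps its own key-value head.
mha_bytes = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache at full context: {gqa_bytes / GIB:.2f} GiB")  # 31.25 GiB
print(f"MHA cache at full context: {mha_bytes / GIB:.2f} GiB")  # 125.00 GiB
print(f"Reduction factor: {mha_bytes // gqa_bytes}x")           # 4x
```

Under these assumptions the cache shrinks by the ratio of query heads to key-value heads (32 / 8 = 4). Actual memory use depends on the cache precision and the serving framework.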

Hunyuan Lite is compatible with mainstream inference frameworks, including vLLM, SGLang, and TensorRT-LLM. It uses Rotary Position Embedding (RoPE) to maintain stability across its extended context window and SwiGLU activation in its feed-forward layers. The model is engineered for agentic workflows, with an emphasis on tool use and structured data generation. The open-weight release under the Tencent Hunyuan Community License supports specialized fine-tuning and integration into private-domain knowledge engines and automated assistant platforms.
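As a rough illustration of the RoPE scheme mentioned above, the sketch below rotates consecutive pairs of vector components by position-dependent angles and checks the property that makes RoPE useful for long contexts: the attention score between a query and a key depends only on their relative offset. This is a generic textbook formulation, not Hunyuan's implementation. The frequency base of 10000, the interleaved pairing of dimensions, and the toy vectors are all assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even number of dimensions"
    rotated = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim); later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Both pairs of positions are 3 tokens apart, so the scores should match.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because each pair is rotated by a pure 2D rotation, the vector norm is preserved and only the relative angle between query and key enters the dot product.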

About Hunyuan

Hunyuan is Tencent's family of large language models, covering a range of capabilities.


Evaluation Benchmarks

No evaluation benchmarks for Hunyuan Lite available.

Rankings

Overall Rank

-

Coding Rank

-
