Active Parameters
7B
Context Length
250K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Tencent Hunyuan Community License
Release Date
30 Oct 2024
Knowledge Cutoff
Aug 2024
Total Expert Parameters
-
Number of Experts
-
Active Experts
-
Attention Structure
Multi-Head Attention
Hidden Dimension Size
4096
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
SwigLU
Normalization
RMS Normalization
Position Embedding
Absolute Position Embedding
Hunyuan Lite is a specialized, text-based language model developed by Tencent, engineered to deliver sophisticated linguistic and reasoning capabilities within a compact computational footprint. Part of the broader Hunyuan ecosystem, this model is designed for deployment on edge devices such as laptops, smartphones, and in-vehicle systems. Its primary objective is to provide a highly efficient solution for natural language understanding, code generation, and complex mathematical problem-solving without the high resource overhead typically associated with large-scale models. By optimizing the balance between performance and latency, the model enables advanced AI integration in environments where memory and power consumption are critical constraints.
The architectural framework of the 7B variant employs a dense Transformer-based structure, departing from the Mixture of Experts (MoE) design used in its larger counterparts like Hunyuan-Large or Hunyuan-A13B. A defining technical innovation of this series is its support for an ultra-long context window of 256,000 tokens, which allows for the ingestion and analysis of extensive documents, complete books, or lengthy conversation histories. The model integrates Grouped Query Attention (GQA) to accelerate inference speed and reduce the memory footprint of the KV cache. Additionally, it features a unique dual-mode reasoning capability, enabling users to switch between a "fast-thinking" mode for immediate responses and a "slow-thinking" mode that utilizes chain-of-thought processing for deeper analytical tasks.
Hunyuan Lite is optimized for versatile deployment and is compatible with mainstream inference frameworks like vLLM, SGLang, and TensorRT-LLM. The model adopts a Rotary Position Embedding (RoPE) scheme to maintain stability across its expanded context window and utilizes SwiGLU activation for enhanced expressive power in its feed-forward layers. Engineered for agentic workflows, it demonstrates high proficiency in tool-use and structured data generation. The release of open weights under a community license facilitates specialized fine-tuning and integration into private-domain knowledge engines and automated assistant platforms.
Tencent Hunyuan large language models with various capabilities.
No evaluation benchmarks for Hunyuan Lite available.
Overall Rank
-
Coding Rank
-
Full Calculator
Choose the quantization method for model weights
Context Size: 1,024 tokens