Attention structure: Grouped-Query Attention (GQA)
Hidden size: 4096
Number of layers: 44
Attention heads: 32
Key-value heads: 4
Activation function: SwiGLU
Normalization: RMSNorm
Position embedding: Rotary Position Embedding (RoPE)
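To see how these hyperparameters translate into a roughly 9B parameter budget, the following back-of-the-envelope sketch counts the weights of a Llama-style decoder with the values above. The SwiGLU intermediate width (11008) and vocabulary size (64000) are assumptions for illustration, not taken from the spec table, and biases and norm weights are ignored.

```python
# Rough parameter-count estimate for a Llama-style decoder using the
# hyperparameters listed above. INTERMEDIATE and VOCAB are assumed values,
# not part of the published spec table.
HIDDEN = 4096
LAYERS = 44
HEADS = 32
KV_HEADS = 4
HEAD_DIM = HIDDEN // HEADS          # 128
INTERMEDIATE = 11008                # assumed SwiGLU inner width
VOCAB = 64000                       # assumed vocabulary size

# Per-layer attention: Q and O projections are full-width, while the
# K and V projections are shrunk 8x by GQA (4 KV heads vs 32 query heads).
attn = 2 * (HIDDEN * HIDDEN) + 2 * (HIDDEN * KV_HEADS * HEAD_DIM)
# Per-layer SwiGLU MLP: gate, up, and down projections.
mlp = 3 * HIDDEN * INTERMEDIATE
per_layer = attn + mlp

# Token embedding matrix plus an untied output head.
embeddings = 2 * VOCAB * HIDDEN

total = LAYERS * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")
```

Under these assumptions the count lands in the 8–9 billion range, consistent with the model's name.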
The Yi-9B model is a sophisticated dense transformer-based large language model developed by 01.AI, designed to optimize the trade-off between parameter count and reasoning depth. It serves as a performance-oriented extension of the foundational Yi-6B model, engineered through a process of architectural expansion and multi-stage incremental training. By increasing the model's depth and continuing pre-training on an additional 0.8 trillion high-quality tokens, the developers have produced a model that excels in technical domains such as mathematics and code generation while maintaining robust bilingual fluency in English and Chinese.
Technically, Yi-9B utilizes a decoder-only architecture that mirrors the established Llama framework, enabling immediate compatibility with the broader ecosystem of LLM tools and libraries. Key architectural features include Grouped-Query Attention (GQA) to improve inference throughput and reduce memory overhead, and SwiGLU activation functions within the feed-forward layers for enhanced representational capacity. The model employs Rotary Position Embedding (RoPE) to manage sequence data and utilizes Root Mean Square Layer Normalization (RMSNorm) to stabilize training dynamics across its 44 layers.
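Of the components above, RMSNorm is simple enough to sketch directly: unlike standard LayerNorm it skips mean-centering and rescales each vector by its root mean square. A minimal pure-Python version (the epsilon value is a conventional default, not taken from the model's config):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Root Mean Square Layer Normalization over a single vector.

    Divides x by its RMS (no mean subtraction, unlike LayerNorm),
    then applies a learned per-dimension scale.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# With unit weights, the output has RMS ~= 1 regardless of input scale.
out = rms_norm([3.0, 4.0], [1.0, 1.0])
```

In the actual model this operation is applied per token over the 4096-dimensional hidden state, with `weight` learned during training.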
Designed for computational efficiency, Yi-9B is particularly suited for deployment in resource-constrained environments, including consumer-grade hardware. Its extensive training on a total of 3.9 trillion tokens provides the model with a strong knowledge base for complex reasoning, reading comprehension, and common-sense logic. This makes it an effective choice for developers building AI-native applications that require a balance of high-performance technical reasoning and efficient local execution.
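The claim about resource-constrained deployment follows from simple weight-storage arithmetic. The sketch below rounds the parameter count to 9B and ignores runtime overheads such as activations and the KV cache, so the figures are lower bounds, not exact checkpoint sizes:

```python
# Back-of-the-envelope weight-memory estimate for a ~9B-parameter model.
# 9e9 is a round figure, not the exact checkpoint size; activation memory
# and the KV cache are ignored.
PARAMS = 9e9

def weights_gib(bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return PARAMS * bytes_per_param / 2**30

fp16 = weights_gib(2.0)    # 16-bit floats
int4 = weights_gib(0.5)    # 4-bit quantization
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

Half-precision weights (~17 GiB) call for a high-memory GPU or CPU offloading, while 4-bit quantization (~4 GiB) brings the model within reach of common consumer cards.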
No evaluation benchmarks are available for Yi-9B.