| Attribute | Value |
|---|---|
| Attention structure | Grouped-Query Attention |
| Hidden dimension size | 3072 |
| Number of layers | 32 |
| Attention heads | 32 |
| Key-value heads | 8 |
| Activation function | - |
| Normalization | - |
| Positional embedding | RoPE |
VRAM requirements by quantization method and context size
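These requirements can be approximated from the architecture values above: quantized weight storage plus the KV cache, which GQA shrinks to a quarter of what standard multi-head attention would need (8 KV heads instead of 32). Below is a rough back-of-the-envelope sketch in Python; the ~3.8B parameter count is an assumption based on the published model size, and real usage is higher once activations and framework overhead are included:

```python
# Rough VRAM estimate for Phi-3-mini. N_PARAMS is an assumption (~3.8B as
# published); layer/head/dim values come from the specification table above.
N_PARAMS = 3.8e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 3072 // 32   # hidden size / attention heads = 96

def weights_gb(bits_per_weight: float) -> float:
    """Memory for the quantized weights alone."""
    return N_PARAMS * bits_per_weight / 8 / 1024**3

def kv_cache_gb(context_len: int, bytes_per_elem: int = 2) -> float:
    """K and V caches at fp16: 2 * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_len * bytes_per_elem / 1024**3

for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    for ctx in (4096, 131072):   # the 4K and 128K context variants
        total = weights_gb(bits) + kv_cache_gb(ctx)
        print(f"{label:>4} weights + {ctx:>6}-token KV cache ~= {total:.1f} GB")
```

For example, fp16 weights alone come to about 7.6 GB, and a full 128K-token fp16 KV cache adds roughly 12 GB, which is why the long-context variant is much more memory-hungry than the default 4K version.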
Microsoft's Phi-3-mini is a lightweight, state-of-the-art small language model (SLM) designed to deliver high performance within resource-constrained environments, including mobile and edge devices. It is a foundational component of the Phi-3 model family, aiming to offer compelling capabilities at a significantly smaller scale compared to larger models. The model serves as a practical solution for scenarios where computational efficiency and reduced operational costs are paramount, thereby broadening the accessibility of advanced AI.
Architecturally, Phi-3-mini is a dense decoder-only Transformer model. Its training methodology is a key innovation, utilizing a meticulously curated dataset that is a scaled-up version of the one employed for Phi-2. This dataset comprises heavily filtered publicly available web data and synthetic "textbook-quality" data, intentionally designed to foster strong reasoning and knowledge acquisition. The model undergoes a rigorous post-training process, incorporating both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to enhance instruction adherence, robustness, and safety alignment. It features a hidden dimension size of 3072, 32 layers, 32 attention heads, and leverages grouped-query attention (GQA) with 8 key-value heads.
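To make the attention layout concrete, here is a minimal sketch of grouped-query attention using Phi-3-mini's published shapes (32 query heads sharing 8 key/value heads, head dimension 3072/32 = 96). It deliberately omits the causal mask, RoPE, and the learned projections, so it illustrates only the head-grouping mechanics rather than the model's actual implementation:

```python
import torch

# Shape parameters from the specification table above
HIDDEN = 3072
N_HEADS = 32                      # query heads
N_KV_HEADS = 8                    # key/value heads (GQA)
HEAD_DIM = HIDDEN // N_HEADS      # 96

def gqa(q, k, v):
    """Each of the 8 KV heads serves a group of 32/8 = 4 query heads."""
    b, s, _ = q.shape
    q = q.view(b, s, N_HEADS, HEAD_DIM).transpose(1, 2)       # (b, 32, s, 96)
    k = k.view(b, s, N_KV_HEADS, HEAD_DIM).transpose(1, 2)    # (b, 8,  s, 96)
    v = v.view(b, s, N_KV_HEADS, HEAD_DIM).transpose(1, 2)
    group = N_HEADS // N_KV_HEADS                             # 4
    k = k.repeat_interleave(group, dim=1)                     # broadcast to 32 heads
    v = v.repeat_interleave(group, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / HEAD_DIM ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, s, HIDDEN)

q = torch.randn(1, 16, HIDDEN)
k = torch.randn(1, 16, N_KV_HEADS * HEAD_DIM)   # KV projections are 8 * 96 = 768 dims
v = torch.randn(1, 16, N_KV_HEADS * HEAD_DIM)
print(gqa(q, k, v).shape)                        # torch.Size([1, 16, 3072])
```

The design choice to project K and V to only 768 dimensions (versus 3072 for Q) is what shrinks the KV cache by 4x, directly enabling the memory savings discussed under the VRAM section above.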
Phi-3-mini is primarily intended for broad commercial and research applications that require strong reasoning abilities, particularly in areas such as mathematics and logic. Its compact size facilitates deployment in latency-bound scenarios and on hardware with limited memory and compute capabilities, such as mobile phones and IoT devices. The model is available in two context length variants: a default 4K token version and a 128K token version (Phi-3-mini-128K), which utilizes LongRoPE for extended context handling. These characteristics make it suitable for diverse use cases ranging from general-purpose AI systems to specialized applications where efficient local inference is a requirement.
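For local experimentation, both variants are published on the Hugging Face Hub. A minimal sketch using the transformers library follows; the model IDs are those released by Microsoft, and depending on your transformers version, `trust_remote_code=True` may be required when loading:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the 4K or the 128K (LongRoPE) context variant.
model_id = "microsoft/Phi-3-mini-4k-instruct"  # or "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load fp16/bf16 weights where available
    device_map="auto",    # place layers on available GPU(s)/CPU
)

messages = [
    {"role": "user",
     "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```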
Microsoft's Phi-3 models are small language models designed for efficient operation on resource-constrained devices. They utilize a Transformer decoder architecture and are trained on extensively filtered, high-quality data, including synthetic data. This approach yields a compact yet capable model family.
Rankings apply to local LLMs.
Rank: #28
| Benchmark | Score | Rank |
|---|---|---|
| General Knowledge (MMLU) | 0.52 | 20 |