| Specification | Value |
|---|---|
| Attention structure | Grouped-Query Attention |
| Hidden dimension size | 3072 |
| Number of layers | 40 |
| Attention heads | 24 |
| Key-value heads | 8 |
| Activation function | - |
| Normalization | - |
| Positional embedding | RoPE |
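The head counts above imply a grouped-query layout in which the 24 query heads share 8 key/value heads (3 query heads per KV group), shrinking the KV cache relative to full multi-head attention. The sketch below is a minimal illustration of that grouping, assuming a head dimension of hidden size / attention heads = 128; the tensor names and weight initialization are illustrative, not Phi-4's actual implementation.

```python
# Minimal grouped-query attention (GQA) sketch using the head counts above:
# 24 query heads share 8 key/value heads. Illustrative only, not Phi-4's code.
import torch
import torch.nn.functional as F

hidden_size = 3072
n_heads = 24                      # query heads
n_kv_heads = 8                    # key/value heads
head_dim = hidden_size // n_heads # 128

def grouped_query_attention(x, wq, wk, wv):
    """x: (batch, seq, hidden); wq/wk/wv: illustrative projection weights."""
    b, s, _ = x.shape
    q = (x @ wq).view(b, s, n_heads, head_dim).transpose(1, 2)     # (b, 24, s, 128)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)  # (b, 8, s, 128)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # Each KV head serves a group of 24 / 8 = 3 query heads.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)                          # (b, 24, s, 128)
    v = v.repeat_interleave(group, dim=1)

    attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return attn.transpose(1, 2).reshape(b, s, n_heads * head_dim)

x = torch.randn(1, 16, hidden_size)
wq = torch.randn(hidden_size, n_heads * head_dim) / hidden_size**0.5
wk = torch.randn(hidden_size, n_kv_heads * head_dim) / hidden_size**0.5
wv = torch.randn(hidden_size, n_kv_heads * head_dim) / hidden_size**0.5
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 16, 3072])
```

Because only 8 KV heads are cached instead of 24, the KV cache is roughly a third of the size it would be under standard multi-head attention with the same head dimension.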
VRAM Requirements for Different Quantization Methods and Context Sizes
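As a rough guide, the memory footprint at a given quantization level can be approximated as quantized weight storage plus a KV cache that grows linearly with context length. The sketch below assumes the architecture values listed above (40 layers, 8 KV heads, head dimension 128), a ~14B parameter count, and an fp16 KV cache; it ignores activations and runtime overhead, so treat the numbers as approximate lower bounds rather than the site's measured figures.

```python
# Rough VRAM estimator (illustrative only): quantized weights plus an fp16
# KV cache sized from the architecture values above.
PARAMS = 14e9            # ~14B parameters
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 3072 // 24    # hidden size / attention heads = 128

def vram_gib(bits_per_weight: float, context_len: int, kv_bytes: int = 2) -> float:
    weights = PARAMS * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per KV head, per position.
    kv_cache = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_len * kv_bytes
    return (weights + kv_cache) / 1024**3

for bits in (16, 8, 4):
    for ctx in (4_000, 16_000):
        print(f"{bits}-bit weights, {ctx:>6} ctx: ~{vram_gib(bits, ctx):.1f} GiB")
```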
Microsoft Phi-4 is a 14 billion parameter decoder-only Transformer model, developed as the latest iteration in Microsoft's series of small language models (SLMs). The model's primary objective is to deliver advanced reasoning capabilities efficiently, enabling deployment in environments with limited compute and memory, and for latency-sensitive applications. Phi-4 is designed to handle complex logical and mathematical tasks, along with general language processing, by focusing on the quality of its training data rather than solely on model scale.
A key innovation in Phi-4's architecture and training methodology lies in its strategic use of high-quality synthetic data, which constitutes a significant portion of its training corpus. This synthetic data, generated using techniques such as multi-agent prompting, instruction reversal, and self-revision workflows, is complemented by meticulously curated organic data from web content, academic books, and code repositories. This approach enables Phi-4 to acquire strong reasoning and problem-solving abilities, often surpassing models with larger parameter counts. The model's architecture retains a similar structure to its predecessor, Phi-3, but includes enhancements such as an extended context length.
Phi-4 supports a 16,000-token context length, allowing it to process and generate long-form content. Its design prioritizes efficiency and robust performance in tasks requiring logical deduction, code generation, and scientific understanding. The model is intended for research and development, serving as a foundational component for generative AI features in various applications, particularly those demanding strong reasoning in resource-constrained or low-latency scenarios.
The Phi-4 family comprises small language models that prioritize efficient, high-capability reasoning. Its development emphasizes data quality and synthetic-data integration, an approach that enables strong performance and on-device deployment.
Rankings apply to local LLMs.

Rank: #36
| Benchmark | Score | Rank |
|---|---|---|
| Professional Knowledge (MMLU Pro) | 0.70 | 10 |
| Graduate-Level QA (GPQA) | 0.56 | 10 |
| Reasoning (LiveBench Reasoning) | 0.39 | 17 |
| General Knowledge (MMLU) | 0.56 | 18 |
| Mathematics (LiveBench Mathematics) | 0.43 | 21 |
| Coding (LiveBench Coding) | 0.29 | 24 |
| Data Analysis (LiveBench Data Analysis) | 0.45 | 26 |