Specification | Value |
---|---|
Parameters | 14B |
Context Length | 16K |
Modality | Text |
Architecture | Dense |
License | MIT License |
Release Date | 13 Dec 2024 |
Knowledge Cutoff | Nov 2024 |
Attention Structure | Grouped-Query Attention |
Hidden Dimension Size | 3072 |
Number of Layers | 40 |
Attention Heads | 24 |
Key-Value Heads | 8 |
Activation Function | - |
Normalization | - |
Position Embedding | RoPE |
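The attention configuration above determines the KV-cache footprint at inference time. As a back-of-envelope sketch (assuming an FP16 cache and head_dim = hidden_size / attention_heads, which are assumptions, not figures from this page):

```python
# Rough KV-cache estimate from the spec table above.
hidden_size = 3072
num_heads = 24
num_kv_heads = 8     # Grouped-Query Attention: 24 query heads share 8 KV heads
num_layers = 40
bytes_per_value = 2  # FP16 cache assumed

head_dim = hidden_size // num_heads  # 128

# Per token: 2 tensors (K and V) x layers x KV heads x head dim x bytes
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

print(kv_bytes_per_token)                   # 163840 bytes = 160 KiB per token
print(kv_bytes_per_token * 16_384 / 2**30)  # 2.5 GiB at the full 16K context
```

With full multi-head attention (24 KV heads instead of 8), the cache would be three times larger, which is the practical benefit of GQA for memory-constrained deployment.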
VRAM requirements vary with the weight quantization method and the context size.
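A rough, hypothetical estimator for the weights-only portion of that footprint (the function name and the effective bits-per-weight figures for GGUF-style formats are assumptions; real formats add overhead for scales and metadata, so treat these as lower bounds):

```python
def weight_vram_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate GPU memory for model weights alone, in GiB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

# Effective bits/weight are approximate for GGUF-style quantizations.
for name, bits in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{weight_vram_gib(14, bits):.1f} GiB")
# FP16: ~26.1 GiB, Q8_0: ~13.9 GiB, Q4_K_M: ~7.9 GiB
```

The KV cache and activation memory come on top of these figures, so the total requirement grows with context size even when the weights are fixed.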
Microsoft Phi-4 is a 14 billion parameter decoder-only Transformer model, developed as the latest iteration in Microsoft's series of small language models (SLMs). The model's primary objective is to deliver advanced reasoning capabilities efficiently, enabling deployment in environments with limited compute and memory, and for latency-sensitive applications. Phi-4 is designed to handle complex logical and mathematical tasks, along with general language processing, by focusing on the quality of its training data rather than solely on model scale.
A key innovation in Phi-4's architecture and training methodology lies in its strategic use of high-quality synthetic data, which constitutes a significant portion of its training corpus. This synthetic data, generated using techniques such as multi-agent prompting, instruction reversal, and self-revision workflows, is complemented by meticulously curated organic data from web content, academic books, and code repositories. This approach enables Phi-4 to acquire strong reasoning and problem-solving abilities, often surpassing models with larger parameter counts. The model's architecture retains a similar structure to its predecessor, Phi-3, but includes enhancements such as an extended context length.
Phi-4 supports a 16K-token context length, allowing it to process and generate long-form content. Its design prioritizes efficiency and robust performance in tasks requiring logical deduction, code generation, and scientific understanding. The model is intended for research and development, serving as a foundational component for generative AI features in various applications, particularly those demanding strong reasoning in resource-constrained or low-latency scenarios.
The Microsoft Phi-4 model family comprises small language models prioritizing efficient, high-capability reasoning. Its development emphasizes robust data quality and sophisticated synthetic data integration. This approach enables enhanced performance and on-device deployment capabilities.
Rankings are relative to local LLMs.
Overall Rank: #36
Benchmark | Score | Rank |
---|---|---|
MMLU Pro (Professional Knowledge) | 0.70 | 10 |
GPQA (Graduate-Level QA) | 0.56 | 10 |
LiveBench Reasoning | 0.39 | 17 |
MMLU (General Knowledge) | 0.56 | 18 |
LiveBench Mathematics | 0.43 | 21 |
LiveBench Coding | 0.29 | 24 |
LiveBench Data Analysis | 0.45 | 26 |
Coding Rank: #33