Parameters
3.8B
Context Length
4K (4,096 tokens)
Modality
Text
Architecture
Dense
License
MIT
Release Date
22 Apr 2024
Knowledge Cutoff
Oct 2023
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
3072
Number of Layers
32
Attention Heads
32
Key-Value Heads
8
Activation Function
-
Normalization
-
Position Embedding
RoPE
VRAM requirements for different quantization methods and context sizes
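As a rough guide, the sketch below estimates VRAM from the model's listed shape (3.8B parameters, 32 layers, 8 KV heads, head dimension 96). The bits-per-weight figures, the KV-cache formula, and the fixed overhead are illustrative assumptions, not the calculator's exact method.

```python
# Rough, back-of-the-envelope VRAM estimate for Phi-3-mini (3.8B parameters).
# Illustrative assumptions: weight memory = params * bytes-per-weight,
# KV cache = 2 * layers * kv_heads * head_dim * context * bytes-per-value,
# plus a small fixed runtime overhead. Real usage varies by runtime and format.
PARAMS = 3.8e9
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 3072 // 32   # head_dim = 96

def vram_gb(bits_per_weight: float, context: int, kv_bytes: int = 2) -> float:
    weights = PARAMS * bits_per_weight / 8
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * context * kv_bytes
    overhead = 0.5e9  # assumed activation/runtime overhead
    return (weights + kv_cache + overhead) / 1e9

for name, bits in [("FP16", 16), ("Q8_0 (~8-bit)", 8), ("Q4_K_M (~4.5-bit)", 4.5)]:
    print(f"{name:>16} @ 4,096 ctx: ~{vram_gb(bits, 4096):.1f} GB")
```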
Microsoft's Phi-3-mini is a lightweight, state-of-the-art small language model (SLM) designed to deliver high performance within resource-constrained environments, including mobile and edge devices. It is a foundational component of the Phi-3 model family, aiming to offer compelling capabilities at a significantly smaller scale compared to larger models. The model serves as a practical solution for scenarios where computational efficiency and reduced operational costs are paramount, thereby broadening the accessibility of advanced AI.
Architecturally, Phi-3-mini is a dense decoder-only Transformer model. Its training methodology is a key innovation, utilizing a meticulously curated dataset that is a scaled-up version of the one employed for Phi-2. This dataset comprises heavily filtered publicly available web data and synthetic "textbook-quality" data, intentionally designed to foster strong reasoning and knowledge acquisition. The model undergoes a rigorous post-training process, incorporating both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to enhance instruction adherence, robustness, and safety alignment. It features a hidden dimension size of 3072, 32 layers, 32 attention heads, and leverages grouped-query attention (GQA) with 8 key-value heads.
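The grouped-query attention layout can be illustrated with a short sketch using the shapes listed above (32 query heads sharing 8 key/value heads, head dimension 3072 / 32 = 96). This is a generic GQA example with random weights, not Microsoft's implementation.

```python
# Minimal sketch of grouped-query attention (GQA) with Phi-3-mini's listed shapes:
# 32 query heads share 8 key/value heads (4 query heads per KV head).
import torch
import torch.nn.functional as F

hidden_size, n_heads, n_kv_heads = 3072, 32, 8
head_dim = hidden_size // n_heads          # 96
group_size = n_heads // n_kv_heads         # 4 query heads per KV head

q_proj = torch.nn.Linear(hidden_size, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
o_proj = torch.nn.Linear(n_heads * head_dim, hidden_size, bias=False)

x = torch.randn(1, 16, hidden_size)        # (batch, seq_len, hidden)
b, t, _ = x.shape

q = q_proj(x).view(b, t, n_heads, head_dim).transpose(1, 2)      # (b, 32, t, 96)
k = k_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)   # (b, 8, t, 96)
v = v_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)

# Each KV head serves a group of 4 query heads.
k = k.repeat_interleave(group_size, dim=1)                       # (b, 32, t, 96)
v = v.repeat_interleave(group_size, dim=1)

attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)   # (b, 32, t, 96)
out = o_proj(attn.transpose(1, 2).reshape(b, t, n_heads * head_dim))
print(out.shape)  # torch.Size([1, 16, 3072])
```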
Phi-3-mini is primarily intended for broad commercial and research applications that require strong reasoning abilities, particularly in areas such as mathematics and logic. Its compact size facilitates deployment in latency-bound scenarios and on hardware with limited memory and compute, such as mobile phones and IoT devices. The model is available in two context-length variants: a default 4K-token version and a 128K-token version (Phi-3-mini-128K), which uses LongRoPE for extended context handling. These characteristics make it suitable for use cases ranging from general-purpose AI systems to specialized applications where efficient local inference is a requirement.
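For local experimentation, a minimal inference sketch with the Hugging Face transformers library might look like the following. It assumes the microsoft/Phi-3-mini-4k-instruct checkpoint and a single GPU; the 128K variant (microsoft/Phi-3-mini-128k-instruct) can be substituted for long-context work.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# Assumes the "microsoft/Phi-3-mini-4k-instruct" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fits comfortably on a single consumer GPU
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Solve 23 * 17 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```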
Microsoft's Phi-3 models are small language models designed for efficient operation on resource-constrained devices. They utilize a transformer decoder architecture and are trained on extensively filtered, high-quality data, including synthetically generated "textbook-quality" content. This approach enables a compact yet capable model family.
Ranking is relative to other Local LLMs.
Rank
#28
| Benchmark | Score | Rank |
| --- | --- | --- |
| General Knowledge (MMLU) | 0.52 | 20 |
Overall Rank
#28
Coding Rank
-