Phi-3-mini: Specifications and GPU VRAM Requirements

Open Source

Open Weights

Parameters

3.8B

Context Length

4,096 tokens (4K)

Modality

Text

Architecture

Dense

License

MIT

Release Date

22 Apr 2024

Knowledge Cutoff

Oct 2023

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

3072

Number of Layers

32

Attention Heads

32

Key-Value Heads

Activation Function

Normalization

Position Embedding

RoPE (Rotary Position Embedding)

System Requirements

VRAM requirements for different quantization methods and context sizes
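As a rough illustration of how quantization changes the memory needed for the weights alone, the sketch below multiplies the 3.8B parameter count from the specification by a nominal number of bits per weight. The format labels and bit widths are illustrative assumptions, and the estimate ignores quantization metadata, the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Rough estimate of the VRAM needed just to hold the model weights.
# Assumption: 3.8e9 parameters (from the specification above).
# Assumption: nominal bit widths; real quantized formats add per-block
# scales and other metadata, so actual files are somewhat larger.

PARAMS = 3.8e9

# Illustrative labels mapped to nominal bits per weight (hypothetical names).
BITS_PER_WEIGHT = {
    "FP16": 16,
    "INT8": 8,
    "INT4": 4,
}


def weight_gib(params: float, bits: int) -> float:
    """Return the weight footprint in GiB for a given bit width."""
    return params * bits / 8 / 2**30


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions the weights come to about 7.1 GiB at 16 bits, 3.5 GiB at 8 bits, and 1.8 GiB at 4 bits.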

Phi-3-mini

Microsoft's Phi-3-mini is a lightweight small language model (SLM) with 3.8 billion parameters, designed to run in resource-constrained environments, including mobile and edge devices. It is part of the Phi-3 model family and aims to deliver strong capabilities at a much smaller scale than larger models. It targets scenarios where computational efficiency and low operating cost matter most.

Architecturally, Phi-3-mini is a dense decoder-only Transformer. Its training data is a scaled-up version of the dataset used for Phi-2: heavily filtered publicly available web data plus synthetic "textbook-quality" data, chosen to strengthen reasoning and knowledge. Post-training combines Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to improve instruction following, robustness, and safety alignment. The model has a hidden dimension of 3072, 32 layers, and 32 attention heads, and uses grouped-query attention (GQA) with 8 key-value heads.
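The architecture figures above determine how much memory the key-value (KV) cache consumes per token of context. The following is a minimal sketch, assuming 16-bit cache entries and taking the layer and head counts directly from the description above; the function names are hypothetical. If a particular checkpoint reports a different key-value head count in its configuration, pass that value instead.

```python
# KV-cache size sketch for a decoder-only Transformer.
# Figures taken from the description above: hidden size 3072, 32 layers,
# 32 attention heads, 8 key-value heads.
# Assumption: cache entries are stored in 16 bits (2 bytes per value).

HIDDEN = 3072
LAYERS = 32
Q_HEADS = 32
KV_HEADS = 8
HEAD_DIM = HIDDEN // Q_HEADS  # 96 dimensions per head


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache added by one token of context."""
    # Leading 2: one key vector plus one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_head_for_query_head(q_head: int, q_heads: int = Q_HEADS,
                           kv_heads: int = KV_HEADS) -> int:
    """Index of the KV head shared by a given query head under GQA."""
    group_size = q_heads // kv_heads  # query heads per KV head
    return q_head // group_size


if __name__ == "__main__":
    per_token = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
    print(f"per token: {per_token / 1024:.0f} KiB")
    print(f"4,096 tokens: {per_token * 4096 / 2**20:.0f} MiB")
```

With the stated figures this works out to 96 KiB per token, or about 384 MiB for a full 4,096-token context. The same formula with 32 key-value heads (no sharing) gives four times as much, which is the saving GQA is meant to provide.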

Phi-3-mini is intended for commercial and research applications that need strong reasoning, particularly in mathematics and logic. Its compact size suits latency-bound scenarios and hardware with limited memory and compute, such as mobile phones and IoT devices. The model comes in two context-length variants: a default 4K-token version and a 128K-token version (Phi-3-mini-128K), which uses LongRoPE to extend the context window. This makes it suitable for use cases ranging from general-purpose AI systems to specialized applications that require efficient local inference.
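For readers unfamiliar with rotary position embeddings, the sketch below shows plain RoPE on a single head vector. It is not LongRoPE, which changes how the rotary angles are scaled for long sequences, and it is not taken from the model's code. The base of 10000 and the adjacent-pair layout are common conventions assumed here for illustration.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply plain rotary position embedding to one head vector.

    Each pair (vec[2k], vec[2k+1]) is rotated by the angle
    position * base**(-2k/d), where d is the vector length.
    Assumption: adjacent-pair layout; some implementations instead
    pair element k with element k + d/2.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend((x * c - y * s, x * s + y * c))
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # The attention score depends only on the position offset (2 here),
    # not on the absolute positions.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 102), rope(k, 100)))
```

The two printed scores agree up to floating-point error, which illustrates the relative-position property that makes rotary embeddings a natural starting point for context-extension methods.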

About Phi-3

Microsoft's Phi-3 models are small language models designed to run efficiently on resource-constrained devices. They use a Transformer decoder architecture and are trained on heavily filtered, high-quality data, including synthetic data. This approach yields a compact yet capable model family.

Evaluation Benchmarks

Rankings are relative to other local LLMs.

Rank

#31

Benchmark	Score	Rank
General Knowledge MMLU	0.52	22
