Phi-3-medium: Specifications and GPU VRAM Requirements

Open Source

Open Weights

Parameters: 14B
Context Length: 128K
Modality: Text
Architecture: Dense
License: MIT
Release Date: 22 Apr 2024
Knowledge Cutoff: Oct 2023

Technical Specifications

Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 5120

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization: RMS Normalization
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
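
As a rough guide to how VRAM scales with quantization and context size, the sketch below adds the memory for the weights (parameter count times bits per weight) to the memory for the key-value cache (which grows linearly with context length). It is a back-of-the-envelope estimate, not the calculator used on this page. The layer count, key-value head count, and head dimension in the example are illustrative assumptions, since this page does not list them; check the model's published configuration before relying on the numbers. Activation memory, framework overhead, and the per-block scale metadata that real quantization formats carry are all ignored.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(context_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Memory for the key-value cache of one sequence, in bytes.

    The leading factor 2 accounts for storing one key tensor and one
    value tensor per layer. bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def estimate_vram_gib(n_params: float, bits_per_weight: float,
                      context_tokens: int, n_layers: int, n_kv_heads: int,
                      head_dim: int, bytes_per_value: int = 2) -> float:
    """Weights plus KV cache, in GiB. Excludes activations and overhead."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value
    )
    return total / GIB


if __name__ == "__main__":
    # Assumed architecture values for illustration only (not from this page).
    assumed = dict(n_layers=40, n_kv_heads=10, head_dim=128)
    for bits in (16, 8, 4):
        for ctx in (1024, 131072):
            gib = estimate_vram_gib(14e9, bits, ctx, **assumed)
            print(f"{bits:>2}-bit weights, {ctx:>6} tokens: ~{gib:.1f} GiB")
```

Because the cache term depends on the number of key-value heads rather than the number of query heads, a model that shares key-value heads across query heads needs proportionally less cache memory at long context lengths.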

Phi-3-medium

Phi-3-medium is a 14-billion-parameter language model developed by Microsoft as part of the Phi-3 family. It is designed for a broad range of commercial and research applications, particularly memory- or compute-constrained environments and latency-sensitive scenarios. The model targets strong reasoning in mathematics, logic, and code generation, and is intended as a building block for generative AI features.

Phi-3-medium is trained on a high-quality, reasoning-dense dataset, a refined and scaled-up version of the data used for its predecessor, Phi-2. The dataset combines heavily filtered publicly available web content with synthetically generated data. Training also includes supervised fine-tuning (SFT) and direct preference optimization (DPO) to improve instruction following and reinforce safety measures.

The model uses a dense decoder-only Transformer architecture, a common structure for autoregressive language modeling. Its components include Grouped-Query Attention (GQA) for efficient memory use and processing, Root Mean Square (RMS) normalization for stable training, and Rotary Positional Embeddings (RoPE) to encode positional information within sequences. A RoPE variant, LongRoPE, extends the model's context length to 128K tokens. Phi-3-medium is optimized for deployment on graphics processing units (GPUs), central processing units (CPUs), and mobile devices, often through ONNX Runtime and DirectML for cross-platform compatibility and efficient inference.
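
Two of the mechanisms named above can be illustrated with a minimal pure-Python sketch. It is a generic illustration of RMS normalization and of how grouped-query attention maps query heads onto shared key-value heads, not Microsoft's implementation. The function names, the `eps` default, and the head counts in the usage lines are assumptions chosen for the example.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Scale a vector by the reciprocal of its root mean square.

    Unlike layer normalization, no mean is subtracted; each element is
    divided by sqrt(mean(x^2) + eps) and multiplied by a learned weight.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Return the key-value head shared by a given query head under GQA.

    Query heads are split into n_kv_heads equal, contiguous groups, and
    every query head in a group attends using the same key-value head.
    """
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Hypothetical sizes for illustration: 8 query heads sharing 2 KV heads.
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print([kv_head_for(h, 8, 2) for h in range(8)])
```

With fewer key-value heads than query heads, only the smaller set of keys and values has to be cached during generation, which is where the memory saving comes from.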

About Phi-3

Microsoft's Phi-3 models are small language models designed to run efficiently on resource-constrained devices. They use a Transformer decoder architecture and are trained on heavily filtered, high-quality data, including synthetic data. This approach yields a compact yet capable model family.

Evaluation Benchmarks

Rankings are relative to other local LLMs.

Benchmark	Score	Rank
General Knowledge MMLU	0.66	14

Rankings

Overall Rank: #21

Coding Rank
