Phi-2: Specifications and GPU VRAM Requirements

Phi-2

Open Source

Open Weights

Parameters

2.7B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

MIT License

Release Date

12 Dec 2023

Knowledge Cutoff

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

2560

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

RoPE
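The Position Embedding entry refers to rotary position embeddings (RoPE). Instead of adding a position vector, RoPE rotates each pair of query/key dimensions by an angle proportional to the token position, so the query-key dot product depends only on the distance between tokens. The sketch below is a minimal pure-Python illustration under stated assumptions: it pairs adjacent dimensions and uses the common base of 10000. Real implementations differ in pairing convention (many pair dimension i with i + d/2) and may rotate only part of each head, so treat this as an illustration of the technique, not Phi-2's exact code. The vectors `q` and `k` are hypothetical toy values.

```python
import math


def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Apply a rotary position embedding to one head vector at position `pos`.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by the angle
    pos * base ** (-2i / d), so each pair turns at its own frequency.
    """
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s      # 2-D rotation of the pair
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Toy query/key vectors (hypothetical values, head dimension 4).
q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Both position pairs have the same offset (7 - 3 == 14 - 10), so the
# attention scores agree up to floating-point rounding.
print(dot(rope(q, 7), rope(k, 3)))
print(dot(rope(q, 14), rope(k, 10)))
```

Because each rotation preserves vector length, RoPE changes only the angle between queries and keys. That is why the score reflects relative rather than absolute position.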

System Requirements

VRAM requirements for different quantization methods and context sizes
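A rough VRAM figure can be reconstructed by hand as weight memory (parameters × bits per weight) plus the KV cache, which grows linearly with context length. The sketch below assumes architecture values from the public microsoft/phi-2 `config.json` rather than this page: 32 layers, 32 key/value heads (plain multi-head attention), and a head dimension of 80 (hidden size 2,560 / 32 heads). It ignores activations, framework overhead, and the per-block scale factors that real 4-bit formats store, so actual usage will be somewhat higher.

```python
# Rough VRAM estimate for Phi-2 inference: weights + KV cache only.
# Assumed architecture values (public microsoft/phi-2 config, not this page):
N_PARAMS = 2.7e9
N_LAYERS = 32
N_KV_HEADS = 32      # multi-head attention: one K/V head per query head
HEAD_DIM = 80        # hidden size 2560 / 32 heads

GIB = 1024 ** 3


def weight_bytes(bits_per_weight: float) -> float:
    """Memory for the model weights alone at the given precision."""
    return N_PARAMS * bits_per_weight / 8


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2, batch: int = 1) -> int:
    """Keys and values for every layer, head and token (FP16 by default)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_tokens * bytes_per_value * batch


for context in (1024, 2048):
    for name, bits in (("FP16", 16), ("INT8", 8), ("4-bit", 4)):
        total = weight_bytes(bits) + kv_cache_bytes(context)
        print(f"{name:>5} @ {context:,} tokens: weights {weight_bytes(bits) / GIB:.2f} GiB, "
              f"total {total / GIB:.2f} GiB")
```

Under these assumptions the FP16 weights take about 5.03 GiB and the FP16 KV cache adds 320 MiB at 1,024 tokens, or 640 MiB at the full 2,048-token context.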

Phi-2

Microsoft Phi-2 is a small language model (SLM) with 2.7 billion parameters, representing a continuation of Microsoft Research's efforts in developing highly capable models at a compact scale. The model is designed to facilitate research into language understanding and reasoning while emphasizing efficiency and accessibility. A core objective behind its release is to provide the research community with an unconstrained, small model for investigating crucial safety challenges, including the mitigation of toxicity and the analysis of societal biases within AI systems.

The architectural foundation of Phi-2 is a Transformer-based design trained with a next-word prediction objective. Its training methodology prioritizes data quality: the model was trained on 1.4 trillion tokens drawn from synthetic and carefully filtered web data. The synthetic component, generated with GPT-3.5, focuses on "textbook-quality" content intended to teach common sense reasoning, general knowledge, and domain understanding in areas such as science. The web data was filtered for educational value and content quality, with GPT-4 used to assess it. Training took 14 days on a cluster of 96 A100 GPUs and used techniques such as FlashAttention. Notably, Phi-2 is a base model that has not been aligned through reinforcement learning from human feedback (RLHF) or instruction fine-tuning, yet Microsoft reports that it shows better behavior on toxicity and bias than some existing open-source models that did go through alignment.
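The training figures above (2.7B parameters, 1.4 trillion tokens, 96 A100 GPUs for 14 days) allow a back-of-envelope compute check. The sketch assumes the widely used approximation that training costs about 6 × parameters × tokens FLOPs, which ignores attention-score FLOPs, and assumes the 14 days were spent entirely on training. The 312 TFLOP/s reference is the A100's dense BF16 tensor-core peak; the source does not state the training precision, so the utilization figure is only indicative.

```python
# Back-of-envelope training compute for Phi-2 using the C ≈ 6·N·D rule.
N = 2.7e9                # parameters
D = 1.4e12               # training tokens
GPUS = 96
DAYS = 14
A100_BF16_PEAK = 312e12  # dense BF16 tensor-core peak, FLOP/s per GPU

train_flops = 6 * N * D
gpu_seconds = GPUS * DAYS * 24 * 3600
achieved = train_flops / gpu_seconds  # implied sustained FLOP/s per GPU

print(f"total compute ≈ {train_flops:.3g} FLOPs")
print(f"GPU-hours = {gpu_seconds / 3600:,.0f}")
print(f"implied throughput ≈ {achieved / 1e12:.0f} TFLOP/s per GPU "
      f"({achieved / A100_BF16_PEAK:.0%} of BF16 peak)")
```

The result, roughly 2.3 × 10^22 FLOPs over about 32,000 GPU-hours, is an order-of-magnitude estimate, not a reported figure.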

Phi-2's performance characteristics make it a capable tool for natural language processing tasks such as question answering, conversational AI, and code generation. Its compact parameter count suits deployment on consumer-grade GPUs: at 16-bit precision the weights alone take about 5.4 GB (2.7 billion parameters × 2 bytes), enabling efficient inference. Microsoft reports that Phi-2 matches or outperforms models up to 25 times larger on some benchmarks, reflecting strong reasoning and language understanding for its size. Its design also invites work on mechanistic interpretability and fine-tuning experiments, making it a valuable resource for researchers and developers building resource-efficient language models.

About Phi-2

Microsoft's Phi-2 is a 2.7 billion parameter Transformer-based model, developed for efficient language understanding and reasoning. Its technical innovations include training on "textbook-quality" synthetic and filtered web data, alongside scaled knowledge transfer from its predecessor, Phi-1.5, facilitating emergent capabilities within a compact architecture.
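The 2.7 billion figure can be sanity-checked against the architecture with the standard estimate for a dense decoder-only Transformer: about 12·d² parameters per layer (4·d² for the attention projections, 8·d² for an MLP four times wider than the hidden size) plus the embedding matrices. The sketch below assumes values from the public microsoft/phi-2 `config.json` (32 layers, hidden size 2,560, vocabulary 51,200, untied input and output embeddings) and ignores biases and LayerNorm weights, which add comparatively little.

```python
def approx_params(n_layers: int, d_model: int, vocab: int, mlp_ratio: int = 4,
                  tied_embeddings: bool = False) -> int:
    """Approximate parameter count of a dense decoder-only Transformer."""
    attn = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projections
    embed = vocab * d_model * (1 if tied_embeddings else 2)
    return n_layers * (attn + mlp) + embed


print(f"{approx_params(32, 2560, 51200) / 1e9:.2f}B")  # 2.78B (hidden size 2,560)
print(f"{approx_params(32, 2048, 51200) / 1e9:.2f}B")  # 1.82B (hidden size 2,048, for comparison)
```

Only the 2,560 hidden size lands near the advertised 2.7B, which is a quick way to catch a mis-copied dimension in a spec table.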

Other Phi-2 Models

No related models available

Evaluation Benchmarks

No evaluation benchmarks for Phi-2 available.

GPU Requirements

Quantization

Choose the quantization method for model weights

Context Size: 1,024 tokens

VRAM Required:

Recommended GPUs

Resources

Official Documentation

Download Weights