Parameters
1.3B
Context Length
2,048 tokens
Modality
Text
Architecture
Dense
License
MIT
Release Date
15 Jun 2023
Knowledge Cutoff
-
Attention Structure
Multi-Head Attention
Hidden Dimension Size
2048
Number of Layers
24
Attention Heads
32
Key-Value Heads
32
Activation Function
GELU
Normalization
-
Position Embedding
RoPE
VRAM requirements depend on the quantization method used for the model weights and on the context size.
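As a rough orientation, the sketch below estimates that footprint from the specs listed above: weight storage plus a standard key-value cache. The bytes-per-weight figures for each quantization level are assumed approximations, not measured values, and runtime overhead is ignored.

# Rough VRAM estimate for Phi-1: 1.3B parameters, 24 layers, hidden dim 2048,
# 32 KV heads (head dim 64). Approximation only, not a measurement.
PARAMS = 1.3e9
NUM_LAYERS = 24
HIDDEN_DIM = 2048
KV_HEADS = 32
HEAD_DIM = HIDDEN_DIM // KV_HEADS  # 64

BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}  # assumed values

def estimate_vram_gib(quant: str, context_tokens: int, kv_bytes: float = 2.0) -> float:
    """Approximate VRAM in GiB: model weights plus the KV cache."""
    weight_bytes = PARAMS * BYTES_PER_WEIGHT[quant]
    # KV cache: 2 tensors (K and V) per layer, each heads * head_dim * context wide.
    kv_cache_bytes = 2 * NUM_LAYERS * KV_HEADS * HEAD_DIM * context_tokens * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024 ** 3

for quant in BYTES_PER_WEIGHT:
    for ctx in (1024, 2048):
        print(f"{quant:>5} @ {ctx} tokens: ~{estimate_vram_gib(quant, ctx):.2f} GiB")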
Microsoft's Phi-1 is a compact, Transformer-based language model specifically engineered for Python code generation. Its development emphasizes the efficacy of high-quality, curated training data over sheer data volume or model scale, a principle articulated in the foundational "Textbooks Are All You Need" research. The model's training regimen involved a distinct approach, utilizing a combination of meticulously filtered code-language data from public repositories and synthetically generated Python textbooks and exercises from large language models such as GPT-3.5. This data strategy aimed to imbue the model with a "textbook-quality" understanding of programming concepts and practices, fostering robust learning despite its modest size.
The architectural design of Phi-1 is rooted in a Transformer decoder-only structure, featuring 24 layers, a hidden dimension of 2048, and 32 attention heads. To improve training efficiency and performance, the design adopts Rotary Position Embedding (RoPE) for encoding sequence positions and FlashAttention for accelerated attention computation. This combination of a streamlined architecture and optimized components allows Phi-1 to process input sequences efficiently while maintaining contextual coherence. Training used a next-token prediction objective, enabling the model to generate coherent and syntactically correct Python code.
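For orientation, the hyperparameters above can be collected in a small Python sketch; the field names below are illustrative and not taken from Microsoft's code.

from dataclasses import dataclass

@dataclass(frozen=True)
class Phi1Config:
    # Values taken from the spec list on this page; names are illustrative.
    num_layers: int = 24
    hidden_dim: int = 2048
    num_attention_heads: int = 32
    num_kv_heads: int = 32            # standard multi-head attention, no grouped KV
    context_length: int = 2048
    activation: str = "gelu"
    position_embedding: str = "rope"  # Rotary Position Embedding

    @property
    def head_dim(self) -> int:
        # Per-head width implied by the hidden size and head count: 2048 / 32 = 64.
        return self.hidden_dim // self.num_attention_heads

print(Phi1Config().head_dim)  # 64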
Phi-1 is primarily designed to generate simple Python functions from docstrings, its main code generation use case. On Python coding benchmarks such as HumanEval and MBPP it achieves results comparable to significantly larger models, underscoring the impact of its high-quality data curation. While specialized for Python, its capabilities illustrate the potential of small language models in targeted domains.
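As a usage illustration, here is a minimal sketch of docstring-to-function generation with the Hugging Face transformers library, assuming the public checkpoint id microsoft/phi-1, a recent transformers release with Phi support, and accelerate installed for device_map="auto".

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed Hugging Face checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Prompt with a function signature and docstring; the model completes the body.
prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards, ignoring case."""\n'
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Greedy decoding (do_sample=False) is used here because short function completions generally benefit from deterministic output.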
Phi-1 is Microsoft's foundational 1.3-billion-parameter Transformer-based small language model, specialized in Python code generation. Its core innovation is training on meticulously curated, "textbook-quality" data, demonstrating that high-quality data can yield capable models without extensive scale.
Ranking is for Local LLMs.
No evaluation benchmarks are available for Phi-1.
Overall Rank
-
Coding Rank
-