Attention structure: Multi-Head Attention
Hidden dimension size: 2048
Layers: 24
Attention heads: 32
Key-value heads: 32
Activation function: GELU
Normalization: -
Position embedding: RoPE
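For illustration, the specification above can be mapped onto a model configuration. The sketch below is a minimal, hypothetical example assuming the Hugging Face transformers PhiConfig and PhiForCausalLM classes used for the phi model family; it sets only the values listed in the table and leaves every other field at the library default, so it is not a faithful reproduction of the released checkpoint.

```python
# Minimal sketch: map the specification table onto transformers' PhiConfig.
# Only the fields listed above are set; all other values (vocabulary size,
# intermediate size, etc.) keep the library defaults -- an assumption, not
# the released checkpoint's exact settings.
from transformers import PhiConfig, PhiForCausalLM

config = PhiConfig(
    hidden_size=2048,         # hidden dimension size
    num_hidden_layers=24,     # layers
    num_attention_heads=32,   # attention heads
    num_key_value_heads=32,   # key-value heads (standard multi-head attention)
    hidden_act="gelu_new",    # GELU activation
)

model = PhiForCausalLM(config)  # randomly initialized, for illustration only
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
```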
Microsoft's Phi-1 is a compact, Transformer-based language model engineered specifically for Python code generation. Its development emphasizes the efficacy of high-quality, curated training data over sheer data volume or model scale, a principle articulated in the foundational "Textbooks Are All You Need" research. The model was trained on a combination of meticulously filtered code-language data from public repositories and synthetic Python textbooks and exercises generated by large language models such as GPT-3.5. This data strategy aimed to give the model a "textbook-quality" grasp of programming concepts and practices, fostering robust learning despite its modest size.
The architectural design of Phi-1 is rooted in a Transformer decoder-only structure, featuring 24 layers, a hidden dimension size of 2048, and 32 attention heads. Key innovations incorporated to enhance training efficiency and performance include the adoption of Rotary Position Embedding (RoPE) for handling sequence position information and FlashAttention for accelerated attention computation. This combination of a streamlined architecture with optimized components allows Phi-1 to process input sequences efficiently while maintaining contextual coherence. The model's training focused on next-token prediction, enabling it to generate coherent and syntactically correct Python code.
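To make the position-encoding step concrete, the sketch below shows the general rotary position embedding (RoPE) technique in PyTorch: pairs of query/key channels are rotated by angles that grow with token position, so relative offsets are encoded directly in the attention dot products. The tensor layout, the pairing scheme, and the choice to rotate the full head dimension are simplifying assumptions for illustration, not Phi-1's exact implementation.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x by position-dependent angles (RoPE sketch).

    x: query or key tensor of shape (batch, seq_len, n_heads, head_dim),
       with an even head_dim.
    """
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-pair frequencies: theta_i = base^(-2i / head_dim), i = 0..half-1
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    # Rotation angle for every (position, frequency) pair -> (seq_len, half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1_i, x2_i) channel pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Example: queries shaped like one layer of the table above (2048 / 32 = 64)
q = torch.randn(1, 16, 32, 64)  # (batch, seq_len, heads, head_dim)
print(apply_rope(q).shape)      # torch.Size([1, 16, 32, 64])
```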
Phi-1 is primarily designed for tasks involving the generation of simple Python functions from docstrings, demonstrating its utility in code generation applications. Its performance characteristics, particularly in Python coding benchmarks like HumanEval and MBPP, indicate that it can achieve results comparable to significantly larger models, underscoring the impact of its high-quality data curation. While specialized for Python, its capabilities provide a foundation for understanding the potential of small language models in targeted domains.
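A brief usage sketch follows: it prompts the model with a Python function signature and docstring and lets it complete the body, matching the workflow described above. It assumes the checkpoint published on the Hugging Face Hub as microsoft/phi-1 and standard transformers generation; the prompt text and generation settings are illustrative choices, not recommendations from the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed Hugging Face Hub checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Prompt with a signature plus docstring; the model completes the body.
prompt = '''def sum_of_squares(numbers):
    """Return the sum of the squares of a list of numbers."""
'''
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```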
Phi-1 is Microsoft's foundational 1.3-billion-parameter, Transformer-based small language model, specialized in Python code generation. Its core innovation is training on meticulously curated, "textbook-quality" data, demonstrating that high-quality data can yield a capable model without extensive scale.