
Phi-1

Parameters: 1.3B
Context Length: 2K (2,048 tokens)
Modality: Text
Architecture: Dense
License: MIT
Release Date: 15 Jun 2023
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Multi-Head Attention
Hidden Dimension Size: 2048
Number of Layers: 24
Attention Heads: 32
Key-Value Heads: 32
Activation Function: GELU
Normalization: -
Position Embedding: RoPE

System Requirements

VRAM requirements vary by quantization method and context size.

Phi-1

Microsoft's Phi-1 is a compact, Transformer-based language model specifically engineered for Python code generation. Its development emphasizes the efficacy of high-quality, curated training data over sheer data volume or model scale, a principle articulated in the foundational "Textbooks Are All You Need" research. The model's training regimen involved a distinct approach, utilizing a combination of meticulously filtered code-language data from public repositories and synthetically generated Python textbooks and exercises from large language models such as GPT-3.5. This data strategy aimed to imbue the model with a "textbook-quality" understanding of programming concepts and practices, fostering robust learning despite its modest size.

The architectural design of Phi-1 is rooted in a Transformer decoder-only structure, featuring 24 layers, a hidden dimension size of 2048, and 32 attention heads. Key innovations incorporated to enhance training efficiency and performance include the adoption of Rotary Position Embedding (RoPE) for handling sequence position information and FlashAttention for accelerated attention computation. This combination of a streamlined architecture with optimized components allows Phi-1 to process input sequences efficiently while maintaining contextual coherence. The model's training focused on next-token prediction, enabling it to generate coherent and syntactically correct Python code.
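As a quick way to check these dimensions against the released checkpoint, the sketch below reads the model configuration with Hugging Face transformers. The repository id microsoft/phi-1 and the config attribute names (num_hidden_layers, hidden_size, num_attention_heads, max_position_embeddings) are assumptions based on the public Hugging Face release rather than details stated on this page.

```python
# Sketch: inspect Phi-1's architecture from its published configuration.
# Assumes the checkpoint is available as "microsoft/phi-1" on Hugging Face and
# that the config exposes the standard attribute names shown below.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/phi-1")
print(config.num_hidden_layers)        # expected: 24 decoder layers
print(config.hidden_size)              # expected: 2048 hidden dimension
print(config.num_attention_heads)      # expected: 32 attention heads
print(config.max_position_embeddings)  # expected: 2,048-token context window
```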

Phi-1 is primarily designed for tasks involving the generation of simple Python functions from docstrings, demonstrating its utility in code generation applications. Its performance characteristics, particularly in Python coding benchmarks like HumanEval and MBPP, indicate that it can achieve results comparable to significantly larger models, underscoring the impact of its high-quality data curation. While specialized for Python, its capabilities provide a foundation for understanding the potential of small language models in targeted domains.
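A minimal usage sketch of the docstring-to-function task described above, assuming the model is loaded from the microsoft/phi-1 checkpoint with Hugging Face transformers: the model receives a function signature plus docstring and completes the body.

```python
# Sketch: docstring-to-function generation with Phi-1 (checkpoint name assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1", torch_dtype="auto")

# Prompt is a signature and docstring; Phi-1 generates the function body.
prompt = (
    "def mean_of_evens(nums):\n"
    '    """Return the mean of the even numbers in nums, or 0.0 if none."""\n'
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```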

About Phi-1

Phi-1 is Microsoft's foundational 1.3-billion-parameter, Transformer-based small language model, specialized for Python code generation. A core innovation is its training on meticulously curated, "textbook-quality" data, demonstrating that high-quality data can yield capable models without extensive scale.


Other Phi-1 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Phi-1.

Rankings

Overall Ranking: -
Coding Ranking: -

GPU Requirements

Required VRAM and recommended GPUs depend on the selected weight quantization and on the context size (from 1K up to the 2K-token maximum).
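As a rough guide to how such figures are typically estimated, the sketch below computes weight memory for common quantization levels plus an fp16 KV cache, using Phi-1's published dimensions (1.3B parameters, 24 layers, hidden size 2048). The bytes-per-parameter figures are common approximations, and the totals ignore framework overhead and activation buffers, so treat them as rough lower bounds rather than values from this page.

```python
# Back-of-envelope VRAM estimate for Phi-1 (1.3B params, 24 layers, hidden 2048).
# Quantization byte sizes are common approximations; real usage also includes
# framework overhead and activations, so these are rough lower bounds.
PARAMS = 1.3e9
LAYERS, HIDDEN = 24, 2048

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(quant: str, context_tokens: int) -> float:
    weights = PARAMS * BYTES_PER_PARAM[quant]
    # KV cache: 2 (key + value) * layers * tokens * hidden dim, stored in fp16.
    kv_cache = 2 * LAYERS * context_tokens * HIDDEN * 2
    return (weights + kv_cache) / 1024**3

for quant in BYTES_PER_PARAM:
    for ctx in (1024, 2048):
        print(f"{quant:>4} @ {ctx} tokens: ~{estimate_vram_gb(quant, ctx):.2f} GB")
```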
