Phi-2: Specifications and GPU VRAM Requirements

Phi-2

Open Source

Open Weights

Parameters

2.7B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

MIT License

Release Date

12 Oct 2023

Training Data Cutoff

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

2048

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Phi-2

Microsoft Phi-2 is a small language model (SLM) with 2.7 billion parameters, representing a continuation of Microsoft Research's efforts in developing highly capable models at a compact scale. The model is designed to facilitate research into language understanding and reasoning while emphasizing efficiency and accessibility. A core objective behind its release is to provide the research community with an unconstrained, small model for investigating crucial safety challenges, including the mitigation of toxicity and the analysis of societal biases within AI systems.

The architectural foundation of Phi-2 is a Transformer-based design, employing a next-word prediction objective. Its training methodology prioritizes data quality, utilizing a substantial corpus of 1.4 trillion tokens derived from both synthetic and meticulously filtered web data. The synthetic component, generated using advanced models like GPT-3.5 and GPT-4, focuses on "textbook-quality" content to impart robust common sense reasoning, general knowledge, and specific domain understanding in areas such as science. Web data underwent stringent filtering to ensure high educational value and content integrity. The training process for Phi-2 spanned 14 days, leveraging a cluster of 96 A100 GPUs, and incorporated techniques such as Flash Attention. Notably, Phi-2 is a base model that has not undergone alignment through reinforcement learning from human feedback (RLHF) or explicit instruction fine-tuning, yet it exhibits favorable behavior regarding toxicity and bias.

Phi-2's performance characteristics position it as a proficient tool for various natural language processing applications, including question answering, conversational AI, and code generation. Its compact parameter count makes it suitable for deployment on consumer-grade GPUs, enabling efficient inference. The model demonstrates strong reasoning and language understanding capabilities, often performing comparably to or surpassing significantly larger models in specific benchmarks. Its design fosters exploration in areas such as mechanistic interpretability and fine-tuning experiments, making it a valuable resource for researchers and developers aiming to innovate with resource-efficient language models.

About Phi-2

Microsoft's Phi-2 is a 2.7 billion parameter Transformer-based model, developed for efficient language understanding and reasoning. Its technical innovations include training on "textbook-quality" synthetic and filtered web data, alongside scaled knowledge transfer from its predecessor, Phi-1.5, facilitating emergent capabilities within a compact architecture.

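Other Phi-2 Models

No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Phi-2.

Ranking

Coding Ranking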

GPU Requirements


Quantization

Select the quantization method for the model weights

Context size: 1024 tokens

VRAM required:
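The VRAM figure depends on the quantization and context size chosen above. As a rough illustration only, the sketch below estimates it as weight memory plus a key-value cache plus a flat overhead margin. The function name, the 10% overhead, the 2-byte cache entries, and the example layer count of 32 are all assumptions for illustration (this page leaves the layer count blank); the parameter count and hidden size are taken from the specifications listed here. Real usage varies with the runtime and quantization format.

```python
def estimate_vram_gib(n_params, bits_per_weight, context_tokens,
                      n_layers, hidden_size,
                      kv_bytes=2, overhead_fraction=0.10):
    """Back-of-the-envelope VRAM estimate in GiB (hypothetical helper)."""
    # Weights: one value per parameter at the chosen quantization width.
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache under multi-head attention: a key and a value vector of
    # hidden_size per token, per layer, stored at kv_bytes per element.
    kv_cache_bytes = 2 * n_layers * context_tokens * hidden_size * kv_bytes
    # Flat margin for activations and runtime buffers (assumed, not measured).
    total_bytes = (weight_bytes + kv_cache_bytes) * (1 + overhead_fraction)
    return total_bytes / 1024**3


if __name__ == "__main__":
    # 2.7B parameters and hidden size 2048 come from this page;
    # n_layers=32 is a placeholder assumption, not a value from this page.
    for bits in (16, 8, 4):
        gib = estimate_vram_gib(2.7e9, bits, context_tokens=1024,
                                n_layers=32, hidden_size=2048)
        print(f"{bits}-bit weights, 1024-token context: ~{gib:.2f} GiB")
```

Weight memory dominates at this model size, so lowering the bit width reduces the estimate far more than shortening the context does.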

Resources

Official documentation; download weights
