Attention structure: Multi-Head Attention
Hidden dimension size: 2560
Number of layers: 32
Attention heads: 32
Key-value heads: 32
Activation function: -
Normalization: -
Positional embedding: RoPE
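The fields above can be cross-checked against the published checkpoint. Below is a minimal sketch, assuming the Hugging Face `transformers` library is installed and the public `microsoft/phi-2` repository is reachable:

```python
# Read the published Phi-2 config and print the fields that
# correspond to the specification table above.
# Note: older transformers versions may need trust_remote_code=True.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/phi-2")

print(config.hidden_size)          # hidden dimension size
print(config.num_hidden_layers)    # number of layers
print(config.num_attention_heads)  # attention heads
```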
VRAM requirements by quantization method and context size
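A rough estimate can be derived from the parameter count and the architecture fields above. The sketch below uses standard back-of-the-envelope approximations (weight memory at a given bit-width plus an fp16 KV cache); the head dimension of 80 is inferred from the config, and real usage adds framework and activation overhead on top of these figures:

```python
# Back-of-the-envelope VRAM estimate for Phi-2 (2.7B parameters).
# These are approximations, not measured vendor figures.
PARAMS = 2.7e9    # parameter count
LAYERS = 32       # from the table above
KV_HEADS = 32     # from the table above
HEAD_DIM = 80     # hidden_size 2560 / 32 heads (inferred from the config)

def weight_gib(bits_per_param: float) -> float:
    """Memory for the weights alone at a given quantization width."""
    return PARAMS * bits_per_param / 8 / 2**30

def kv_cache_gib(context_len: int, bytes_per_value: int = 2) -> float:
    """Key + value cache for one sequence at fp16 (2 bytes per value)."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * context_len * bytes_per_value / 2**30

for bits, name in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    for ctx in (512, 2048):
        total = weight_gib(bits) + kv_cache_gib(ctx)
        print(f"{name:>5} @ {ctx:>4} tokens: ~{total:.1f} GiB")
```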
Microsoft Phi-2 is a small language model (SLM) with 2.7 billion parameters, representing a continuation of Microsoft Research's efforts in developing highly capable models at a compact scale. The model is designed to facilitate research into language understanding and reasoning while emphasizing efficiency and accessibility. A core objective behind its release is to provide the research community with an unconstrained, small model for investigating crucial safety challenges, including the mitigation of toxicity and the analysis of societal biases within AI systems.
The architectural foundation of Phi-2 is a Transformer-based design, employing a next-word prediction objective. Its training methodology prioritizes data quality, utilizing a substantial corpus of 1.4 trillion tokens derived from both synthetic and meticulously filtered web data. The synthetic component, generated using advanced models like GPT-3.5 and GPT-4, focuses on "textbook-quality" content to impart robust common sense reasoning, general knowledge, and specific domain understanding in areas such as science. Web data underwent stringent filtering to ensure high educational value and content integrity. The training process for Phi-2 spanned 14 days, leveraging a cluster of 96 A100 GPUs, and incorporated techniques such as Flash Attention. Notably, Phi-2 is a base model that has not undergone alignment through reinforcement learning from human feedback (RLHF) or explicit instruction fine-tuning, yet it exhibits favorable behavior regarding toxicity and bias.
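The next-word prediction objective mentioned above is the standard causal language-modeling loss. The PyTorch sketch below is a generic illustration of that objective, not Microsoft's actual training code: the logits at position t are scored against the token at position t+1.

```python
# Illustrative causal LM (next-word prediction) loss.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    # Predict token t+1 from positions up to t: shift logits left, labels right.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```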
Phi-2's performance characteristics position it as a proficient tool for various natural language processing applications, including question answering, conversational AI, and code generation. Its compact parameter count makes it suitable for deployment on consumer-grade GPUs, enabling efficient inference. The model demonstrates strong reasoning and language understanding capabilities, often performing comparably to or surpassing significantly larger models in specific benchmarks. Its design fosters exploration in areas such as mechanistic interpretability and fine-tuning experiments, making it a valuable resource for researchers and developers aiming to innovate with resource-efficient language models.
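As a sketch of such consumer-grade deployment, the following assumes a CUDA GPU with roughly 6 GiB free, the `transformers` and `torch` packages, plus `accelerate` for `device_map="auto"`; it uses the "Instruct:/Output:" prompt format suggested in the model's documentation:

```python
# Minimal local-inference sketch for the public microsoft/phi-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # halves weight memory vs. fp32
    device_map="auto",          # requires the accelerate package
)

prompt = "Instruct: Explain what rotary position embeddings are.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```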
Microsoft's Phi-2 is a 2.7-billion-parameter Transformer-based model developed for efficient language understanding and reasoning. Its technical innovations include training on "textbook-quality" synthetic and filtered web data, alongside scaled knowledge transfer from its predecessor, Phi-1.5, which facilitates emergent capabilities within a compact architecture.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Phi-2.