
Phi-1.5

Parameters: 1.3B
Context Length: 2K (2,048 tokens)
Modality: Text
Architecture: Dense
License: MIT
Release Date: 10 Sept 2023
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Multi-Head Attention
Hidden Dimension Size: 2048
Number of Layers: 24
Attention Heads: 32
Key-Value Heads: 32
Activation Function: GELU
Normalization: RMS Normalization
Position Embeddings: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Phi-1.5

Microsoft's Phi-1.5 is a Transformer-based language model containing 1.3 billion parameters. It was developed to continue the investigation into the capabilities of smaller language models, specifically focusing on common sense reasoning and general knowledge in natural language contexts. The model's design aims to provide the research community with a non-restricted, accessible model to explore challenges associated with large language models, such as reducing toxicity and enhancing controllability.

The architecture of Phi-1.5 is consistent with its predecessor, Phi-1, employing a decoder-only Transformer configuration. This architecture comprises 24 layers, with 32 attention heads, each having a dimension of 64. The model integrates Rotary Position Embeddings (RoPE) for positional encoding, utilizing a rotary dimension of 32, and leverages Flash Attention to enhance training speed and memory efficiency. A key innovation in Phi-1.5's development lies in its training methodology, which predominantly utilized a high-quality, synthetic "textbook-like" dataset. This dataset, totaling 30 billion tokens, includes 7 billion tokens from Phi-1's training data and approximately 20 billion newly generated synthetic tokens, primarily for imparting common sense reasoning and broad knowledge.
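As a quick way to confirm these figures locally, the sketch below loads the model with the Hugging Face Transformers library and prints its configuration. This is a minimal sketch assuming the public "microsoft/phi-1_5" checkpoint; treat exact config field names as version-dependent.

```python
# Minimal sketch: load Phi-1.5 and inspect its configuration.
# Assumes the Hugging Face "transformers" library and the public
# "microsoft/phi-1_5" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# The printed config should mirror the specs above: 24 layers,
# 32 attention heads, hidden dimension 2048, 2K context window.
print(model.config)
```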

Phi-1.5 demonstrates capabilities in various natural language processing tasks, including text generation, question answering, and Python code generation. Although it is a base model that has not been fine-tuned for instruction following or aligned with reinforcement learning from human feedback, it can produce relevant responses in formats such as QA and chat. Its compact size and specialized training regimen enable it to perform complex reasoning tasks, positioning it as a tool for research in areas like in-context learning and addressing model limitations.
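Because Phi-1.5 is a base model, the output format is steered by the prompt itself rather than by a chat template. The sketch below shows one common QA-style framing; the prompt wording and generation settings are illustrative choices, not an official template.

```python
# QA-style prompting of the base model (illustrative prompt, not an official format).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Question: Why does ice float on water?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```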

About Phi-1.5

Microsoft's Phi-1.5 is a 1.3 billion parameter Transformer model and a successor to Phi-1. It was trained on a curated, "textbook-quality" synthetic dataset to impart common sense reasoning. The architecture comprises 24 layers and 32 attention heads, and incorporates rotary position embeddings.


Other Phi-1.5 Models
  • No related models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Phi-1.5.

Rankings

Overall Ranking: -
Coding Ranking: -

GPU Requirements

Full calculator: select a quantization method for the model weights and a context size (up to the model's 2K-token limit) to estimate the required VRAM and a recommended GPU.
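As a rough illustration of how such an estimate is formed, the sketch below approximates weight memory as parameter count times bytes per parameter for a given quantization, plus an fp16 KV-cache term derived from the layer count, key-value heads, head dimension, and context length listed above. The bytes-per-parameter values and the overhead factor are assumptions for illustration, not measured requirements.

```python
# Rough VRAM estimate for Phi-1.5 (illustrative assumptions, not measurements).

PARAMS = 1.3e9       # 1.3B parameters
N_LAYERS = 24
N_KV_HEADS = 32
HEAD_DIM = 64        # hidden dimension 2048 / 32 attention heads

# Assumed bytes per weight for common quantization schemes.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(quant: str, context_len: int, overhead: float = 1.2) -> float:
    """Weights plus fp16 KV cache, scaled by a fixed overhead factor."""
    weights = PARAMS * BYTES_PER_PARAM[quant]
    # KV cache: 2 tensors (K and V) per layer, each kv_heads * head_dim * context, 2 bytes each (fp16).
    kv_cache = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_len * 2
    return (weights + kv_cache) * overhead / 1024**3

for quant in BYTES_PER_PARAM:
    print(f"{quant}: ~{estimate_vram_gb(quant, context_len=2048):.1f} GB")
```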