
Phi-1.5

Parameters

1.3B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

MIT

Release Date

10 Sept 2023

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Multi-Head Attention

Hidden Dimension Size

2048

Number of Layers

24

Attention Heads

32

Key-Value Heads

32

Activation Function

GELU

Normalization

Layer Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Phi-1.5

Microsoft's Phi-1.5 is a Transformer-based language model containing 1.3 billion parameters. It was developed to continue the investigation into the capabilities of smaller language models, specifically focusing on common sense reasoning and general knowledge in natural language contexts. The model's design aims to provide the research community with a non-restricted, accessible model to explore challenges associated with large language models, such as reducing toxicity and enhancing controllability.

The architecture of Phi-1.5 is consistent with its predecessor, Phi-1, employing a decoder-only Transformer configuration. This architecture comprises 24 layers, with 32 attention heads, each having a dimension of 64. The model integrates Rotary Position Embeddings (RoPE) for positional encoding, utilizing a rotary dimension of 32, and leverages Flash Attention to enhance training speed and memory efficiency. A key innovation in Phi-1.5's development lies in its training methodology, which predominantly utilized a high-quality, synthetic "textbook-like" dataset. This dataset, totaling 30 billion tokens, includes 7 billion tokens from Phi-1's training data and approximately 20 billion newly generated synthetic tokens, primarily for imparting common sense reasoning and broad knowledge.
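
These hyperparameters can be confirmed directly from the published checkpoint. The sketch below is a minimal example assuming the Hugging Face model id microsoft/phi-1_5 and the attribute names used by recent transformers releases; older revisions of the checkpoint shipped custom configuration code with different field names.

```python
# Sketch: read Phi-1.5's architecture hyperparameters from its published config.
# Assumes the Hugging Face model id "microsoft/phi-1_5" and recent transformers
# attribute names; both are assumptions, not guaranteed by this page.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/phi-1_5")

print(config.hidden_size)              # expected: 2048 (hidden dimension)
print(config.num_hidden_layers)        # expected: 24 decoder layers
print(config.num_attention_heads)      # expected: 32 heads -> head dim 2048 / 32 = 64
print(config.max_position_embeddings)  # expected: 2,048-token context window
```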

Phi-1.5 demonstrates capabilities across a range of natural language processing tasks, including text generation, question answering, and Python code generation. Although it is a base model that has been neither instruction-tuned nor aligned through reinforcement learning from human feedback, it can produce relevant responses in QA and chat formats when the prompt establishes that format. Its compact size and specialized training regimen enable it to perform non-trivial reasoning, positioning it as a research tool for studying in-context learning and for addressing model limitations.
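
Because it is a base model, Phi-1.5 responds best when the prompt already demonstrates the desired format. A minimal generation sketch, assuming the microsoft/phi-1_5 checkpoint and a recent transformers install (device_map="auto" additionally requires the accelerate package):

```python
# Sketch: basic text generation with Phi-1.5 as a plain base model (no chat template).
# The model id and dtype/device settings are assumptions; adjust to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Base-model prompting: show the model the start of the format you want it to continue.
prompt = "Question: Why does ice float on water?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```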

About Phi-1.5

Microsoft's Phi-1.5 is a 1.3 billion parameter Transformer model and a successor to Phi-1. It was trained on a curated synthetic dataset of "textbook-quality" data aimed at common sense reasoning. The architecture comprises 24 layers and 32 attention heads, and incorporates rotary position embeddings.


Other Phi-1.5 Models
  • No related models available

Evaluation Benchmarks

Rankings are for Local LLMs.

No evaluation benchmarks are available for Phi-1.5.

Rankings

Overall Rank

-

Coding Rank

-

GPU Requirements

Interactive VRAM calculator (see the full calculator on the site): choose a quantization method for the model weights and a context size between 1K and 2K tokens (default 1,024) to view the estimated VRAM required and recommended GPUs.
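
A rough estimate can also be worked out by hand: weight memory is the parameter count times the bytes per parameter for the chosen quantization, and the KV cache grows linearly with context length (2 x layers x KV heads x head dimension x bytes per element x tokens). The following sketch uses the architecture figures listed above and illustrative quantization widths; it ignores activation memory and framework overhead, so the results are approximate lower bounds rather than the site calculator's exact numbers.

```python
# Back-of-envelope VRAM estimate for Phi-1.5 (illustrative, not the site's exact calculator).
# Weights: params * bytes per parameter for the chosen quantization.
# KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes per element * context tokens.

PARAMS = 1.3e9
LAYERS, KV_HEADS, HEAD_DIM = 24, 32, 64

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # assumed quantization widths

def vram_gib(quant: str, context_tokens: int, kv_bytes: float = 2.0) -> float:
    """Estimated VRAM in GiB for a given weight quantization and context size."""
    weights = PARAMS * BYTES_PER_PARAM[quant]
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * kv_bytes * context_tokens
    return (weights + kv_cache) / 1024**3

for quant in ("fp16", "int8", "int4"):
    for ctx in (1024, 2048):
        print(f"{quant:>5} @ {ctx} tokens: ~{vram_gib(quant, ctx):.2f} GiB")
```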