Parameters: 1.3B
Context Length: 2,048 tokens
Modality: Text
Architecture: Dense
License: MIT
Release Date: 10 Sept 2023
Knowledge Cutoff: -
Attention Structure: Multi-Head Attention
Hidden Dimension Size: 2048
Number of Layers: 24
Attention Heads: 32
Key-Value Heads: 32
Activation Function: GELU
Normalization: RMS Normalization
Position Embedding: RoPE
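Most of these values can be read directly from the model's published configuration. Below is a minimal sketch for checking them programmatically, assuming the Hugging Face `transformers` library and the `microsoft/phi-1_5` checkpoint id.

```python
# Minimal sketch: read the published configuration for Phi-1.5 and print the
# fields corresponding to the specification above. Assumes the `transformers`
# library is installed and the "microsoft/phi-1_5" checkpoint id is used.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/phi-1_5")

print(config.hidden_size)               # hidden dimension, expected 2048
print(config.num_hidden_layers)         # transformer layers, expected 24
print(config.num_attention_heads)       # attention heads, expected 32
print(config.max_position_embeddings)   # context length, expected 2048
print(config.hidden_act)                # activation, a GELU variant
```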
VRAM requirements for different quantization methods and context sizes
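A rough figure can be estimated from the specification above: weight memory is the parameter count times the bytes per weight under a given quantization, and the key-value cache grows linearly with context length (2 × layers × KV heads × head dimension × context × bytes per element). The sketch below is a back-of-the-envelope approximation only; it ignores activation buffers, the CUDA context, and framework overhead, and the quantization labels are generic rather than tied to any particular tooling.

```python
# Back-of-the-envelope VRAM estimate for Phi-1.5 (1.3B parameters).
# Approximation only: counts weights plus KV cache and ignores activation
# memory, CUDA context, and framework overhead.

GIB = 1024 ** 3

N_PARAMS   = 1.3e9   # total parameters
N_LAYERS   = 24
N_KV_HEADS = 32
HEAD_DIM   = 64      # hidden size 2048 / 32 heads

def estimate_vram_gib(bytes_per_weight: float, context_len: int,
                      kv_bytes: float = 2.0) -> float:
    """Weights + KV cache, in GiB. kv_bytes=2.0 assumes an fp16 KV cache."""
    weights = N_PARAMS * bytes_per_weight
    # K and V per layer: context_len * n_kv_heads * head_dim elements each.
    kv_cache = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_len * kv_bytes
    return (weights + kv_cache) / GIB

for label, bpw in [("fp16", 2.0), ("int8 (Q8)", 1.0), ("int4 (Q4)", 0.5)]:
    for ctx in (1024, 2048):
        print(f"{label:10s} ctx={ctx:5d}: ~{estimate_vram_gib(bpw, ctx):.2f} GiB")
```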
Microsoft's Phi-1.5 is a Transformer-based language model with 1.3 billion parameters. It was developed to continue the investigation into the capabilities of smaller language models, with a particular focus on common-sense reasoning and general knowledge in natural language contexts. Released without usage restrictions, it is intended to give the research community an accessible vehicle for studying challenges associated with large language models, such as reducing toxicity and improving controllability.
The architecture of Phi-1.5 matches that of its predecessor, Phi-1: a decoder-only Transformer with 24 layers and 32 attention heads, each with a head dimension of 64. The model uses Rotary Position Embeddings (RoPE) with a rotary dimension of 32 for positional encoding and relies on Flash Attention to improve training speed and memory efficiency. The key departure lies in the training methodology, which predominantly used a high-quality, synthetic "textbook-like" dataset. This dataset, totaling roughly 30 billion tokens, combines the 7 billion tokens of Phi-1's training data with approximately 20 billion newly generated synthetic tokens aimed primarily at imparting common-sense reasoning and broad general knowledge.
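Because the rotary dimension (32) is only half of the 64-dimensional head size, RoPE is applied to just the first portion of each query/key vector while the rest passes through unchanged. The sketch below illustrates this partial rotation in plain NumPy; it is a generic illustration of the technique, not Phi-1.5's actual implementation, and the pairing convention for the rotated dimensions varies between implementations.

```python
# Generic illustration of partial rotary position embeddings (RoPE):
# only the first `rotary_dim` of the 64 per-head dimensions are rotated,
# the remaining dimensions pass through unchanged.
import numpy as np

HEAD_DIM, ROTARY_DIM = 64, 32

def apply_partial_rope(x: np.ndarray, position: int,
                       rotary_dim: int = ROTARY_DIM,
                       base: float = 10000.0) -> np.ndarray:
    """Rotate the first `rotary_dim` dims of a (head_dim,) query/key vector."""
    rot, rest = x[:rotary_dim], x[rotary_dim:]
    half = rotary_dim // 2
    # One rotation frequency per pair of rotated dimensions.
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = rot[:half], rot[half:]
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
    return np.concatenate([rotated, rest])

q = np.random.randn(HEAD_DIM)
print(apply_partial_rope(q, position=5).shape)  # (64,)
```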
Phi-1.5 demonstrates capabilities across a range of natural language processing tasks, including text generation, question answering, and Python code generation. Although it is a base model that has been neither fine-tuned for instruction following nor aligned with reinforcement learning from human feedback, it can produce relevant responses when prompted in QA or chat formats. Its compact size and specialized training regimen allow it to handle fairly complex reasoning tasks, making it a useful tool for research on topics such as in-context learning and mitigating model limitations.
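Because the base model is steered by the prompt format rather than by instruction tuning, a QA-style prompt is a reasonable starting point. A minimal generation sketch, assuming the Hugging Face `transformers` library (with PyTorch) and the `microsoft/phi-1_5` checkpoint:

```python
# Sketch of base-model prompting in a QA format (Phi-1.5 is not
# instruction-tuned, so the prompt format does the steering). Assumes the
# `transformers` and `torch` packages and the "microsoft/phi-1_5" checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Question: Why does ice float on water?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```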
Microsoft's Phi-1.5 is a 1.3-billion-parameter Transformer model and a successor to Phi-1. It was trained on a curated, "textbook-quality" synthetic dataset aimed at common-sense reasoning. The architecture comprises 24 layers and 32 attention heads and incorporates rotary position embeddings.