Parameters: 180B
Context Length: 2,048 tokens
Modality: Text
Architecture: Dense
License: Falcon-180B TII License and Acceptable Use Policy
Release Date: 23 Sept 2023
Knowledge Cutoff: Dec 2022
Attention Structure: Multi-Query Attention
Hidden Dimension Size: 12,288
Number of Layers: 60
Attention Heads: 96
Key-Value Heads: 1
Activation Function: GELU
Normalization: Layer Normalization
Position Embedding: RoPE
VRAM requirements for different quantization methods and context sizes
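Exact figures depend on the runtime and quantization format, but a back-of-the-envelope estimate follows directly from the spec table above: weight memory is roughly parameters × bytes per weight, and the KV cache grows with context length. The sketch below walks through that arithmetic; the per-weight byte counts and the fp16/8-bit/4-bit labels are illustrative assumptions, and the numbers ignore framework overhead and activation memory.

```python
# Back-of-the-envelope VRAM estimate for Falcon-180B, using the spec table above.
# Illustrative approximations only; real deployments add runtime overhead,
# activation buffers, and quantization-format metadata.

N_PARAMS = 180e9              # 180B parameters
N_LAYERS = 60
N_HEADS = 96
N_KV_HEADS = 1                # Multi-Query Attention: a single shared K/V head
HIDDEN = 12288
HEAD_DIM = HIDDEN // N_HEADS  # 128

BYTES_PER_WEIGHT = {"fp16": 2.0, "8-bit": 1.0, "4-bit": 0.5}  # assumed averages

def weight_gib(quant: str) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return N_PARAMS * BYTES_PER_WEIGHT[quant] / 2**30

def kv_cache_gib(context_tokens: int, bytes_per_value: float = 2.0) -> float:
    """Approximate KV cache: 2 (K and V) x layers x kv_heads x head_dim per token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_tokens * per_token / 2**30

if __name__ == "__main__":
    for quant in BYTES_PER_WEIGHT:
        total = weight_gib(quant) + kv_cache_gib(2048)
        print(f"{quant}: ~{total:.0f} GiB for weights plus a 2,048-token KV cache")
```

Because MQA keeps only one key/value head, the cache for the full 2,048-token context works out to roughly 60 MiB, so weight memory dominates the footprint at every quantization level.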
The Falcon-180B model, developed by the Technology Innovation Institute (TII), is a large-scale causal decoder-only language model for advanced natural language processing tasks. It is an evolution of Falcon-40B, scaled up significantly in parameter count. The model is intended as a foundation for applications that require sophisticated language understanding and generation, including text generation, conversational AI, and summarization, and it is designed to be fine-tuned for specialized use cases; a separate chat-optimized variant, fine-tuned on instruction datasets, is also available.
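As a minimal sketch of how the base checkpoint is typically loaded for text generation, assuming the Hugging Face transformers library, access to the gated tiiuae/falcon-180B repository, and enough GPU memory (see the VRAM notes above); the prompt, dtype, and device-mapping choices are illustrative, not prescribed by TII:

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumes access to the gated tiiuae/falcon-180B checkpoint; the chat-tuned
# variant is published separately and can be substituted for the model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; quantized loading is also common
    device_map="auto",           # shard the 180B weights across available GPUs
)

inputs = tokenizer(
    "Falcon-180B is a causal decoder-only model that", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```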
Architecturally, Falcon-180B implements an optimized transformer design, drawing inspiration from the GPT-3 framework while incorporating several key modifications. A notable feature is Multi-Query Attention (MQA), which improves scalability and inference performance by having all attention heads share a single key and value projection. The model also uses Rotary Position Embeddings (RoPE) to encode positional information within sequences and incorporates FlashAttention for efficient attention computation. Its decoder blocks employ a parallel attention/multi-layer perceptron (MLP) structure with two layer norms, which contributes to processing efficiency. Training was conducted on 3.5 trillion tokens, drawn primarily (approximately 85%) from TII's RefinedWeb dataset and supplemented by curated corpora of technical papers, conversations, and code. Pretraining used up to 4,096 A100 GPUs and accumulated around 7,000,000 GPU hours, running on Gigatron, a custom distributed training codebase that combines a 3D parallelism strategy with ZeRO optimization.
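To make the Multi-Query Attention idea concrete, the sketch below shows a single attention layer in which 96 query heads share one key/value projection, using the dimensions from the spec table. It is a simplified, assumed illustration (no RoPE, FlashAttention, or KV caching), not the actual Falcon implementation.

```python
# Simplified Multi-Query Attention: many query heads, one shared K/V head.
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    def __init__(self, hidden: int = 12288, n_heads: int = 96):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = hidden // n_heads  # 128 for Falcon-180B
        self.q_proj = nn.Linear(hidden, hidden, bias=False)              # per-head query projections
        self.kv_proj = nn.Linear(hidden, 2 * self.head_dim, bias=False)  # one shared key/value head
        self.out_proj = nn.Linear(hidden, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)  # (b, heads, t, d)
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1)                         # (b, t, d) each
        # The single K/V head is broadcast across all query heads: this is the "sharing" in MQA.
        k = k.unsqueeze(1)  # (b, 1, t, d)
        v = v.unsqueeze(1)  # (b, 1, t, d)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5                   # (b, heads, t, t)
        causal = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~causal, float("-inf"))                         # causal masking
        out = torch.softmax(scores, dim=-1) @ v                                     # (b, heads, t, d)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))
```

Because the shared K/V head stores only one key and one value vector per token, the inference-time KV cache shrinks by roughly a factor of the number of heads compared to standard multi-head attention.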
Falcon-180B is built for robust performance across a broad range of language tasks. Its design supports work that requires deep understanding and reasoning, such as research assistance, code generation, and knowledge-based querying. Training on a large, diverse corpus lets the model store and retrieve information effectively, making it well suited to question answering and to summarizing complex topics. This versatility allows it to serve as a general-purpose foundation across a wide array of domains.
The TII Falcon model family comprises causal decoder-only language models (Falcon-7B, Falcon-40B, and Falcon-180B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated attention operations. The models are trained primarily on the RefinedWeb dataset.
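The parallel attention/MLP layout with two layer norms mentioned above can be sketched as follows: both branches read from their own layer norm over the same residual input, and their outputs are summed back into the residual stream rather than applied sequentially as in a standard GPT block. This is a simplified, assumed rendering that reuses the MultiQueryAttention sketch from the previous example, not Falcon's actual code.

```python
# Parallel attention/MLP decoder block with two layer norms (simplified sketch).
import torch
import torch.nn as nn

class ParallelDecoderBlock(nn.Module):
    def __init__(self, hidden: int = 12288, mlp_ratio: int = 4):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden)   # layer norm feeding the attention branch
        self.ln_mlp = nn.LayerNorm(hidden)    # layer norm feeding the MLP branch
        self.attn = MultiQueryAttention(hidden)  # from the MQA sketch above
        self.mlp = nn.Sequential(
            nn.Linear(hidden, mlp_ratio * hidden, bias=False),
            nn.GELU(),
            nn.Linear(mlp_ratio * hidden, hidden, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual stream plus the two branches, computed in parallel from the same input.
        return x + self.attn(self.ln_attn(x)) + self.mlp(self.ln_mlp(x))
```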
Ranking applies to Local LLMs. No evaluation benchmarks are available for Falcon-180B.
Overall Rank: -
Coding Rank: -