Falcon-7B

Parameters

7B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

5 Jun 2023

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

4544

Number of Layers

32

Attention Heads

71

Key-Value Heads

1

Activation Function

-

Normalization

Layer Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
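
As a rough guide, the memory taken by the weights alone scales with parameter count times bits per weight. The sketch below assumes a nominal 7 billion parameters (the exact count differs slightly) and ignores activations, the KV cache, and framework overhead, so actual VRAM usage will be higher. The function name and the list of precisions are illustrative choices, not values from this page.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

NOMINAL_PARAMS = 7e9  # assumption: nominal "7B", not the exact parameter count

# Illustrative precisions; real quantization formats add some per-block overhead.
for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Halving the bits per weight halves this estimate, which is why quantization is the main lever for fitting the model on smaller GPUs.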

Falcon-7B

Falcon-7B is a 7-billion-parameter causal decoder-only language model developed by the Technology Innovation Institute (TII). It is intended as an efficient foundation model for natural language understanding and generation tasks, and it gives developers and practitioners an open-source option for both research and commercial applications.

Architecturally, Falcon-7B builds on the transformer framework with several modifications aimed at efficiency. A key design choice is Multi-Query Attention (MQA): all attention heads share a single key projection and a single value projection, whereas standard multi-head attention uses separate key and value projections for each head. Sharing them shrinks the key-value cache that must be kept in memory during generation, which reduces memory overhead and speeds up inference. The model also uses FlashAttention, an exact attention algorithm that reduces memory traffic and so accelerates both training and inference. Positional information is encoded with Rotary Positional Embeddings (RoPE). Within each decoder block, the attention and Multi-Layer Perceptron (MLP) components run in parallel rather than in sequence and share a single layer normalization.
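
To make the memory effect of MQA concrete, the following sketch compares key-value cache sizes using the figures from the specification table on this page (32 layers, 71 attention heads, hidden size 4544, 1 key-value head). The multi-head baseline is hypothetical, and 2 bytes per value assumes FP16 or BF16 storage; neither is stated on this page.

```python
# Values taken from the specification table on this page.
HIDDEN_SIZE = 4544
NUM_LAYERS = 32
NUM_HEADS = 71
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64

def kv_cache_bytes(seq_len: int, kv_heads: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 assumes FP16/BF16 storage (an assumption).
    """
    # The factor 2 accounts for storing both K and V in every layer.
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * seq_len * bytes_per_value

mha = kv_cache_bytes(2048, kv_heads=NUM_HEADS)  # hypothetical multi-head baseline
mqa = kv_cache_bytes(2048, kv_heads=1)          # multi-query, one shared K/V head
print(f"MHA: {mha / 2**20:.0f} MiB, MQA: {mqa / 2**20:.0f} MiB, ratio: {mha // mqa}x")
# prints "MHA: 1136 MiB, MQA: 16 MiB, ratio: 71x"
```

Under these assumptions the cache at the full 2,048-token context shrinks by a factor equal to the number of attention heads, which matters most when serving many sequences at once.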

Falcon-7B was trained on 1,500 billion (1.5 trillion) tokens, drawn primarily from the RefinedWeb corpus and supplemented with curated datasets, and it generates coherent, contextually relevant text. Because its architecture is optimized for efficient inference, it suits deployments where response time matters. Common use cases include text generation, chatbots, summarization, and question answering. The model is released under the Apache 2.0 license, which permits commercial use as well as continued research.

About Falcon

The TII Falcon model family comprises causal decoder-only language models in two sizes (7B and 40B). Their architecture, adapted from GPT-3, adds rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for faster attention computation. The models are trained on the RefinedWeb dataset.
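
As an illustration of how rotary positional embeddings work, here is a minimal sketch of the generic formulation, which rotates consecutive pairs of vector components by position-dependent angles. It is not Falcon's actual implementation: the pairing layout, the base of 10000, and the function names are assumptions made for this example.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        # Rotation angle: lower-indexed pairs rotate faster with position.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary example query and key vectors.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 1.5]

# The same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(math.isclose(score_a, score_b, abs_tol=1e-9))
```

The two scores agree because rotating the query and the key by their own positions leaves a dot product that depends only on the difference between those positions, which is the property that makes the scheme a relative position encoding.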


Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Falcon-7B available.

Rankings

Overall Rank

-

Coding Rank

-
