Falcon2-11B: Specifications and GPU VRAM Requirements

Falcon2-11B

Open Weights

Parameters

11B

Context Length

8,192 tokens

Modality

Text

Architecture

Dense

License

TII Falcon License 2.0

Release Date

20 Jul 2024

Knowledge Cutoff

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

4096

Number of Layers

Attention Heads

Key-Value Heads

Activation Function

Normalization

Position Embedding

RoPE
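To illustrate what rotary position embeddings do, the following is a minimal pure-Python sketch, not the model's actual implementation. It rotates consecutive pairs of vector components by a position-dependent angle. The base of 10000 and the adjacent-pair layout are assumptions taken from the common RoPE formulation; this page does not state which values or layout Falcon2-11B uses.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate pair k, i.e. (vec[2k], vec[2k+1]), by angle position * base**(-2k/d).

    `base` and the adjacent-pair layout are illustrative assumptions.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):          # i = 2k, so the exponent -i/d equals -2k/d
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Plain dot product, used here as the attention score before scaling."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n. This is how the attention score becomes sensitive to relative position.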

System Requirements

VRAM requirements for different quantization methods and context sizes
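As a rough guide to where these requirements come from, the sketch below estimates weight memory from the parameter count and bits per weight, and key-value cache memory from the context length. The 11B parameter count and the 8 key-value heads come from this page. The layer count (60) and head dimension (128) are not listed here, so they are assumptions to check against the model's published configuration. Real usage will be higher because of activations, framework overhead, and quantization metadata.

```python
GIB = 1024 ** 3

N_PARAMS = 11e9      # from the spec table
N_KV_HEADS = 8       # from the model description
N_LAYERS = 60        # ASSUMPTION: not listed on this page
HEAD_DIM = 128       # ASSUMPTION: not listed on this page

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, ignoring quantization scales and zero-points."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Keys and values (the factor 2) cached per layer and token, batch size 1."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value

def estimate_gib(bits_per_weight: float, context_tokens: int) -> float:
    """Lower-bound estimate: weights plus a 16-bit KV cache, in GiB."""
    total = weight_bytes(N_PARAMS, bits_per_weight) + kv_cache_bytes(
        N_LAYERS, N_KV_HEADS, HEAD_DIM, context_tokens)
    return total / GIB

if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
        for ctx in (1024, 8192):
            print(f"{label:>5} @ {ctx:>5} tokens: {estimate_gib(bits, ctx):6.2f} GiB")
```

Under these assumptions the weights dominate: halving the bits per weight roughly halves the total, while the KV cache grows linearly with context length.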

Falcon2-11B

Falcon 2 11B is an 11-billion-parameter large language model developed by the Technology Innovation Institute (TII). It is a causal decoder-only base model intended as a foundation for fine-tuning and for downstream natural language processing applications. Its design emphasizes accessibility and inference efficiency, and it supports multilingual understanding and generation.

Falcon 2 11B is a causal decoder-only transformer trained with a next-token prediction objective. The architecture is adapted from GPT-3 with several changes: rotary positional embeddings (RoPE) encode token positions, FlashAttention-2 speeds up the attention computation, and Grouped Query Attention (GQA) with 8 key-value heads lets several query heads share each key-value head, which shrinks the key-value cache relative to full multi-head attention. The decoder blocks use a parallel attention/MLP structure.

Training ran in four stages that progressively extended the effective context window to 8,192 tokens. The training corpus exceeded 5 trillion tokens, drawn primarily from RefinedWeb, a filtered and deduplicated web corpus, and augmented with curated data including code and conversational content.
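The head sharing in Grouped Query Attention can be sketched as a simple index mapping. This is an illustrative sketch rather than the model's code: the 8 key-value heads come from the description above, while the count of 32 query heads used in the usage note is an assumption, since this page does not list it.

```python
def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Return the key-value head that a given query head reads from.

    Query heads are split into equal, contiguous groups, one group per KV head.
    """
    if n_query_heads % n_kv_heads:
        raise ValueError("query heads must divide evenly among key-value heads")
    if not 0 <= query_head < n_query_heads:
        raise ValueError("query_head out of range")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

def kv_cache_ratio(n_query_heads: int, n_kv_heads: int) -> float:
    """KV-cache size relative to multi-head attention with one KV head per query head."""
    return n_kv_heads / n_query_heads
```

With an assumed 32 query heads and 8 key-value heads, `kv_head_for(h, 32, 8)` sends query heads 0 to 3 to key-value head 0, heads 4 to 7 to key-value head 1, and so on, and `kv_cache_ratio(32, 8)` is 0.25. Setting `n_kv_heads` to 1 gives multi-query attention, and setting it equal to `n_query_heads` gives standard multi-head attention.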

Falcon 2 11B was trained on data in languages including English, German, Spanish, French, Italian, Dutch, Polish, Portuguese, Czech, Romanian, and Swedish. As a base model, it is suited to tasks such as text generation, translation, and summarization, and it is intended to be fine-tuned for specific domains and applications.

About Falcon 2

The Falcon 2 family from TII comprises the 11B language model and its Vision Language Model (VLM) counterpart. Both are open-weight models with 11 billion parameters, trained on over five trillion tokens, with multilingual support. The VLM variant accepts visual inputs and produces text outputs.

