Parameters
11B
Context Length
8K
Modality
Text
Architecture
Dense
License
TII Falcon License 2.0
Release Date
20 Jul 2024
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
32
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
RoPE
RoPE Theta
500,042
Sliding Window Attention
No
Sliding Window Size
-
Normalization
Layer Normalization
Activation Function
GELU
Dimensions
Hidden Dimension Size
4,096
Number of Layers
60
FFN Intermediate Size (Dense)
16,384
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
65,024
Falcon 2 11B is an 11-billion-parameter large language model developed by the Technology Innovation Institute (TII). It is a causal decoder-only model intended as a foundation for natural language processing applications. Its development emphasizes accessibility and inference efficiency, with the aim of encouraging broader adoption and the creation of specialized downstream applications. The model supports multilingual understanding and generation.
Falcon 2 11B is a transformer-based causal decoder-only model trained with a next-token prediction objective. The architecture is broadly adapted from GPT-3, with several modifications: rotary positional embeddings (RoPE) to encode token positions, FlashAttention-2 for faster and more memory-efficient attention computation, and Grouped Query Attention (GQA) with 8 key-value heads, which aims to balance efficiency and quality by sharing each key-value head across a group of query heads. The decoder blocks use a parallel attention/MLP structure. Training proceeded in four stages that progressively extended the effective context window to 8192 tokens. The training data exceeds 5 trillion tokens, drawn primarily from RefinedWeb, a filtered and deduplicated web corpus, and augmented with curated data including code and conversational content.
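The head sharing behind GQA can be illustrated with a minimal pure-Python sketch. This is not TII's implementation: the function name `gqa_attention`, the nested-list tensor layout, and the toy sizes (4 query heads, 2 key-value heads) are assumptions chosen for readability. Query head `h` reads key-value head `h // group_size`, so with 8 key-value heads and, for example, 32 query heads, four query heads would share one set of keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attention(q, k, v):
    """Causal grouped-query attention on nested lists (illustrative only).

    q:    [n_q_heads][seq_len][head_dim]
    k, v: [n_kv_heads][seq_len][head_dim]
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = n_q_heads // n_kv_heads

    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # every head in a group reads the same K/V head
        head_dim = len(q[h][0])
        head_out = []
        for t, q_t in enumerate(q[h]):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q_t, k[kv][s])) / math.sqrt(head_dim)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append(
                [sum(weights[s] * v[kv][s][j] for s in range(t + 1)) for j in range(head_dim)]
            )
        out.append(head_out)
    return out


# Toy input: 4 query heads share 2 key-value heads (group size 2).
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
out = gqa_attention(q, k, v)
```

Because the queries in this toy input are identical across heads, heads 0 and 1 produce the same output (they share key-value head 0), as do heads 2 and 3. Only the keys and values need to be cached per key-value head, which is where the memory saving comes from.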
Falcon 2 11B is multilingual: it was trained on data in 11 languages, namely English, German, Spanish, French, Italian, Dutch, Polish, Portuguese, Czech, Romanian, and Swedish. It serves as a base for tasks such as text generation, translation, and summarization, and is intended to be fine-tuned for specific domains and applications. Its efficiency-oriented attention design is meant to reduce inference cost in deployment.
The Falcon 2 model family by TII comprises the 11B language model and its Vision Language Model (VLM) counterpart. Both are openly available 11-billion-parameter models trained on over five trillion tokens, and both offer multilingual support. The VLM variant adds vision-to-language capabilities, accepting visual inputs and producing text outputs.
No evaluation benchmarks are available for Falcon2-11B.
Overall Rank
-
Coding Rank
-
Total Score
73 / 100
Falcon 2 11B demonstrates strong transparency regarding its architecture and training hardware, supported by a detailed technical report. While it provides a clear breakdown of its multi-stage training process, it maintains some opacity concerning the exact composition of its final-stage training data and lacks comprehensive evaluation code for full benchmark reproduction.
Architectural Provenance
The model's architecture is extensively documented in the official technical report and the Hugging Face model card. It is a causal decoder-only transformer that departs from GPT-3 in specific ways, including Rotary Positional Embeddings (RoPE), FlashAttention-2, and Grouped Query Attention (GQA) with 8 KV heads. The training methodology is detailed across four distinct stages, specifying the context length increase (from 2048 to 8192 tokens) and the transition to high-quality curated data in the final stage.
Dataset Composition
TII provides a high-level breakdown of the 5.5 trillion token dataset, primarily citing RefinedWeb (English and European variants) along with code from 'The Stack' and curated conversational data. While the multi-stage data mixture is summarized in tables within the technical report, the exact proportions of the final 'high-quality' stage are less transparent, and the full dataset is not public, though the RefinedWeb component has separate public documentation.
Tokenizer Integrity
The tokenizer is publicly accessible via the Hugging Face repository and is consistent with previous Falcon models. It has a stated vocabulary size of 65,024 tokens. Technical documentation confirms the use of a BPE-based approach, and the tokenizer's performance across the 11 supported languages is verifiable through the provided model files and evaluation results.
Parameter Density
The model clearly states its 11 billion parameter count. As a dense model, all parameters are active during inference. Detailed architectural specifications are provided, including 60 transformer blocks, a hidden dimension of 4096, and 32 query heads, allowing for precise verification of the parameter density claims.
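The parameter claim can be checked with a rough count, sketched below. The layer count, hidden size, and query-head count come from the paragraph above; the 8 key-value heads, head dimension of 128, FFN size of 16,384, and vocabulary of 65,024 come from elsewhere on this page. The sketch assumes a non-gated two-matrix MLP, untied input and output embeddings, and ignores biases and normalization weights; these simplifications are assumptions, not statements from the technical report.

```python
# Rough parameter count from the published dimensions (illustrative sketch).
HIDDEN = 4096        # hidden dimension
N_LAYERS = 60        # transformer blocks
N_Q_HEADS = 32       # query heads
N_KV_HEADS = 8       # key-value heads (GQA)
HEAD_DIM = 128       # per-head dimension
FFN = 16_384         # FFN intermediate size
VOCAB = 65_024       # vocabulary size

# Attention projections per layer.
q_proj = HIDDEN * N_Q_HEADS * HEAD_DIM
kv_proj = 2 * HIDDEN * N_KV_HEADS * HEAD_DIM   # keys and values
o_proj = N_Q_HEADS * HEAD_DIM * HIDDEN

# Assumption: plain up/down MLP (no gating), consistent with a GELU activation.
mlp = 2 * HIDDEN * FFN

per_layer = q_proj + kv_proj + o_proj + mlp

# Assumption: separate (untied) input embedding and output projection.
embeddings = 2 * VOCAB * HIDDEN

total_params = N_LAYERS * per_layer + embeddings
print(f"per layer: {per_layer:,}")
print(f"total:     {total_params:,}")
```

Under these assumptions the total comes to 11,102,322,688, or about 11.1 billion, which is in line with the stated 11B figure.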
Training Compute
TII discloses that the model was trained on 1,024 NVIDIA A100 40GB GPUs using the Gigatron custom training codebase. The technical report mentions the use of 3D parallelism (TP=8, PP=1, DP=128) and ZeRO. While total GPU hours are not explicitly summed in a single figure, the hardware and parallelization strategy are detailed enough for independent estimation.
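As an illustration of the kind of independent estimate this disclosure allows, the sketch below checks that the parallelism degrees multiply out to the stated GPU count and applies the common 6 x parameters x tokens approximation for training FLOPs. The 5.5 trillion token figure is taken from the dataset section of this page; the 312 TFLOPS value is NVIDIA's published dense BF16 peak for the A100; the 40% utilization is a purely hypothetical placeholder, not a number reported by TII.

```python
# Back-of-the-envelope training compute estimate (illustrative sketch).
TP, PP, DP = 8, 1, 128           # tensor, pipeline, data parallel degrees
world_size = TP * PP * DP        # GPUs implied by the 3D layout

N_PARAMS = 11e9                  # nominal parameter count
N_TOKENS = 5.5e12                # training tokens cited on this page

# Standard approximation: about 6 FLOPs per parameter per training token.
train_flops = 6 * N_PARAMS * N_TOKENS

A100_PEAK_FLOPS = 312e12         # dense BF16 peak per GPU
ASSUMED_MFU = 0.40               # hypothetical utilization, NOT from the report

gpu_hours = train_flops / (A100_PEAK_FLOPS * ASSUMED_MFU) / 3600
wall_clock_days = gpu_hours / world_size / 24

print(f"world size:      {world_size}")
print(f"training FLOPs:  {train_flops:.3e}")
print(f"GPU hours:       {gpu_hours:,.0f} (at assumed MFU {ASSUMED_MFU:.0%})")
print(f"wall-clock days: {wall_clock_days:.1f}")
```

The GPU-hour and wall-clock outputs scale inversely with the assumed utilization, so they should be read as an order-of-magnitude estimate only.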
Benchmark Reproducibility
Evaluation results are provided for standard benchmarks like HellaSwag, MMLU, and ARC, with third-party verification from the Hugging Face Open LLM Leaderboard. However, the specific evaluation code and exact prompts used for internal testing are not fully public, and the technical report lacks a comprehensive reproduction guide for all claimed scores.
Identity Consistency
The model consistently identifies as a TII-developed foundation model. It does not exhibit significant identity confusion or claim to be a competitor's model in official documentation. It is transparent about being a raw pretrained model requiring further fine-tuning for specific tasks.
License Clarity
The model is released under the TII Falcon License 2.0. While based on Apache 2.0, it includes an 'Acceptable Use Policy' and specific terms regarding the 'Object' form of the model. The license is publicly available and clearly defines commercial use permissions, though the custom modifications from standard open-source licenses add a layer of legal complexity.
Hardware Footprint
VRAM requirements are well documented by both the provider and third-party sources (e.g., AWS documentation). FP16 inference requires roughly 24 GB, and deployment guides discuss the impact of 4-bit and 8-bit quantization. Memory scaling across the 8K context window is also addressed in the technical specifications.
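A minimal sketch of where such figures come from is given below. It estimates weight memory at different precisions and the key-value cache at the full context length, using the 60 layers and 8 key-value heads cited in the architecture sections and a head dimension of 128. The helper names are hypothetical, and the estimate ignores activations, framework overhead, and the scale/zero-point metadata that real quantization formats add.

```python
# Rough inference memory estimate (illustrative sketch, decimal gigabytes).
N_PARAMS = 11e9                                  # nominal parameter count
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(n_params, precision):
    """Memory for the weights alone at the given precision."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache for one sequence; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


weights = {p: weight_gb(N_PARAMS, p) for p in BYTES_PER_PARAM}
kv_8k = kv_cache_gb(n_layers=60, n_kv_heads=8, head_dim=128, context_tokens=8192)

for precision, gb in weights.items():
    print(f"{precision}: weights {gb:.1f} GB, plus 8K KV cache {gb + kv_8k:.1f} GB")
```

Under these assumptions, FP16 weights take about 22 GB and an FP16 KV cache at 8192 tokens adds about 2 GB for a single sequence, which is roughly consistent with the ~24 GB figure quoted above.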
Versioning Drift
While the model is clearly versioned as 'Falcon 2', there is limited evidence of a formal public changelog or a structured system for tracking weight updates or performance drift over time. The transition from Falcon 1 to Falcon 2 is well-documented, but granular versioning for the 11B variant itself is minimal.