Falcon2-11B

Parameters

11B

Context Length

8K

Modality

Text

Architecture

Dense

License

TII Falcon License 2.0

Release Date

20 Jul 2024

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

32

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

RoPE

RoPE Theta

500,042

Sliding Window Attention

No

Sliding Window Size

-

Normalization

Layer Normalization

Activation Function

GELU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

60

FFN Intermediate Size (Dense)

16,384

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

65,024

Architecture Diagram

Input Tokens → Token Embedding (hidden: 4,096 · context: 8K · vocab: 65k) → 60 × decoder block [LayerNorm → Grouped-Query Attention (RoPE; 32 Q / 8 KV heads; head dim 128) and Feed-Forward Network (GELU; intermediate 16,384) computed in parallel → residual add] → Final LayerNorm → Output Logits

Falcon2-11B

Falcon 2 11B is an 11 billion parameter large language model developed by the Technology Innovation Institute (TII). This causal decoder-only model is designed to serve as a foundational component for various natural language processing applications. Its development focuses on enhancing accessibility and inference efficiency, thereby encouraging broader adoption and the creation of specialized downstream applications. The model supports multilingual understanding and generation, making it suitable for diverse linguistic contexts.

Architecturally, Falcon 2 11B is a causal decoder-only transformer trained with a next-token prediction objective. Its architecture is broadly adapted from GPT-3, with several modifications: rotary positional embeddings (RoPE) in place of learned absolute position embeddings, FlashAttention-2 for memory-efficient attention computation, and Grouped Query Attention (GQA) with 8 key-value heads, which shrinks the key-value cache relative to full multi-head attention while keeping more capacity than single-head multi-query attention. Each decoder block computes attention and the MLP in parallel from the same input rather than one after the other. Training proceeded in four stages that progressively extended the context window from 2,048 to 8,192 tokens. The training data exceeds 5 trillion tokens, drawn primarily from RefinedWeb, a filtered and deduplicated web corpus, and augmented with curated data including code and conversational content.
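To make the GQA trade-off concrete, the sketch below maps each query head to the key-value head it shares and computes the resulting KV-cache size relative to standard multi-head attention. It assumes the head counts quoted elsewhere on this page (32 query heads, 8 KV heads); the function names are illustrative and this is not TII's implementation.

```python
# Hypothetical sketch of grouped-query attention (GQA) head sharing.
# Assumed head counts: 32 query heads, 8 KV heads (as described on this page).

NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8


def kv_head_for_query_head(q_head: int, n_q: int = NUM_QUERY_HEADS, n_kv: int = NUM_KV_HEADS) -> int:
    """Return the index of the KV head that query head `q_head` reads keys/values from."""
    if n_q % n_kv != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = n_q // n_kv  # number of query heads sharing one K/V pair
    return q_head // group_size


def kv_cache_ratio(n_q: int = NUM_QUERY_HEADS, n_kv: int = NUM_KV_HEADS) -> float:
    """KV-cache size relative to multi-head attention, which keeps one KV head per query head."""
    return n_kv / n_q


if __name__ == "__main__":
    groups = {}
    for q in range(NUM_QUERY_HEADS):
        groups.setdefault(kv_head_for_query_head(q), []).append(q)
    for kv, qs in groups.items():
        print(f"KV head {kv}: query heads {qs}")
    print(f"GQA (8 KV heads) cache vs. MHA: {kv_cache_ratio():.2f}x")
    print(f"MQA (1 KV head) cache vs. MHA: {kv_cache_ratio(n_kv=1):.4f}x")
```

With 8 KV heads, every group of 4 consecutive query heads shares one key/value pair, so the cache is a quarter the size of full multi-head attention; multi-query attention would shrink it further, to 1/32, at some cost in modeling capacity.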

Falcon 2 11B is multilingual: its training data spans English, German, Spanish, French, Italian, Dutch, Polish, Portuguese, Czech, Romanian, and Swedish, which allows the model to be used across these languages. As a base model, it supports tasks such as text generation, translation, and summarization, but its main role is as a versatile foundation to be fine-tuned for specific domains and applications. Its efficiency-oriented design, notably the smaller key-value cache that GQA provides, lowers inference memory requirements and makes deployment cheaper.

About Falcon 2

The Falcon 2 model family by TII encompasses the 11B language model and its Vision Language Model (VLM) counterpart. These open-source models, with 11 billion parameters, are trained on over five trillion tokens, providing multilingual support. The VLM variant integrates vision-to-language capabilities, enabling the processing of visual inputs for textual outputs.



Model Integrity

Total Score

B+

73 / 100

Falcon2-11B Model Integrity Report

Audit Note

Falcon 2 11B demonstrates strong transparency regarding its architecture and training hardware, supported by a detailed technical report. While it provides a clear breakdown of its multi-stage training process, it maintains some opacity concerning the exact composition of its final-stage training data and lacks comprehensive evaluation code for full benchmark reproduction.

Upstream

23.0 / 30

Architectural Provenance

8.0 / 10

The model's architecture is extensively documented in the official technical report and Hugging Face model card. It is a causal decoder-only transformer with specific modifications from GPT-3, including Rotary Positional Embeddings (RoPE), FlashAttention-2, and Grouped Query Attention (GQA) with 8 KV heads. The training methodology is detailed across four distinct stages, specifying context length increases (2048 to 8192) and the transition to high-quality curated data in the final stage.

Dataset Composition

6.5 / 10

TII provides a high-level breakdown of the 5.5 trillion token dataset, primarily citing RefinedWeb (English and European variants) along with code from 'The Stack' and curated conversational data. While the multi-stage data mixture is summarized in tables within the technical report, the exact proportions of the final 'high-quality' stage are less transparent, and the full dataset is not public, though the RefinedWeb component has separate public documentation.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via the Hugging Face repository and is consistent with previous Falcon models. It has a stated vocabulary size of 65,024 tokens. Technical documentation confirms the use of a BPE-based approach, and the tokenizer's performance across the 11 supported languages is verifiable through the provided model files and evaluation results.

Model

31.0 / 40

Parameter Density

9.0 / 10

The model clearly states its 11 billion parameter count. As a dense model, all parameters are active during inference. Detailed architectural specifications are provided, including 60 transformer blocks, a hidden dimension of 4096, and 32 query heads, allowing for precise verification of the parameter density claims.
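As an illustration of that verification, the following back-of-the-envelope count uses the figures quoted above (60 blocks, hidden size 4,096, 32 query heads), the 8 KV heads from the architecture description, and the 16,384 FFN size and 65,024-token vocabulary from the spec table. It assumes bias-free linear layers and one LayerNorm (weight and bias) per block; whether the output head shares the embedding matrix is not stated here, so it is left as a flag.

```python
# Rough parameter count for a Falcon-style decoder; a sketch, not an official count.
# Assumptions: no biases on linear layers, one LayerNorm per block, a final LayerNorm,
# and an optional tied output head.

VOCAB = 65_024
HIDDEN = 4_096
LAYERS = 60
N_Q_HEADS = 32
N_KV_HEADS = 8
FFN = 16_384
HEAD_DIM = HIDDEN // N_Q_HEADS  # 128


def count_params(tied_embeddings: bool = True) -> int:
    attn = HIDDEN * (N_Q_HEADS * HEAD_DIM)          # query projection
    attn += 2 * HIDDEN * (N_KV_HEADS * HEAD_DIM)    # key and value projections (shared per group)
    attn += (N_Q_HEADS * HEAD_DIM) * HIDDEN         # attention output projection
    mlp = 2 * HIDDEN * FFN                          # MLP up and down projections
    norm = 2 * HIDDEN                               # LayerNorm weight + bias
    per_block = attn + mlp + norm

    embed = VOCAB * HIDDEN                          # token embedding matrix
    head = 0 if tied_embeddings else VOCAB * HIDDEN # separate LM head if untied
    final_norm = 2 * HIDDEN
    return LAYERS * per_block + embed + head + final_norm


if __name__ == "__main__":
    for tied in (True, False):
        print(f"tied={tied}: {count_params(tied) / 1e9:.2f}B parameters")
```

Both variants land between roughly 10.8B and 11.1B parameters, consistent with the stated 11B; the MLP projections account for about three quarters of each block.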

Training Compute

7.5 / 10

TII discloses that the model was trained on 1,024 NVIDIA A100 40GB GPUs using the Gigatron custom training codebase. The technical report mentions the use of 3D parallelism (TP=8, PP=1, DP=128) and ZeRO. While total GPU hours are not explicitly summed in a single figure, the hardware and parallelization strategy are detailed enough for independent estimation.
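A minimal sketch of such an independent estimate: it checks that the stated parallelism degrees multiply to the 1,024 GPUs, then applies the common ~6·N·D approximation for dense-transformer training FLOPs with N = 11B parameters and D = 5.5T tokens. The A100 BF16 dense peak of 312 TFLOP/s comes from NVIDIA's spec sheet; the 40% model FLOPs utilization (MFU) is a hypothetical value, and the estimate assumes the whole run used this setup.

```python
# Back-of-the-envelope training-compute estimate (assumptions labeled inline).

TP, PP, DP = 8, 1, 128            # tensor, pipeline, data parallel degrees from the report
PARAMS = 11e9                     # N: parameter count
TOKENS = 5.5e12                   # D: training tokens
A100_BF16_PEAK = 312e12           # FLOP/s, A100 dense BF16 peak
ASSUMED_MFU = 0.40                # hypothetical utilization, not a TII figure


def world_size(tp: int, pp: int, dp: int) -> int:
    """Total GPUs implied by a tensor x pipeline x data parallel layout."""
    return tp * pp * dp


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs: about 6 per parameter per token (forward + backward)."""
    return 6 * n_params * n_tokens


def gpu_hours(total_flops: float, peak: float = A100_BF16_PEAK, mfu: float = ASSUMED_MFU) -> float:
    """GPU-hours needed at the given sustained throughput per GPU."""
    return total_flops / (peak * mfu) / 3600


if __name__ == "__main__":
    gpus = world_size(TP, PP, DP)
    flops = training_flops(PARAMS, TOKENS)
    hours = gpu_hours(flops)
    print(f"GPUs: {gpus}")
    print(f"Training compute: {flops:.2e} FLOPs")
    print(f"GPU-hours at {ASSUMED_MFU:.0%} MFU: {hours:,.0f}")
    print(f"Wall-clock on {gpus} GPUs: {hours / gpus / 24:.1f} days")
```

Under these assumptions the run needs about 3.6 × 10^23 FLOPs, on the order of 800,000 A100 GPU-hours; changing the assumed MFU scales the result inversely.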

Benchmark Reproducibility

6.0 / 10

Evaluation results are provided for standard benchmarks like HellaSwag, MMLU, and ARC, with third-party verification from the Hugging Face Open LLM Leaderboard. However, the specific evaluation code and exact prompts used for internal testing are not fully public, and the technical report lacks a comprehensive reproduction guide for all claimed scores.

Identity Consistency

8.5 / 10

The model consistently identifies as a TII-developed foundation model. It does not exhibit significant identity confusion or claim to be a competitor's model in official documentation. It is transparent about being a raw pretrained model requiring further fine-tuning for specific tasks.

Downstream

18.5 / 30

License Clarity

7.0 / 10

The model is released under the TII Falcon License 2.0. While based on Apache 2.0, it includes an 'Acceptable Use Policy' and specific terms regarding the 'Object' form of the model. The license is publicly available and clearly defines commercial use permissions, though the custom modifications from standard open-source licenses add a layer of legal complexity.

Hardware Footprint

7.5 / 10

VRAM requirements are well documented by both the provider and third-party sources (e.g., AWS documentation). Roughly 24 GB is cited for FP16 inference, consistent with about 22 GB of weights (11B parameters × 2 bytes) plus runtime overhead, and the impact of 4-bit and 8-bit quantization is discussed in deployment guides. Memory scaling for the 8K context window is also addressed in the technical specifications.
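The sketch below shows where a figure like ~24 GB comes from: weight memory at a chosen precision plus the key-value cache for a single 8K-token sequence. It assumes 60 layers and 8 KV heads of dimension 128 (the figures quoted on this page), FP16 cache values, and batch size 1; activations, quantization scale tensors, and framework overhead are ignored, so real usage will be somewhat higher.

```python
# Rough inference-memory estimate: weights + KV cache (a sketch under stated assumptions).

PARAMS = 11e9
LAYERS = 60
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(precision: str, n_params: float = PARAMS) -> float:
    """Bytes needed to hold the weights at the given precision."""
    return n_params * BYTES_PER_PARAM[precision]


def kv_cache_bytes(context_tokens: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Bytes for cached keys and values across all layers and KV heads."""
    per_token = 2 * LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value  # factor 2: keys and values
    return per_token * context_tokens * batch


if __name__ == "__main__":
    GB = 1e9
    kv_8k = kv_cache_bytes(8_192)
    for p in BYTES_PER_PARAM:
        total = weight_bytes(p) + kv_8k
        print(f"{p}: weights {weight_bytes(p) / GB:.1f} GB + KV cache (8K) {kv_8k / GB:.2f} GB = {total / GB:.1f} GB")
```

Each token adds 240 KiB of cache, so a full 8K context adds about 2 GB; with FP16 weights the total comes to roughly 24 GB, in line with the documented requirement. Because the cache grows linearly with context length and batch size, it becomes the dominant term once weights are quantized to 4 bits.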

Versioning Drift

4.0 / 10

While the model is clearly versioned as 'Falcon 2', there is limited evidence of a formal public changelog or a structured system for tracking weight updates or performance drift over time. The transition from Falcon 1 to Falcon 2 is well-documented, but granular versioning for the 11B variant itself is minimal.
