Falcon3-7B

Open Source

Open Weights

Parameters

7.46B

Context Length

32K

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

System Requirements

VRAM requirements for different quantization methods and context sizes

1,024 tokens: 16.52 GB VRAM

Consumer: 1x RTX 4090 (24GB VRAM)
Datacenter: 1x NVIDIA A100 (80GB VRAM)
Apple Silicon: 1x Apple M3 Max (128GB unified memory)

32,000 tokens: 26.11 GB VRAM

Consumer: 2x RTX 4090 (24GB VRAM each)
Datacenter: 1x NVIDIA A100 (80GB VRAM)
Apple Silicon: 1x Apple M3 Max (128GB unified memory)

Evaluation Benchmarks

No evaluation benchmarks are available for Falcon3-7B.

About Falcon3-7B

Falcon 3-7B is a state-of-the-art instruction-tuned language model developed by the Technology Innovation Institute (TII). It is part of the Falcon 3 family, which focuses on improving capabilities in science, mathematics, and code generation. The model is engineered for efficiency and scalability, so it can be deployed on a range of infrastructure, including systems with limited compute. It supports multilingual use, with training data covering English, French, Spanish, and Portuguese, and is designed to handle long-context tasks.

Architecturally, Falcon 3-7B is a transformer-based, causal decoder-only model with 28 decoder blocks. It uses Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads (each key-value head is shared by 3 query heads) and a head dimension of 256, which reduces key-value cache memory and speeds up inference. Rotary Positional Embedding (RoPE) with a large base (theta) of 1,000,042 supports contexts of up to 32K tokens. The feed-forward layers use SwiGLU activations and RMSNorm provides normalization, both of which contribute to training stability and efficiency. The model is also optimized for FlashAttention-3.
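To make the memory benefit of GQA concrete, the sketch below sizes the key-value cache from the figures above (28 layers, 4 key-value heads, head dimension 256). It assumes a 16-bit (BF16/FP16) cache, ignores framework overhead, and compares against a hypothetical full multi-head layout with one key-value head per query head. The helper names are illustrative and not part of any library.

```python
# Back-of-envelope KV-cache sizing under grouped-query attention (GQA).
# Layer/head/dimension values come from the description above; the 2-byte
# cache precision is an assumption, and framework overhead is ignored.

NUM_LAYERS = 28
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2  # assumed BF16/FP16 cache entries


def kv_head_for_query_head(q_head: int) -> int:
    """Return the shared key/value head that a given query head attends with."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 3 query heads per KV head
    return q_head // group_size


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for num_tokens positions."""
    per_token = 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return num_tokens * per_token


if __name__ == "__main__":
    print([kv_head_for_query_head(q) for q in range(NUM_QUERY_HEADS)])
    for tokens in (1_024, 32_000):
        gqa = kv_cache_bytes(tokens)
        mha = kv_cache_bytes(tokens, kv_heads=NUM_QUERY_HEADS)  # hypothetical full MHA
        print(f"{tokens:>6} tokens: GQA {gqa / 1e9:.2f} GB, MHA {mha / 1e9:.2f} GB")
```

Under these assumptions each cached token costs 2 × 28 × 4 × 256 × 2 = 114,688 bytes, so a 32,000-token context needs roughly 3.7 GB of cache, versus about 11 GB if every query head had its own key-value head. Activations and runtime buffers come on top of this.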

Falcon 3-7B was pretrained on 14 trillion tokens of diverse web, code, scientific, and high-quality multilingual data. It was then post-trained on 1.2 million samples covering STEM content, conversational interactions, code, and safety compliance. This training targets scientific and mathematical problem-solving, multilingual content generation, and processing of long-form text. Its instruction-following ability makes it suitable for educational tools, research assistance, and generating technical documentation.

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

12

Key-Value Heads

4

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

1,000,042

Sliding Window Attention

Sliding Window Size

Sliding Window Ratio

Linear Attention

Linear Attention Ratio

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

3,072

Number of Layers

28

FFN Intermediate Size (Dense)

23,040

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

131,072
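The SwiGLU activation and RMS normalization listed above combine in each feed-forward sub-block. The pure-Python sketch below uses toy dimensions and random weights; it assumes a Llama-style pre-norm residual layout and bias-free projections, neither of which this page states. In the real model the gate and up projections expand the hidden state to the 23,040-wide intermediate layer and the down projection maps it back.

```python
import math
import random


def rms_norm(x, weight, eps=1e-6):
    """Divide by the root-mean-square of x, then apply a learned per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish), written to avoid overflow for large negative inputs."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down(silu(W_gate x) * (W_up x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


def ffn_sublayer(x, norm_weight, w_gate, w_up, w_down):
    """Assumed pre-norm residual sub-layer: x + FFN(RMSNorm(x))."""
    y = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, y)]


def random_matrix(rows, cols, rng):
    """Small Gaussian weights for the toy example."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff = 8, 32  # toy sizes, not the model's real dimensions
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    out = ffn_sublayer(x, [1.0] * d_model,
                       random_matrix(d_ff, d_model, rng),
                       random_matrix(d_ff, d_model, rng),
                       random_matrix(d_model, d_ff, rng))
    print(len(out), out[:3])
```

Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, and the SwiGLU gate multiplies two separate projections, which is why this block carries three weight matrices instead of two.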

Model Integrity

Total Score

B+

73 / 100

Upstream

22.0 / 30

Model

30.0 / 40

Downstream

20.5 / 30

Falcon3-7B Model Integrity Report

Audit Note

Falcon 3-7B exhibits strong transparency regarding its technical architecture and hardware requirements, providing developers with the necessary specifications for efficient deployment. While it offers clear identity and a public license, it lacks granular detail in its dataset composition and formal versioning history. The model represents a significant step forward in open-weight documentation for the Falcon family, though a full technical report is still required to reach exemplary transparency levels.

Upstream

22.0 / 30

Architectural Provenance

8.0 / 10

Falcon 3-7B is explicitly documented as a transformer-based causal decoder-only model with 28 decoder blocks. TII provides specific architectural details, including Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, a head dimension of 256, and SwiGLU activation with RMSNorm. The use of Rotary Positional Embedding (RoPE) with a specific base (1,000,042) to support its 32K context window is documented in official model cards and technical blogs. While a full peer-reviewed paper for the Falcon 3 iteration is pending (announced for early 2025), the technical specifications are highly detailed and publicly accessible via Hugging Face and TII's official portal.
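The RoPE base controls how quickly each pair of dimensions rotates with position. The following pure-Python sketch applies RoPE to a single 256-dimensional head vector with theta = 1,000,042. It assumes the common "rotate-half" layout that pairs dimension i with i + d/2 (some implementations interleave adjacent dimensions instead), and it is an illustration of the technique, not TII's code.

```python
import math

ROPE_THETA = 1_000_042  # RoPE base quoted for Falcon3-7B
HEAD_DIM = 256          # attention head dimension


def inverse_frequencies(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """One angular frequency per dimension pair: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, theta=ROPE_THETA):
    """Rotate pairs (i, i + d/2) of vec by position * inv_freq[i] (rotate-half layout)."""
    half = len(vec) // 2
    inv_freq = inverse_frequencies(len(vec), theta)
    out = list(vec)
    for i in range(half):
        angle = position * inv_freq[i]
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def longest_wavelength(theta, head_dim=HEAD_DIM):
    """Positions needed for the slowest-rotating pair to complete one full turn."""
    return 2 * math.pi / inverse_frequencies(head_dim, theta)[-1]


if __name__ == "__main__":
    q = [math.sin(i) for i in range(HEAD_DIM)]
    k = [math.cos(0.5 * i) for i in range(HEAD_DIM)]
    # Query-key scores depend only on the offset between positions (60 in both cases).
    print(dot(apply_rope(q, 100), apply_rope(k, 40)))
    print(dot(apply_rope(q, 1_100), apply_rope(k, 1_040)))
    print(longest_wavelength(10_000), longest_wavelength(ROPE_THETA))
```

Raising the base from the 10,000 used in the original RoPE formulation to about one million stretches the slowest rotation from tens of thousands of positions to several million, so the low-frequency dimensions stay distinguishable across a 32K-token window.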

Dataset Composition

5.0 / 10

TII discloses that the model was pretrained on 14 trillion tokens and post-trained on 1.2 million samples. General categories are provided: web data, code, STEM, and multilingual content (English, French, Spanish, Portuguese). However, specific percentage breakdowns of these sources are not publicly available. While TII has a history of releasing data extracts (e.g., RefinedWeb for Falcon 1), the specific composition and filtering methodology for the 14T token Falcon 3 dataset remain described in general terms rather than granular detail.

Tokenizer Integrity

9.0 / 10

The tokenizer is fully accessible via the Hugging Face repository and integrated into the `transformers` library. It features a clearly stated vocabulary size of 131,072 tokens, which is a significant increase from previous versions to improve compression across its supported languages. The tokenization approach is verifiable through the public `tokenizer_config.json` and `tokenizer.json` files, which confirm the claimed multilingual support and special token handling (e.g., chat templates).

Model

30.0 / 40

Parameter Density

8.5 / 10

The model is a dense architecture with approximately 7.46 billion total parameters. Unlike MoE models, all parameters are active during inference, which is clearly reflected in the documentation and model files. The architectural breakdown (layers, heads, dimensions) is precisely defined, allowing for independent verification of the parameter count based on the provided configuration files.
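The configuration values are enough to recompute the total parameter count. The sketch below assumes a hidden size of 3,072 (12 query heads × 256), untied input and output embeddings, bias-free linear layers, and two RMSNorm gain vectors per layer plus a final one; these are assumptions about the released configuration rather than figures quoted in this section.

```python
# Rough parameter count for a dense GQA decoder with SwiGLU feed-forward blocks.
# Default values are the figures discussed above; hidden size, untied embeddings,
# and bias-free projections are assumptions.

def count_params(vocab=131_072, hidden=3_072, layers=28, q_heads=12,
                 kv_heads=4, head_dim=256, ffn=23_040, tied_embeddings=False):
    attn = hidden * q_heads * head_dim           # q_proj
    attn += 2 * hidden * kv_heads * head_dim     # k_proj and v_proj (shared by GQA groups)
    attn += q_heads * head_dim * hidden          # o_proj
    mlp = 3 * hidden * ffn                       # SwiGLU gate, up, and down projections
    norms = 2 * hidden                           # pre-attention and pre-FFN RMSNorm gains
    per_layer = attn + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + embeddings + hidden  # + final RMSNorm gain


total = count_params()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the count is 7,455,550,464, which rounds to the documented ~7.46 billion. Passing `hidden=4_096` instead yields about 9.9 billion, which makes this a quick sanity check of any hidden-size figure against the published total.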

Training Compute

6.0 / 10

TII has disclosed that the 7B model was trained on 1,024 H100 GPUs (some sources cite 2,048). While the hardware type and scale are provided, the training duration and the energy consumption or carbon footprint are not reported in the current release documentation. This is a moderate level of transparency compared with models that withhold all compute data, but it falls short of exemplary reporting.
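Because training duration is not disclosed, its rough scale can be estimated with the widely used 6 × N × D approximation for dense transformer training FLOPs (N parameters, D training tokens). The per-GPU peak throughput and the 40% utilization below are illustrative assumptions, not figures published by TII, so the output is an order-of-magnitude estimate only.

```python
# Back-of-envelope pretraining compute using the 6 * N * D rule of thumb.
# N and D come from this page; peak throughput and MFU are assumptions.

PARAMS = 7.46e9           # N: total parameters (dense, all active)
TOKENS = 14e12            # D: pretraining tokens
H100_BF16_PEAK = 989e12   # assumed dense BF16 peak FLOP/s per H100 (SXM)
MFU = 0.40                # assumed model FLOPs utilization


def training_flops(params, tokens):
    """Approximate forward+backward FLOPs for one pass over the training data."""
    return 6 * params * tokens


def wall_clock_days(flops, num_gpus, peak=H100_BF16_PEAK, mfu=MFU):
    """Days of training at a sustained fraction (mfu) of peak throughput."""
    seconds = flops / (num_gpus * peak * mfu)
    return seconds / 86_400


flops = training_flops(PARAMS, TOKENS)
print(f"~{flops:.2e} FLOPs")
for gpus in (1_024, 2_048):
    print(f"{gpus} GPUs: ~{wall_clock_days(flops, gpus):.0f} days at {MFU:.0%} MFU")
```

With these inputs the estimate is about 6.3 × 10^23 FLOPs, or roughly 18 days on 1,024 GPUs and half that on 2,048. Actual duration depends on achieved utilization, restarts, and any additional training stages, which is exactly why disclosed figures would be more useful than estimates.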

Benchmark Reproducibility

6.5 / 10

TII provides benchmark results on standard sets like MMLU, MMLU-Pro, ARC, and GSM8K. They specify the use of the `lm-evaluation-harness` and provide raw scores. However, while they mention the evaluation methodology (e.g., 5-shot), the exact evaluation code and specific prompt templates used to achieve the reported scores are not bundled in a single reproducible repository, requiring users to rely on the general harness settings.

Identity Consistency

9.0 / 10

The model consistently identifies itself as Falcon 3 in its system prompts and documentation. It does not exhibit the identity confusion seen in some fine-tuned models that claim to be GPT-4 or Llama. Versioning is clear within the 'Falcon 3' family, distinguishing between Base, Instruct, and Mamba variants.

Downstream

20.5 / 30

License Clarity

7.5 / 10

The model is released under the 'TII Falcon-LLM License 2.0'. This is a custom license based on Apache 2.0 but includes specific 'Acceptable Use' restrictions and requirements for derivative works. While the terms are publicly available and clearly written, it is not a standard OSI-approved open-source license, which introduces some complexity for commercial users compared to a pure Apache 2.0 or MIT license.

Hardware Footprint

8.0 / 10

VRAM requirements are well-documented for various precisions. Official and third-party sources (like Ollama and Hugging Face) provide clear guidance: ~15GB for FP16 and ~4.6GB for 4-bit quantization. TII also explicitly supports and documents various quantization formats (GGUF, AWQ, GPTQ) and their impact on deployment, making the hardware footprint highly predictable for end-users.
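The quoted footprints follow mostly from parameter count times bits per weight. The sketch below computes weight-only memory for ~7.46 billion parameters; it deliberately ignores the KV cache, activations, and runtime buffers, and the bit widths are nominal rather than tied to a specific GGUF, AWQ, or GPTQ variant.

```python
# Weight-only memory estimate for a given precision (GB = 1e9 bytes).
# The parameter count comes from this page; bit widths are nominal assumptions.

PARAMS = 7.46e9  # approximate total parameter count


def weight_gb(params, bits_per_weight):
    """Memory for the weights alone, in GB."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in (("FP16/BF16", 16), ("8-bit", 8), ("4-bit (ideal)", 4)):
    print(f"{label:>14}: {weight_gb(PARAMS, bits):.1f} GB")

# Effective bits per weight implied by a ~4.6 GB 4-bit file.
print(f"~4.6 GB implies ~{4.6e9 * 8 / PARAMS:.1f} bits/weight")
```

16-bit weights come to about 14.9 GB, in line with the ~15GB figure. An ideal 4-bit encoding would need only about 3.7 GB, so a ~4.6GB file corresponds to roughly 4.9 bits per weight; quantization formats typically spend the extra bits on per-block scale factors and on tensors kept at higher precision. The larger System Requirements figures additionally account for context-dependent memory such as the KV cache.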

Versioning Drift

5.0 / 10

Falcon 3 uses a family-based naming convention (1B, 3B, 7B, 10B) and distinguishes between Base and Instruct versions. However, there is no formal semantic versioning (e.g., v3.1.0) or a detailed public changelog for minor weight updates or 'silent' refreshes. Users must rely on Hugging Face commit histories to track changes, which lacks the transparency of a formal release management system.

Resources

Official Documentation
Release Notes
Download Weights
Source Code

About Falcon 3

The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
