Parameters
7B
Context Length
32K
Modality
Text
Architecture
Dense
License
TII Falcon-LLM License 2.0
Release Date
17 Dec 2024
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
12
Key-Value Heads
4
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
1,000,042
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
3,072
Number of Layers
28
FFN Intermediate Size (Dense)
23,040
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
131,072
Falcon 3-7B is an instruction-tuned language model developed by the Technology Innovation Institute (TII). It belongs to the Falcon 3 family, which focuses on scientific domains, mathematics, and code generation. It is engineered for efficiency and scalability, enabling deployment on a range of infrastructures, including those with limited computational resources. The model supports multilingual applications, with training data covering English, French, Spanish, and Portuguese, and is designed to handle long-context tasks.
Falcon 3-7B is a transformer-based, causal decoder-only model with 28 decoder blocks. It uses Grouped-Query Attention (GQA) to improve inference speed and memory efficiency, configured with 12 query heads and 4 key-value heads, each with a head dimension of 256. The model uses Rotary Positional Embedding (RoPE) with a high base (theta) of 1,000,042 to support extended contexts of up to 32K tokens. The feed-forward layers use SwiGLU activations, and RMSNorm provides normalization, which contributes to training stability and efficiency. The model is also optimized to use FlashAttention-3.
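The memory benefit of GQA can be illustrated with a rough key-value cache estimate. The sketch below is a back-of-the-envelope calculation, not TII's published figures: it assumes FP16 cache entries (2 bytes each), a single sequence, and that "32K" means 32,768 tokens. The function name `kv_cache_bytes` is made up for this example.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Two tensors (K and V) per layer, each of shape
    # [context_tokens, num_kv_heads, head_dim].
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


# Values taken from the architecture description above.
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 28, 12, 4, 256
CONTEXT = 32 * 1024  # assumption: "32K" means 32,768 tokens

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical baseline: one K/V head per query head (multi-head attention).
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache at full context: {gqa / 2**30:.2f} GiB")
print(f"MHA cache at full context: {mha / 2**30:.2f} GiB")
print(f"Reduction factor: {mha / gqa:.1f}x")
```

Because the cache scales with the number of key-value heads, sharing each K/V head across three query heads cuts cache memory to one third of the multi-head baseline under these assumptions.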
Falcon 3-7B was pretrained on 14 trillion tokens of web, code, scientific, and high-quality multilingual data. Following pretraining, it was fine-tuned on 1.2 million samples covering STEM content, conversational interactions, code, and safety compliance. This training targets scientific and mathematical problem-solving, multilingual content generation, and long-form text processing. Its support for instruction following makes it suitable for educational tools, research assistance, and the generation of technical documentation.
The TII Falcon 3 model family comprises open-weight, decoder-only language models (1B-10B parameters) designed for efficiency. Key features include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants use Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-7B.
Overall Rank
-
Coding Rank
-
Total Score
73 / 100
Falcon 3-7B exhibits strong transparency regarding its technical architecture and hardware requirements, providing developers with the necessary specifications for efficient deployment. While it offers clear identity and a public license, it lacks granular detail in its dataset composition and formal versioning history. The model represents a significant step forward in open-weight documentation for the Falcon family, though a full technical report is still required to reach exemplary transparency levels.
Architectural Provenance
Falcon 3-7B is explicitly documented as a transformer-based causal decoder-only model with 28 decoder blocks. TII provides specific architectural details including the use of Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, a head dimension of 256, and SwiGLU activation with RMSNorm. The use of Rotary Positional Embedding (RoPE) with a specific base (1000042) to support its 32K context window is well-documented in official model cards and technical blogs. While a full peer-reviewed paper for the Falcon 3 iteration is pending (announced for early 2025), the technical specifications are highly detailed and publicly accessible via Hugging Face and TII's official portal.
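To see why a large RoPE base matters for long contexts, the sketch below computes the wavelength of each rotary dimension pair using the standard RoPE frequency formula, `theta ** (-2i / head_dim)`. The comparison base of 10,000 is the commonly used default and is included only as a reference point; the helper names are hypothetical, and treating "32K" as 32,768 tokens is an assumption.

```python
import math


def rope_wavelengths(head_dim, theta):
    """Wavelength, in tokens, of each rotary dimension pair."""
    # Pair i rotates with angular frequency theta ** (-2i / head_dim),
    # so one full rotation takes 2*pi * theta ** (2i / head_dim) tokens.
    return [2 * math.pi * theta ** (2 * i / head_dim)
            for i in range(head_dim // 2)]


def pairs_longer_than(wavelengths, context_tokens):
    """Count pairs that never complete a full rotation within the context."""
    return sum(w > context_tokens for w in wavelengths)


HEAD_DIM = 256
CONTEXT = 32 * 1024  # assumption: "32K" means 32,768 tokens

falcon = rope_wavelengths(HEAD_DIM, theta=1_000_042)
baseline = rope_wavelengths(HEAD_DIM, theta=10_000)  # common default, for comparison

print("pairs with wavelength > context (theta=1,000,042):",
      pairs_longer_than(falcon, CONTEXT))
print("pairs with wavelength > context (theta=10,000):",
      pairs_longer_than(baseline, CONTEXT))
```

A larger base stretches the low-frequency pairs, so more of them stay within a single rotation across the whole window. This is one common explanation for why long-context models raise the base, not a statement of TII's specific reasoning.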
Dataset Composition
TII discloses that the model was pretrained on 14 trillion tokens and post-trained on 1.2 million samples. General categories are provided: web data, code, STEM, and multilingual content (English, French, Spanish, Portuguese). However, specific percentage breakdowns of these sources are not publicly available. While TII has a history of releasing data extracts (e.g., RefinedWeb for Falcon 1), the specific composition and filtering methodology for the 14T token Falcon 3 dataset remain described in general terms rather than granular detail.
Tokenizer Integrity
The tokenizer is fully accessible via the Hugging Face repository and integrated into the `transformers` library. It features a clearly stated vocabulary size of 131,072 tokens, which is a significant increase from previous versions to improve compression across its supported languages. The tokenization approach is verifiable through the public `tokenizer_config.json` and `tokenizer.json` files, which confirm the claimed multilingual support and special token handling (e.g., chat templates).
Parameter Density
The model is a dense architecture with approximately 7.46 billion total parameters. Unlike MoE models, all parameters are active during inference, which is clearly reflected in the documentation and model files. The architectural breakdown (layers, heads, dimensions) is precisely defined, allowing for independent verification of the parameter count based on the provided configuration files.
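The independent verification mentioned above can be sketched as a simple parameter tally. This is an illustrative estimate under several assumptions that are not stated in this page's prose: a hidden size of 3,072 (12 query heads × 256), a Llama-style block layout (bias-free linear layers, a three-matrix gated FFN, two RMSNorm weights per block plus a final norm), and untied input and output embeddings.

```python
def dense_decoder_params(hidden, layers, q_heads, kv_heads, head_dim,
                         ffn, vocab, tied_embeddings=False):
    """Approximate parameter count for a Llama-style dense decoder."""
    # Attention: Q and O projections use all query heads, K and V use KV heads.
    q_proj = hidden * q_heads * head_dim
    k_proj = hidden * kv_heads * head_dim
    v_proj = hidden * kv_heads * head_dim
    o_proj = q_heads * head_dim * hidden
    attention = q_proj + k_proj + v_proj + o_proj

    # Gated FFN (SwiGLU): gate, up, and down projections.
    mlp = 3 * hidden * ffn

    # Two RMSNorm weight vectors per block.
    norms = 2 * hidden

    per_layer = attention + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


total = dense_decoder_params(
    hidden=3072,      # assumption: 12 heads x 256 head dimension
    layers=28,
    q_heads=12,
    kv_heads=4,
    head_dim=256,
    ffn=23040,
    vocab=131072,
)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the tally lands at roughly 7.46 billion, which matches the figure quoted above and suggests the listed layer, head, and FFN sizes are mutually consistent.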
Training Compute
TII has disclosed that the 7B model was trained on 1,024 H100 GPUs (some sources mention 2,048). While the hardware type and scale are provided, the training duration, total energy consumption, and carbon footprint are not detailed in the current release documentation. This provides a moderate level of transparency compared to models that withhold all compute data, but falls short of exemplary reporting.
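Since the training duration is not disclosed, the only option is a rough estimate. The sketch below applies the widely used approximation of 6 FLOPs per parameter per training token. The peak throughput (about 989 TFLOP/s dense BF16 per H100) and the 40% hardware utilization are assumptions chosen purely for illustration, so the resulting durations are not TII's figures.

```python
import math

PARAMS = 7.46e9          # from the parameter count quoted on this page
TOKENS = 14e12           # 14 trillion pretraining tokens


def training_flops(params, tokens):
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens


def training_days(total_flops, num_gpus, peak_flops_per_gpu, utilization):
    """Wall-clock days at a given sustained fraction of peak throughput."""
    sustained = num_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained / 86_400


flops = training_flops(PARAMS, TOKENS)
print(f"Estimated pretraining compute: {flops:.3e} FLOPs")

H100_PEAK = 989e12       # assumption: dense BF16 peak, FLOP/s
UTILIZATION = 0.40       # assumption: hypothetical sustained utilization

for gpus in (1024, 2048):  # both GPU counts reported by sources
    days = training_days(flops, gpus, H100_PEAK, UTILIZATION)
    print(f"{gpus} GPUs: ~{days:.1f} days under these assumptions")
```

The output is highly sensitive to the utilization figure, which is why a disclosed training duration or GPU-hour total would be far more informative than an estimate like this.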
Benchmark Reproducibility
TII provides benchmark results on standard sets like MMLU, MMLU-Pro, ARC, and GSM8K. They specify the use of the `lm-evaluation-harness` and provide raw scores. However, while they mention the evaluation methodology (e.g., 5-shot), the exact evaluation code and specific prompt templates used to achieve the reported scores are not bundled in a single reproducible repository, requiring users to rely on the general harness settings.
Identity Consistency
The model consistently identifies itself as Falcon 3 in its system prompts and documentation. It does not exhibit the identity confusion seen in some fine-tuned models that claim to be GPT-4 or Llama. Versioning is clear within the 'Falcon 3' family, distinguishing between Base, Instruct, and Mamba variants.
License Clarity
The model is released under the 'TII Falcon-LLM License 2.0'. This is a custom license based on Apache 2.0 but includes specific 'Acceptable Use' restrictions and requirements for derivative works. While the terms are publicly available and clearly written, it is not a standard OSI-approved open-source license, which introduces some complexity for commercial users compared to a pure Apache 2.0 or MIT license.
Hardware Footprint
VRAM requirements are well-documented for various precisions. Official and third-party sources (like Ollama and Hugging Face) provide clear guidance: ~15GB for FP16 and ~4.6GB for 4-bit quantization. TII also explicitly supports and documents various quantization formats (GGUF, AWQ, GPTQ) and their impact on deployment, making the hardware footprint highly predictable for end-users.
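The FP16 figure above follows directly from the parameter count, as the short sketch below shows. It counts weight storage only and assumes every weight is stored at exactly the stated bit width; the helper name `weight_gb` is hypothetical.

```python
def weight_gb(num_params, bits_per_weight):
    """Weight storage in decimal gigabytes, ignoring any runtime overhead."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 7.46e9  # parameter count quoted on this page

for label, bits in (("FP16", 16), ("INT8", 8), ("4-bit", 4)):
    print(f"{label}: ~{weight_gb(PARAMS, bits):.2f} GB of weights")
```

The idealized 4-bit result comes out below the ~4.6GB quoted above. A plausible reason is that practical quantization formats store per-group scales and keep some tensors at higher precision, but that explanation is an assumption rather than something documented here. Activations and the key-value cache add further memory on top of the weights.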
Versioning Drift
Falcon 3 uses a family-based naming convention (1B, 3B, 7B, 10B) and distinguishes between Base and Instruct versions. However, there is no formal semantic versioning (e.g., v3.1.0) or a detailed public changelog for minor weight updates or 'silent' refreshes. Users must rely on Hugging Face commit histories to track changes, which lacks the transparency of a formal release management system.