Parameters
10B
Context Length
32K
Modality
Text
Architecture
Dense
License
TII Falcon-LLM License 2.0
Release Date
17 Dec 2024
Knowledge Cutoff
Nov 2024
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
12
Key-Value Heads
4
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
1,000,042
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
3,072
Number of Layers
40
FFN Intermediate Size (Dense)
23,040
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
131,072
Falcon3-10B is a member of the Falcon3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This variant targets scientific reasoning, mathematics, and code generation. It is available in base and instruction-tuned versions, covering applications from general text generation to conversational AI. It can run on a range of infrastructure, including resource-limited devices such as laptops, owing to its efficiency-oriented design and the availability of quantized versions.
Architecturally, Falcon3-10B is a Transformer-based, causal decoder-only model with 40 decoder blocks. Its attention mechanism uses Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads, which shrinks the key-value cache and speeds up inference. The model uses a wider head dimension of 256 and Rotary Position Embeddings (RoPE) to support long-context understanding. It uses the SwiGLU activation function and RMSNorm for normalization. These choices aim to balance performance with computational efficiency.
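As a rough illustration of the grouping described above, the sketch below maps each of the 12 query heads to one of the 4 shared key-value heads and compares the projection widths. The contiguous grouping (query heads 0 to 2 share key-value head 0, and so on) is a common convention and an assumption here, not something the model documentation is quoted as confirming; the function and constant names are hypothetical.

```python
# Head counts and head dimension as quoted in the text above.
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256


def kv_head_for(query_head: int,
                num_query_heads: int = NUM_QUERY_HEADS,
                num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Return the key-value head that a given query head reads from.

    Assumes contiguous, equally sized groups (an illustrative convention).
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key-value heads")
    if not 0 <= query_head < num_query_heads:
        raise ValueError("query head index out of range")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Which key-value head each query head shares.
HEAD_MAP = {q: kv_head_for(q) for q in range(NUM_QUERY_HEADS)}

# Output width of the query projection versus each of the K and V projections.
QUERY_WIDTH = NUM_QUERY_HEADS * HEAD_DIM
KV_WIDTH = NUM_KV_HEADS * HEAD_DIM

if __name__ == "__main__":
    print(HEAD_MAP)
    print(QUERY_WIDTH, KV_WIDTH)
```

Under these assumptions each key-value head serves three query heads, so the key and value projections are one third the width of the query projection (1,024 versus 3,072).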
Falcon3-10B was built by depth up-scaling the Falcon3-7B-Base model, followed by continued pre-training on 2 trillion tokens of high-quality data. The training corpus for the broader Falcon3 family comprised 14 trillion tokens spanning web content, code, science, technology, engineering, and mathematics (STEM) data, as well as high-quality and multilingual datasets. The model handles a context length of up to 32K tokens, supporting detailed analysis of long inputs and coherent multi-turn interactions. It supports inference in four languages: English, French, Spanish, and Portuguese.
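The memory cost of a long context can be estimated from the key-value cache. The sketch below is back-of-the-envelope arithmetic, not a measurement: it uses the layer and head counts quoted earlier (40 decoder blocks, 4 key-value heads, head dimension 256) and assumes a 16-bit cache and a 32,768-token window. The function name is hypothetical.

```python
def kv_cache_bytes(tokens: int,
                   layers: int = 40,
                   kv_heads: int = 4,
                   head_dim: int = 256,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions.

    bytes_per_value=2 assumes a 16-bit cache (an assumption of this sketch).
    """
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Cost of a single cached position and of an assumed 32,768-token window.
PER_TOKEN_BYTES = kv_cache_bytes(1)
FULL_WINDOW_BYTES = kv_cache_bytes(32_768)

if __name__ == "__main__":
    print(PER_TOKEN_BYTES / 1024, "KiB per token")
    print(FULL_WINDOW_BYTES / 2**30, "GiB at 32,768 tokens")
```

With these assumptions the cache costs 160 KiB per token, or 5 GiB for one sequence at the full window. The same formula with 12 key-value heads (no grouping) would give three times as much.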
The TII Falcon 3 model family comprises openly available, decoder-only language models (1B to 10B parameters) designed for efficiency. Key features include an extended 32K-token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-10B.
Overall Rank
-
Coding Rank
-
Total Score
67 / 100
Falcon3-10B exhibits strong transparency in its architectural specifications and hardware requirements, providing clear guidance for local deployment and integration. The model's identity and tokenizer details are well-documented and verifiable through public repositories. However, significant transparency gaps remain regarding the specific composition of its 14-trillion-token training set and the total environmental impact of its compute-intensive training process.
Architectural Provenance
The Falcon3-10B architecture is explicitly documented as a Transformer-based, causal decoder-only model with 40 decoder blocks. TII provides clear details on its provenance, stating that it was created via depth up-scaling from the Falcon3-7B-Base model followed by continued pre-training. Key architectural choices are disclosed, including Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads, a head dimension of 256, SwiGLU activation, and RMSNorm. While the high-level methodology is clear, the specific 'redundant layers' chosen for duplication during up-scaling are not individually identified in public documentation.
Dataset Composition
TII discloses that the model was trained on 2 trillion tokens for the 10B variant (part of a larger 14 trillion token pool for the family). General categories are named: web content, code, STEM data, and multilingual datasets (English, French, Spanish, Portuguese). However, specific percentage breakdowns (e.g., web: 40%, code: 20%) are absent. While 'curated high-quality' data is mentioned, the exact filtering and cleaning methodologies are described in marketing terms rather than technical specifics, and no sample data or specific source lists are provided.
Tokenizer Integrity
The tokenizer is publicly accessible via Hugging Face and integrated into the 'transformers' library. It features a vocabulary size of 131,072 tokens, which is a significant increase from previous Falcon versions. The tokenization approach is documented as supporting the claimed four languages (EN, FR, ES, PT), and the vocabulary size is consistently reported across official model cards and technical blog posts. The tokenizer files are available for inspection and verification.
Parameter Density
The model is a dense architecture with 10 billion parameters (often cited as 10.3B in technical specs). Since it is not a Mixture-of-Experts (MoE) model, the active parameters equal the total parameters. TII provides a clear architectural breakdown including the number of layers (40), hidden dimension (3,072), and attention head configurations. The impact of quantization is partially documented through the release of official GPTQ and GGUF versions, though a detailed parameter-by-parameter density map is not public.
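The 10.3B figure can be sanity-checked with a simple count of weight matrices. The sketch below is an estimate under stated assumptions, not TII's published breakdown: it takes 40 layers, 12 query heads, 4 key-value heads, a head dimension of 256, an FFN intermediate size of 23,040, and a vocabulary of 131,072, and it assumes that the hidden width equals query heads times head dimension (12 × 256 = 3,072), that the SwiGLU feed-forward block has three weight matrices, that input and output embeddings are untied, and that there are no bias terms. The function name is hypothetical.

```python
def estimate_parameters(hidden: int = 3072,      # assumed: 12 heads x 256 dims
                        layers: int = 40,
                        query_heads: int = 12,
                        kv_heads: int = 4,
                        head_dim: int = 256,
                        ffn: int = 23_040,
                        vocab: int = 131_072,
                        tied_embeddings: bool = False) -> int:
    """Rough parameter count for a dense GQA decoder without bias terms."""
    attention = (
        hidden * query_heads * head_dim        # query projection
        + 2 * hidden * kv_heads * head_dim     # key and value projections
        + query_heads * head_dim * hidden      # output projection
    )
    feed_forward = 3 * hidden * ffn            # gate, up, and down matrices
    norms = 2 * hidden                         # two RMSNorm weights per layer
    per_layer = attention + feed_forward + norms

    # Untied embeddings count the vocabulary matrix twice (input and output).
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


TOTAL_PARAMETERS = estimate_parameters()

if __name__ == "__main__":
    print(f"{TOTAL_PARAMETERS:,} parameters")
```

Under these assumptions the count comes to about 10.3 billion, in line with the figure cited above. Roughly 89% of that sits in the feed-forward matrices, which is why the FFN intermediate size dominates the model's footprint.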
Training Compute
TII discloses the hardware used (1024 H100 GPU chips) for the pre-training phase. However, the total training duration in hours or days is not explicitly stated for the 10B variant's specific up-scaling and continued training phase. Furthermore, no official carbon footprint calculations or specific energy consumption metrics are provided in the model cards or the initial technical announcement. Cost estimates are also missing from official sources.
Benchmark Reproducibility
TII provides results for standard benchmarks (IFEval, BBH, MATH, MMLU-Pro) and specifies that they use the 'lm-evaluation-harness' for internal testing. While they report raw scores and mention few-shot settings (e.g., 3-shot for BBH), the exact prompts and full evaluation code are not bundled in a single reproducible repository. Third-party verification is available via the Open LLM Leaderboard, which provides some level of independent validation, but the lack of a comprehensive technical report with full prompt disclosure limits perfect reproducibility.
Identity Consistency
The model consistently identifies itself as a TII Falcon model in its system prompts and documentation. It correctly identifies its version (Falcon 3) and its origin (Technology Innovation Institute). There are no documented cases of the model claiming to be a competitor's product (like GPT-4 or Llama). It maintains a coherent identity across its base and instruct variants.
License Clarity
The model is released under the 'TII Falcon-LLM License 2.0'. This is a custom license based on Apache 2.0 but includes specific 'Acceptable Use Policy' restrictions and requirements for attribution. While the terms for commercial use are generally permissive, the license is not a standard OSI-approved open-source license, and the 'Acceptable Use' terms add a layer of legal complexity that requires careful review compared to a pure MIT or Apache 2.0 license.
Hardware Footprint
Hardware requirements are well-documented. TII and partners provide VRAM estimates for various quantization levels (FP16, INT8, INT4). For example, FP16 is noted to require ~22GB VRAM, while 4-bit quants are shown to fit within ~6-7GB. Context length scaling is also addressed, noting the 32K context window and its memory implications. The availability of official GGUF and GPTQ versions with associated size data provides high transparency for deployment.
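The VRAM figures above can be related to the size of the weights alone. The sketch below is a lower-bound estimate, not a deployment guide: it assumes 10.3 billion parameters and idealized storage of 2, 1, and 0.5 bytes per weight for FP16, INT8, and INT4, ignoring quantization metadata, activations, and the key-value cache. The names are hypothetical.

```python
# Idealized bytes per stored weight for each precision (an assumption:
# real quantized formats add scales and other metadata on top).
BYTES_PER_WEIGHT = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_WEIGHT[precision] / 1e9


# Assumed parameter count of 10.3 billion.
ESTIMATES = {name: weight_memory_gb(10.3e9, name) for name in BYTES_PER_WEIGHT}

if __name__ == "__main__":
    for name, gigabytes in ESTIMATES.items():
        print(f"{name}: {gigabytes:.2f} GB")
```

This gives roughly 20.6 GB for FP16 and 5.15 GB for INT4. The quoted ~22GB and ~6-7GB are somewhat higher, presumably because they also cover runtime overhead such as activations, the key-value cache, and quantization metadata.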
Versioning Drift
The model follows a clear family versioning (Falcon 1, 2, 3), and the release date is well-defined (Dec 17, 2024). However, there is no public, granular changelog for minor weight updates or iterative 'silent' improvements. While the model is hosted on Hugging Face with commit history, there is no formal semantic versioning for the weights themselves (e.g., v3.0.1) that documents specific behavioral changes or safety tuning adjustments over time.