Falcon3-10B

Parameters

10B

Context Length

32K

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

Nov 2024

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

12

Key-Value Heads

4

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

1,000,042

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

3,072

Number of Layers

40

FFN Intermediate Size (Dense)

23,040

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

131,072

Architecture Diagram

Input Tokens → Token Embedding (Position: RoPE; Hidden: 3.1k; Context: 32K; Vocab: 131.1k) → 40 layers of [RMSNorm (Pre-Attention) → Grouped-Query Attention (12 Q / 4 KV heads, head dim 256) → residual add → RMSNorm (Pre-FFN) → Feed-Forward Network (SwiGLU, intermediate 23k) → residual add] → Final RMSNorm → Output Logits

Falcon3-10B

Falcon3-10B is a member of the Falcon3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This variant targets scientific reasoning, mathematics, and code generation. It is available in base and instruction-tuned versions, covering applications from general text generation to conversational AI. Quantized versions allow the model to run on resource-limited hardware such as laptops.

Architecturally, Falcon3-10B is a Transformer-based, causal decoder-only model with 40 decoder blocks. Its attention mechanism uses Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads; sharing key-value heads across query heads shrinks the KV cache and speeds up inference. The model uses a wide head dimension of 256 and Rotary Position Embeddings (RoPE) to support long contexts. It uses the SwiGLU activation function and RMSNorm for normalization. These choices aim to balance performance with computational efficiency.
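
To make the head grouping concrete, the following is a minimal sketch (not TII's implementation) of how 12 query heads could map onto 4 shared key-value heads, and how that sharing affects the per-token KV cache. The helper names and the 2-byte (FP16) cache element size are illustrative assumptions.

```python
# Illustrative sketch of grouped-query attention (GQA) bookkeeping.
# The figures come from the description above; helper names are hypothetical.

N_LAYERS = 40     # decoder blocks
N_Q_HEADS = 12    # query heads
N_KV_HEADS = 4    # shared key-value heads
HEAD_DIM = 256    # dimension of each head


def kv_head_for(query_head: int) -> int:
    """Return the index of the KV head that a given query head reads from."""
    group_size = N_Q_HEADS // N_KV_HEADS  # query heads per KV head
    return query_head // group_size


def kv_cache_bytes_per_token(n_kv_heads: int, bytes_per_value: int = 2) -> int:
    """Bytes of K and V cached per token across all layers (FP16 assumed)."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * bytes_per_value


mapping = [kv_head_for(q) for q in range(N_Q_HEADS)]
gqa_bytes = kv_cache_bytes_per_token(N_KV_HEADS)
mha_bytes = kv_cache_bytes_per_token(N_Q_HEADS)  # one KV head per query head

print(mapping)
print(gqa_bytes, mha_bytes, mha_bytes // gqa_bytes)
```

Under these assumptions the cache costs 160 KiB per token with GQA versus 480 KiB if every query head had its own key-value head, a threefold reduction.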

Falcon3-10B was built by depth up-scaling the Falcon3-7B-Base model and then continuing pre-training on 2 trillion tokens of high-quality data. The training corpus for the broader Falcon3 family comprised 14 trillion tokens of web content, code, science, technology, engineering, and mathematics (STEM) data, as well as high-quality and multilingual datasets. The model handles a context length of up to 32K tokens, supporting detailed analysis of long inputs and coherent multi-turn interactions. It supports inference in four languages: English, French, Spanish, and Portuguese.
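
Long-context behaviour depends in part on the rotary position embeddings. As a rough sketch, assuming the standard RoPE formulation with the base (theta) of 1,000,042 and the head dimension of 256 listed in the specifications, the per-pair rotation frequencies can be computed as follows. The function names and the exact window size of 32,768 tokens are assumptions, and this is not TII's code.

```python
import math

ROPE_THETA = 1_000_042   # RoPE base from the specifications
HEAD_DIM = 256           # attention head dimension
CONTEXT = 32_768         # assumed exact size of the 32K context window


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Inverse frequency for each of the head_dim / 2 rotated pairs."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x: float, y: float, position: int, inv_freq: float) -> tuple[float, float]:
    """Rotate one (x, y) pair by the angle position * inv_freq."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


inv_freq = rope_inv_freq(HEAD_DIM, ROPE_THETA)
longest_wavelength = 2 * math.pi / inv_freq[-1]  # measured in tokens

print(len(inv_freq), inv_freq[0])
print(longest_wavelength > CONTEXT)
print(rotate_pair(1.0, 0.0, position=5, inv_freq=inv_freq[0]))
```

With a base this large, the slowest-rotating pair completes well under one full rotation across the whole window, while the fastest pair rotates by one radian per token.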

About Falcon 3

The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.


Evaluation Benchmarks

No evaluation benchmarks for Falcon3-10B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

67 / 100

Falcon3-10B Model Integrity Report

Audit Note

Falcon3-10B exhibits strong transparency in its architectural specifications and hardware requirements, providing clear guidance for local deployment and integration. The model's identity and tokenizer details are well-documented and verifiable through public repositories. However, significant transparency gaps remain regarding the specific composition of its 14-trillion-token training set and the total environmental impact of its compute-intensive training process.

Upstream

20.5 / 30

Architectural Provenance

7.5 / 10

The Falcon3-10B architecture is explicitly documented as a Transformer-based causal decoder-only model with 40 decoder blocks. TII provides clear details on its provenance, stating it was created via depth up-scaling from the Falcon3-7B-Base model followed by continued pre-training. Key architectural choices are disclosed, including Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads, a head dimension of 256, SwiGLU activation, and RMSNorm. While the high-level methodology is clear, the specific 'redundant layers' chosen for duplication during up-scaling are not individually identified in public documentation.

Dataset Composition

4.5 / 10

TII discloses that the model was trained on 2 trillion tokens for the 10B variant (part of a larger 14 trillion token pool for the family). General categories are named: web content, code, STEM data, and multilingual datasets (English, French, Spanish, Portuguese). However, specific percentage breakdowns (e.g., web: 40%, code: 20%) are absent. While 'curated high-quality' data is mentioned, the exact filtering and cleaning methodologies are described in marketing terms rather than technical specifics, and no sample data or specific source lists are provided.

Tokenizer Integrity

8.5 / 10

The tokenizer is publicly accessible via Hugging Face and integrated into the 'transformers' library. It features a vocabulary size of 131,072 tokens, which is a significant increase from previous Falcon versions. The tokenization approach is documented as supporting the claimed four languages (EN, FR, ES, PT), and the vocabulary size is consistently reported across official model cards and technical blog posts. The tokenizer files are available for inspection and verification.

Model

26.0 / 40

Parameter Density

7.0 / 10

The model is a dense architecture with 10 billion parameters (often cited as 10.3B in technical specs). Since it is not a Mixture-of-Experts (MoE) model, the active parameters equal the total parameters. TII provides a clear architectural breakdown including the number of layers (40), hidden dimension (3072), and attention head configurations. The impact of quantization is partially documented through the release of official GPTQ and GGUF versions, though a detailed parameter-by-parameter density map is not public.
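
As a sanity check on the parameter figure, the count can be estimated from the architectural dimensions. The sketch below assumes a hidden size of 3,072 (12 query heads × head dimension 256), untied input and output embedding matrices, and no bias terms. These are assumptions rather than details confirmed here, and the helper name is hypothetical.

```python
def estimate_params(hidden, layers, q_heads, kv_heads, head_dim, ffn, vocab, tied=False):
    """Rough parameter count for a dense decoder-only Transformer (no biases)."""
    q_dim = q_heads * head_dim
    kv_dim = kv_heads * head_dim
    # Q and O projections plus the smaller shared K and V projections.
    attention = hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden
    feed_forward = 3 * hidden * ffn   # SwiGLU: gate, up, and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per block
    per_layer = attention + feed_forward + norms
    embeddings = vocab * hidden * (1 if tied else 2)
    return layers * per_layer + embeddings + hidden  # last term: final RMSNorm


total = estimate_params(
    hidden=3072, layers=40, q_heads=12, kv_heads=4,
    head_dim=256, ffn=23040, vocab=131072,
)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate comes to roughly 10.3B, in line with the figure cited in technical specs. Most of the parameters sit in the feed-forward projections rather than in attention.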

Training Compute

4.0 / 10

TII discloses the hardware used (1024 H100 GPU chips) for the pre-training phase. However, the total training duration in hours or days is not explicitly stated for the 10B variant's specific up-scaling and continued training phase. Furthermore, no official carbon footprint calculations or specific energy consumption metrics are provided in the model cards or the initial technical announcement. Cost estimates are also missing from official sources.

Benchmark Reproducibility

6.0 / 10

TII provides results for standard benchmarks (IFEval, BBH, MATH, MMLU-Pro) and specifies that they use the 'lm-evaluation-harness' for internal testing. While they report raw scores and mention few-shot settings (e.g., 3-shot for BBH), the exact prompts and full evaluation code are not bundled in a single reproducible repository. Third-party verification is available via the Open LLM Leaderboard, which provides some level of independent validation, but the lack of a comprehensive technical report with full prompt disclosure limits perfect reproducibility.

Identity Consistency

9.0 / 10

The model consistently identifies itself as a TII Falcon model in its system prompts and documentation. It correctly identifies its version (Falcon 3) and its origin (Technology Innovation Institute). There are no documented cases of the model claiming to be a competitor's product (like GPT-4 or Llama). It maintains a coherent identity across its base and instruct variants.

Downstream

20.0 / 30

License Clarity

7.0 / 10

The model is released under the 'TII Falcon-LLM License 2.0'. This is a custom license based on Apache 2.0 but includes specific 'Acceptable Use Policy' restrictions and requirements for attribution. While the terms for commercial use are generally permissive, the license is not a standard OSI-approved open-source license, and the 'Acceptable Use' terms add a layer of legal complexity that requires careful review compared to a pure MIT or Apache 2.0 license.

Hardware Footprint

8.0 / 10

Hardware requirements are well-documented. TII and partners provide VRAM estimates for various quantization levels (FP16, INT8, INT4). For example, FP16 is noted to require ~22GB VRAM, while 4-bit quants are shown to fit within ~6-7GB. Context length scaling is also addressed, noting the 32K context window and its memory implications. The availability of official GGUF and GPTQ versions with associated size data provides high transparency for deployment.
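
The weight-only portion of these figures follows from simple arithmetic. The sketch below assumes 10.3 billion parameters and counts only the bytes needed to store the weights. The function name is illustrative, and real deployments add overhead for activations, the KV cache, and quantization metadata.

```python
PARAMS = 10.3e9  # approximate parameter count cited for the model


def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

The weights alone come to about 20.6 GB at FP16 and a little over 5 GB at 4 bits, which is consistent with the ~22GB and ~6-7GB totals once runtime overhead is included.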

Versioning Drift

5.0 / 10

The model follows a clear family versioning (Falcon 1, 2, 3), and the release date is well-defined (Dec 17, 2024). However, there is no public, granular changelog for minor weight updates or iterative 'silent' improvements. While the model is hosted on Hugging Face with commit history, there is no formal semantic versioning for the weights themselves (e.g., v3.0.1) that documents specific behavioral changes or safety tuning adjustments over time.
