Parameters
10B
Context Length
32K (32,768 tokens)
Modality
Text
Architecture
Dense
License
TII Falcon-LLM License 2.0
Release Date
17 Dec 2024
Knowledge Cutoff
Nov 2024
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
3072
Number of Layers
40
Attention Heads
12
Key-Value Heads
4
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
Falcon3-10B is a member of the Falcon3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This variant targets scientific reasoning, mathematics, and code generation. It is available in base and instruction-tuned versions, covering applications from general text generation to conversational AI. Its compact size and quantized releases allow it to run on a range of infrastructure, including resource-limited devices such as laptops.
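To make the point about quantized versions concrete, the following is a rough sketch of how much memory the weights alone would need at different precisions. It assumes a nominal parameter count of exactly 10 billion (taken from the "10B" figure above) and a few illustrative bit-widths; the names `PARAMS`, `BITS_PER_WEIGHT`, and `weight_gib` are hypothetical, and the estimate ignores the key-value cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate; ignores KV cache, activations, and overhead.
PARAMS = 10e9  # assumption: nominal "10B" parameter count from the spec list

# Assumed bits per weight for a few common formats (illustrative values only).
BITS_PER_WEIGHT = {"fp16/bf16": 16, "int8": 8, "int4": 4}


def weight_gib(params: float, bits: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return params * bits / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>10}: {weight_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 18.6 GiB at 16-bit precision and about 4.7 GiB at 4-bit, which is why the quantized versions are the ones that fit on laptop-class hardware.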
Architecturally, Falcon3-10B is a Transformer-based causal decoder-only model with 40 decoder blocks. Its attention layers use Grouped-Query Attention (GQA) with 12 query heads and 4 key-value heads, so every 3 query heads share one key-value head; this shrinks the key-value cache and speeds up inference. The model uses a wide head dimension of 256 and Rotary Position Embeddings (RoPE) to support long-context understanding. It uses the SwiGLU activation function and RMSNorm for normalization. These choices aim to balance performance with computational efficiency.
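The effect of GQA on memory can be illustrated with a little arithmetic. The sketch below uses the figures from the paragraph above (40 decoder blocks, 12 query heads, 4 key-value heads, head dimension 256) and assumes 2 bytes per cached value (16-bit) and contiguous grouping of query heads; the function names are hypothetical and not part of any library.

```python
# Figures taken from the architecture paragraph above.
N_LAYERS = 40
N_QUERY_HEADS = 12
N_KV_HEADS = 4
HEAD_DIM = 256


def kv_head_for(query_head: int) -> int:
    """Map a query head to the KV head it shares (contiguous grouping assumed)."""
    group_size = N_QUERY_HEADS // N_KV_HEADS  # 3 query heads per KV head
    return query_head // group_size


def kv_cache_bytes(tokens: int, n_kv_heads: int, bytes_per_value: int = 2) -> int:
    """Bytes of cached keys and values over all layers (leading 2 = K plus V)."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * bytes_per_value * tokens


gqa = kv_cache_bytes(32_768, N_KV_HEADS)
# Hypothetical comparison: one KV head per query head (plain multi-head attention).
mha = kv_cache_bytes(32_768, N_QUERY_HEADS)

print([kv_head_for(h) for h in range(N_QUERY_HEADS)])
print(f"GQA: {gqa / 2**30:.1f} GiB, full multi-head: {mha / 2**30:.1f} GiB")
```

With these assumptions, a full 32,768-token cache comes to 5 GiB with 4 key-value heads, versus 15 GiB if each of the 12 query heads kept its own keys and values.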
Falcon3-10B was built by depth up-scaling the Falcon3-7B-Base model, followed by continued pre-training on 2 trillion tokens of high-quality data. The training corpus for the broader Falcon3 family comprised 14 trillion tokens spanning web content, code, science, technology, engineering, and mathematics (STEM) data, and high-quality and multilingual datasets. The model handles a context length of up to 32K (32,768) tokens, supporting detailed analysis of long inputs and coherent multi-turn interactions. It supports inference in multiple languages, including English, French, Spanish, and Portuguese.
The TII Falcon3 model family comprises open-source, decoder-only language models (1B to 10B parameters) designed for efficiency. Key features include an extended 32K-token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants use Mamba-based architectures.