Falcon3-10B

Parameters

10B

Context Length

32K (32,768 tokens)

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

Nov 2024

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

3072

Number of Layers

40

Attention Heads

12

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

Falcon3-10B

Falcon3-10B is a member of the Falcon3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This variant targets scientific reasoning, mathematics, and code generation. It is available in base and instruction-tuned versions, covering applications from general text generation to conversational AI. Quantized versions reduce its memory footprint enough to run on resource-limited devices such as laptops.
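To make the effect of quantization concrete, the following minimal sketch estimates the memory needed for the model weights alone at a few bit widths. It assumes the nominal 10B parameter count listed above (the actual checkpoint may differ slightly) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gib` is illustrative and not part of any library.

```python
# Rough weight-only memory estimate. Assumption: a nominal 10B parameters;
# KV cache, activations, and framework overhead are NOT included.
NOMINAL_PARAMS = 10_000_000_000


def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 18.6 GiB at 16 bits, 9.3 GiB at 8 bits, and 4.7 GiB at 4 bits, which is why 4-bit variants are the ones that fit on typical laptop hardware.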

Architecturally, Falcon3-10B is a Transformer-based, causal decoder-only model with 40 decoder blocks. Its attention mechanism uses Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads; sharing each key-value head across several query heads shrinks the key-value cache and speeds up inference. The model uses a wider head dimension of 256 and Rotary Position Embeddings (RoPE) to support long-context understanding. It employs the SwiGLU activation function and RMSNorm for normalization. These choices aim to balance performance with computational efficiency.
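The practical effect of GQA can be illustrated with a short calculation. The sketch below uses the figures from the paragraph above (40 blocks, 12 query heads, 4 key-value heads, head dimension 256) and assumes 16-bit cache values and the common convention that consecutive query heads share a key-value head; both are assumptions for illustration, and the helper names are hypothetical.

```python
# Figures taken from the description above.
LAYERS = 40
QUERY_HEADS = 12
KV_HEADS = 4
HEAD_DIM = 256


def kv_head_for(query_head: int) -> int:
    """Map a query head to its shared KV head.

    Assumption: consecutive query heads form one group.
    """
    group_size = QUERY_HEADS // KV_HEADS  # 3 query heads per KV head
    return query_head // group_size


def kv_cache_bytes(tokens: int, kv_heads: int = KV_HEADS,
                   bytes_per_value: int = 2) -> int:
    """KV cache size: keys and values (x2) for every layer, head, and token.

    Assumption: 2 bytes per cached value (16-bit precision).
    """
    return 2 * LAYERS * kv_heads * HEAD_DIM * bytes_per_value * tokens


print([kv_head_for(q) for q in range(QUERY_HEADS)])
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

gqa = kv_cache_bytes(32_768)
mha = kv_cache_bytes(32_768, kv_heads=QUERY_HEADS)  # hypothetical: one KV head per query head
print(f"GQA: {gqa / 2**30:.0f} GiB, full multi-head: {mha / 2**30:.0f} GiB")
```

With these assumptions, a full 32,768-token context needs about 5 GiB of cache with 4 key-value heads, versus about 15 GiB if every query head kept its own keys and values.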

Falcon3-10B was built by depth up-scaling the Falcon3-7B-Base model, followed by continued pre-training on 2 trillion tokens of high-quality data. The training corpus for the broader Falcon3 family comprised 14 trillion tokens, spanning web content, code, science, technology, engineering, and mathematics (STEM) data, as well as high-quality and multilingual datasets. The model handles a context length of up to 32K (32,768) tokens, supporting detailed analysis of long inputs and coherent multi-turn interactions. It supports inference in multiple languages, including English, French, Spanish, and Portuguese.

About Falcon 3

The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.

Evaluation Benchmarks

Ranking is for Local LLMs.

No evaluation benchmarks for Falcon3-10B available.
