
Falcon3-3B

Parameters

3B

Context Length

32,768 tokens

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

-

Technical Specifications

Attention Structure

Grouped-Query Attention

Hidden Dimension Size

3072

Number of Layers

22

Attention Heads

12

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
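
As a rough illustration of how these requirements scale, the Python sketch below estimates VRAM from the two dominant contributions, the model weights and the key-value cache, using the architecture figures given further down this page (22 decoder blocks, 4 key-value heads, head dimension 256). The function and its defaults are illustrative assumptions: activations, framework overhead, and quantization metadata are ignored, so real usage will be somewhat higher.

    # Rough VRAM estimate for Falcon3-3B: model weights + KV cache only.
    # Activations, CUDA context, and quantization scales are ignored (assumption),
    # so actual usage will be higher than this estimate.
    def estimate_vram_gib(
        n_params=3e9,           # ~3B parameters
        bytes_per_weight=2.0,   # 2.0 = fp16/bf16, 1.0 = int8, 0.5 = int4
        context_len=32_768,     # up to 32,768 tokens for the instruct variant
        n_layers=22,            # decoder blocks
        n_kv_heads=4,           # GQA key-value heads
        head_dim=256,
        kv_bytes=2.0,           # fp16 KV cache
    ):
        weights = n_params * bytes_per_weight
        kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
        return (weights + kv_cache) / 1024**3

    print(f"fp16 weights, 32k context: {estimate_vram_gib():.1f} GiB")
    print(f"int4 weights, 1k context:  {estimate_vram_gib(bytes_per_weight=0.5, context_len=1024):.1f} GiB")

Grouped-Query Attention keeps the key-value term small here: with only 4 KV heads rather than 12, the full 32K-token cache stays under 3 GiB at fp16.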

Falcon3-3B

The Falcon3-3B model is part of the Falcon 3 family of open foundation models developed by the Technology Innovation Institute (TII). The model balances performance and efficiency, allowing deployment on a range of computing infrastructures, including smaller devices. It is intended to advance capabilities in science, mathematics, and code generation. The Falcon 3 series includes base models for general-purpose generative tasks and instruct models for conversational applications, with an emphasis on making capable AI systems broadly accessible.

Architecturally, Falcon3-3B is a transformer-based, causal decoder-only model with 22 decoder blocks. For attention, it uses Grouped Query Attention (GQA) with 12 query heads, 4 key-value heads, and a relatively wide head dimension of 256, a configuration that shrinks the key-value cache and supports efficient inference. The model uses SwiGLU as its activation function, RMSNorm for normalization, and Rotary Position Embeddings (RoPE) with a high base frequency to handle extended context. It also leverages Flash Attention 2 for reduced memory use and faster attention computation.
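
For reference, these hyperparameters can be summarized as a Hugging Face style configuration dictionary, sketched below. Only the figures stated in this description are taken from the page; the field names follow common transformers conventions, and the RMSNorm epsilon and RoPE base are assumed placeholders rather than official values.

    # Architecture summary as a config-style dict. Values marked "assumed" are
    # illustrative placeholders, not official Falcon3-3B settings.
    falcon3_3b_config = {
        "num_hidden_layers": 22,          # decoder blocks
        "num_attention_heads": 12,        # query heads
        "num_key_value_heads": 4,         # GQA: 3 query heads share each KV head
        "head_dim": 256,                  # wider-than-typical head dimension
        "hidden_size": 12 * 256,          # query heads x head_dim = 3072
        "hidden_act": "silu",             # SwiGLU uses a SiLU-gated MLP
        "rms_norm_eps": 1e-6,             # RMSNorm (epsilon assumed)
        "rope_theta": 1e6,                # high RoPE base for long context (assumed)
        "max_position_embeddings": 32_768,
    }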

The instruct variant of Falcon3-3B supports a context length of up to 32,768 tokens, while the base version supports 8,192 tokens. The model targets tasks such as reasoning, language understanding, instruction following, and mathematical problem solving, and it supports four languages: English, French, Spanish, and Portuguese. Quantized versions, including int4, int8, and 1.58-bit BitNet variants, are available, further improving efficiency and suitability for resource-constrained environments.
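
A minimal loading sketch for the instruct variant with 4-bit weight quantization via Hugging Face transformers and bitsandbytes is shown below. The repository id tiiuae/Falcon3-3B-Instruct and the chat-template usage are assumptions; consult the official model card for the recommended setup.

    # Sketch: load Falcon3-3B-Instruct in 4-bit and run a short chat turn.
    # Repo id and generation settings are assumptions, not taken from this page.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "tiiuae/Falcon3-3B-Instruct"   # assumed Hugging Face repo id

    quant_cfg = BitsAndBytesConfig(load_in_4bit=True,
                                   bnb_4bit_compute_dtype=torch.bfloat16)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_cfg, device_map="auto"
    )

    messages = [{"role": "user", "content": "Solve 12 * (7 + 5) step by step."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))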

About Falcon 3

The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.



Evaluation Benchmarks


No evaluation benchmarks are available for Falcon3-3B.

