Parameters: 3B
Context Length: 32,768 tokens
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release Date: 17 Dec 2024
Knowledge Cutoff: -
Attention Structure: Multi-Query Attention
Hidden Dimension Size: 1536
Number of Layers: 32
Attention Heads: 48
Key-Value Heads: 1
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: RoPE
Falcon-3B is a member of the Falcon 3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This 3-billion-parameter variant is designed for efficient deployment on resource-constrained hardware such as laptops and single GPUs. It targets reasoning, language understanding, instruction following, code generation, and mathematics, and supports four languages: English, French, Spanish, and Portuguese.
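As a rough guide to what deployment on a laptop or single GPU implies, the memory needed for the weights alone can be estimated from the parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name `weight_memory_gb`, the precision labels, and the nominal 3e9 parameter count are assumptions made for illustration. Real quantized formats carry extra overhead (scales, zero points), and activations and the KV cache need memory on top of the weights.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter count for a "3B" model; the exact count may differ.
NUM_PARAMS = 3e9

# Hypothetical precision labels mapped to bits stored per weight.
PRECISIONS = {"fp16": 16, "int8": 8, "int4": 4}

if __name__ == "__main__":
    for name, bits in PRECISIONS.items():
        print(f"{name}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights come to about 6 GB at 16 bits, 3 GB at 8 bits, and 1.5 GB at 4 bits.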
Falcon-3B is a transformer-based, causal decoder-only model. It uses Grouped Query Attention (GQA), in which groups of query heads share a single set of key and value heads; this shrinks the Key-Value (KV) cache and speeds up inference. The model uses SwiGLU as its activation function and RMSNorm for normalization, both of which contribute to stable training. Position information is encoded with Rotary Positional Embeddings (RoPE), which support long-context comprehension. The model also uses FlashAttention 2 for faster attention computation and has a large vocabulary of about 131,000 tokens, which improves text compression and downstream performance.
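The effect of sharing key and value heads on KV cache size can be illustrated with a short calculation. The sketch below uses the layer count, head count, and hidden size from the spec list on this page and assumes that the head dimension equals hidden size divided by the number of attention heads, that cached values are stored in 16 bits, and that the batch size is 1. The spec list gives 1 key-value head while the text describes GQA, so the number of KV heads is treated as a parameter; the 4-head GQA row is purely hypothetical. The helper name `kv_cache_bytes` is likewise an illustration, not part of any library.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence: keys and values for every layer."""
    # The factor 2 accounts for storing both a key and a value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Values as listed in the spec list on this page (not independently verified).
NUM_LAYERS = 32
NUM_QUERY_HEADS = 48
HIDDEN_SIZE = 1536
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS  # assumption: 1536 / 48 = 32
SEQ_LEN = 32768

# KV head counts to compare; the GQA entry is a hypothetical grouping.
CONFIGS = [
    ("multi-head (48 KV heads)", 48),
    ("grouped-query (4 KV heads, hypothetical)", 4),
    ("multi-query (1 KV head)", 1),
]

if __name__ == "__main__":
    for label, kv_heads in CONFIGS:
        size = kv_cache_bytes(NUM_LAYERS, kv_heads, HEAD_DIM, SEQ_LEN)
        print(f"{label}: {size / 2**20:.0f} MiB")
```

With these inputs the cache scales linearly with the number of KV heads: 6144 MiB for full multi-head attention, 512 MiB for the hypothetical 4-head grouping, and 128 MiB for a single shared KV head.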
Falcon-3B and its instruction-tuned counterpart were derived from the larger Falcon3-7B-Base model through pruning and knowledge distillation, yielding a compact and efficient model. The base variant supports a context length of 8,000 tokens, while the instruction-tuned variant extends this to 32,000 tokens, allowing it to handle longer and more complex inputs. This makes Falcon-3B a suitable choice for applications that need capable language models where computational resources are limited.
The original TII Falcon model family comprises causal decoder-only language models (7B, 40B). Their architecture, adapted from GPT-3, uses rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for faster attention computation. The models were trained on the RefinedWeb dataset.
No evaluation benchmarks for Falcon-3B available.