Parameters
3B
Context Length
32,768 tokens
Modality
Text
Architecture
Dense
License
TII Falcon-LLM License 2.0
Release Date
17 Dec 2024
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
12
Key-Value Heads
4
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
1,000,042
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
3,072
Number of Layers
22
FFN Intermediate Size (Dense)
9,216
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
131,072
The Falcon3-3B model is part of the Falcon 3 family of open foundation models developed by the Technology Innovation Institute (TII). This model is designed for a balance of performance and efficiency, enabling its deployment on a range of computing infrastructures, including smaller devices. It is developed to support advancements in capabilities related to science, mathematics, and code generation. The Falcon 3 series includes both base models for general-purpose generative tasks and instruct models for conversational applications, emphasizing accessibility in advanced artificial intelligence systems.
Architecturally, Falcon3-3B is a transformer-based, causal decoder-only model with 22 decoder blocks. Its attention uses Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, along with a wider head dimension of 256; sharing each key-value head across three query heads shrinks the key-value cache and supports efficient inference. The model uses SwiGLU as its activation function, RMSNorm for normalization, and Rotary Position Embeddings (RoPE) with a high base value (theta = 1,000,042) to handle extended context. It also uses Flash Attention 2 to reduce memory use and improve speed.
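To make the inference benefit of GQA concrete, the following minimal sketch estimates the key-value cache size from the figures quoted above (22 blocks, 4 key-value heads, head dimension 256). The 2-byte (FP16) cache element size, the single-sequence batch, and the multi-head baseline with 12 key-value heads are assumptions chosen for illustration, not published TII figures.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: every layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 22, 12, 4, 256
CONTEXT = 32_768  # maximum context length quoted for the instruct variant

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical baseline: plain multi-head attention, one KV head per query head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

Under these assumptions the cache at full context is 2.75 GiB with GQA versus 8.25 GiB for the hypothetical multi-head baseline, a 3x reduction that follows directly from the 12:4 head ratio.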
The instruct variant of Falcon3-3B supports a context length of up to 32,768 tokens, while the base version supports 8,192 tokens. The model targets tasks such as reasoning, language understanding, instruction following, and mathematical problem-solving, and it has been trained to support four languages: English, French, Spanish, and Portuguese. Quantized versions, such as int4, int8, and 1.58-bit BitNet, are also available and make the model better suited to resource-constrained environments.
The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-3B.
Overall Rank
-
Coding Rank
-
Total Score
71 / 100
Falcon3-3B exhibits strong transparency regarding its architectural design and hardware requirements, providing detailed specifications and accessible weights. Its primary transparency weaknesses lie in the lack of a comprehensive technical paper detailing dataset proportions and specific compute costs. While the model is highly verifiable in its structure and licensing, more granular disclosure of training data sources and evaluation prompts would be required for an exemplary rating.
Architectural Provenance
The model is explicitly identified as a transformer-based causal decoder-only architecture. TII provides specific details on its derivation, noting it was pruned and 'healed' from the larger Falcon3-7B-Base model using knowledge distillation. Architectural specifics are well-documented, including the use of 22 decoder blocks, Grouped Query Attention (GQA) with 12 query and 4 KV heads, SwiGLU activation, RMSNorm, and Rotary Position Embeddings (RoPE) with a high base value (theta = 1,000,042) for context handling.
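The effect of the high RoPE base can be illustrated with a short sketch. It uses the standard RoPE formulation, in which dimension pair i rotates at frequency theta^(-2i/d); that Falcon3 applies this formulation without further scaling is an assumption, and the function names are illustrative.

```python
import math


def rope_inverse_frequencies(head_dim, theta):
    """One rotation frequency per pair of dimensions: theta^(-2i/d)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x0, x1, position, inv_freq):
    """Rotate one (x0, x1) dimension pair by the angle position * inv_freq."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


inv_freq = rope_inverse_frequencies(head_dim=256, theta=1_000_042)

# Wavelength (in tokens) of the slowest-rotating pair.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(f"pairs: {len(inv_freq)}, longest wavelength: {longest_wavelength:,.0f} tokens")
```

A larger base stretches the slowest wavelengths well past the 32,768-token context, so distant positions still receive distinguishable rotation angles.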
Dataset Composition
While TII discloses the total token count for the Falcon 3 family (14 trillion) and the specific amount used for 'healing' the 3B variant (100 billion tokens), the breakdown of the dataset is only described in general categories: web, code, STEM, and high-quality multilingual data. Specific proportions, source names beyond the legacy 'RefinedWeb', and detailed filtering/cleaning methodologies for this specific version are not publicly detailed in a technical paper.
Tokenizer Integrity
The tokenizer is publicly accessible via the Hugging Face repository and is well-documented. It features a vocabulary size of 131,072 tokens, which is a significant increase from previous versions. The approach is consistent with the claimed support for English, French, Spanish, and Portuguese, and the tokenizer files (tokenizer.json, tokenizer_config.json) are available for direct inspection and verification.
Parameter Density
The model's parameter count is clearly stated as 3 billion. As a dense model, all parameters are active during inference. TII provides a detailed architectural breakdown including the number of layers (22), head dimensions (256), and attention configurations (GQA), which allows for precise verification of the parameter density claims.
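As a rough check of the 3-billion figure, the sketch below counts the parameters of a generic dense decoder from the published dimensions. Several inputs are assumptions rather than documented facts: the hidden size is taken as query heads times head dimension (12 x 256 = 3,072), projections are assumed bias-free, each block is assumed to have two RMSNorm weight vectors, and the output head is assumed not to share weights with the input embedding.

```python
def dense_decoder_params(hidden, layers, q_heads, kv_heads, head_dim,
                         ffn, vocab, tied_embeddings=False):
    """Approximate parameter count for a GQA + SwiGLU decoder-only model."""
    q_proj = hidden * q_heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim   # key and value projections
    o_proj = q_heads * head_dim * hidden
    mlp = 3 * hidden * ffn                       # SwiGLU: gate, up, down
    norms = 2 * hidden                           # two RMSNorm weights per block
    per_layer = q_proj + kv_proj + o_proj + mlp + norms

    embedding = vocab * hidden
    lm_head = 0 if tied_embeddings else embedding
    final_norm = hidden
    return layers * per_layer + embedding + lm_head + final_norm


total = dense_decoder_params(
    hidden=12 * 256,   # assumption: hidden size = query heads * head dimension
    layers=22, q_heads=12, kv_heads=4, head_dim=256,
    ffn=9_216, vocab=131_072,
)
print(f"estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate comes out a little above 3.2 billion, which is consistent with a model marketed as 3B.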
Training Compute
TII discloses the hardware used (1024 H100 GPU chips) for the training process. However, specific GPU-hours for the 3B variant's distillation and healing phase are not explicitly provided, nor is there a detailed carbon footprint calculation or energy consumption report specific to this model's development cycle.
Benchmark Reproducibility
TII provides benchmark results on standard sets such as MMLU-Pro, MATH, and IFEval, and specifies the use of the 'lm-evaluation-harness' framework. However, although TII mentions 'internal pipeline' settings, the exact prompts and few-shot configurations are not fully documented in a public technical report, which has led to some community-noted discrepancies with standard leaderboard evaluations.
Identity Consistency
The model demonstrates high identity consistency, correctly identifying itself as part of the Falcon 3 family in its system prompts and documentation. There is no evidence of the model claiming to be a competitor's product (e.g., GPT-4), and it maintains a clear versioning identity across its base and instruct variants.
License Clarity
The model is released under the 'TII Falcon-LLM License 2.0'. This is a custom license based on Apache 2.0 but includes specific requirements, such as mandatory attribution for derivative works ('built using AI technology from TII'). While the terms are legally clear and allow for commercial use, it is not a standard OSI-approved license, which adds a layer of complexity for users.
Hardware Footprint
Hardware requirements are well-documented, with specific VRAM estimates provided for various quantization levels (FP16, INT8, INT4). For example, FP16 is noted to require approximately 7-8GB of VRAM. The availability of official quantized versions (GGUF, AWQ, GPTQ) and documentation on their impact on memory makes the hardware footprint highly transparent.
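The weight-only part of these estimates follows from simple arithmetic, sketched below. The nominal 3-billion parameter count and the bits-per-weight values are taken at face value; real quantized files carry extra metadata such as scales and zero points, and runtime memory also includes the KV cache and activations, none of which this sketch models.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 3e9  # nominal parameter count stated for Falcon3-3B

for name, bits in {"FP16": 16, "INT8": 8, "INT4": 4}.items():
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

This gives about 6 GB for FP16 weights; the gap to the quoted 7-8 GB is presumably runtime overhead, though that attribution is an assumption rather than a documented breakdown.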
Versioning Drift
The model follows a clear release versioning (Falcon 3 series), but a detailed, granular changelog for weight updates or specific 'drift' documentation is lacking. While the release date and initial version are clear, there is no established public system for tracking silent updates or performance changes over time beyond the initial Hugging Face commit history.