Parameters: 1B
Context Length: 8,192 tokens (8K)
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release Date: 17 Dec 2024
Knowledge Cutoff: -
Attention
Attention Structure: Grouped Query Attention
Attention Heads: 8
Key-Value Heads: 4
Attention Head Dimension: 256
Position Embedding: RoPE
RoPE Theta: -
Sliding Window Attention: -
Sliding Window Size: -
Normalization: RMS Normalization
Activation Function: SwiGLU
Dimensions
Hidden Dimension Size: 2048
Number of Layers: 18
FFN Intermediate Size (Dense): -
Multi-Token Prediction Heads: -
Tokenizer
Vocabulary Size: 131,072
The Falcon3-1B model, developed by the Technology Innovation Institute (TII), is a member of the Falcon3 family of open foundation models, designed for efficient operation with a parameter count around 1 billion. This model aims to advance capabilities in scientific reasoning, mathematical problem-solving, and code understanding. Variants such as Falcon3-1B-Base provide a raw, pretrained foundation suitable for subsequent fine-tuning across diverse natural language processing applications, while Falcon3-1B-Instruct is further optimized for conversational interfaces and adherence to explicit instructions.
Architecturally, Falcon3-1B is a causal decoder-only Transformer with 18 decoder blocks. It uses Grouped Query Attention (GQA), configured with 8 query heads and 4 key-value heads, so each pair of query heads shares one key-value head. Compared with giving every query head its own key-value head, this halves the key-value cache, which reduces memory consumption and speeds up inference. The model also uses a wide head dimension of 256 and Rotary Position Embedding (RoPE) to encode token positions across its context window.
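To make the memory effect of GQA concrete, the following is a minimal sketch that estimates the key-value cache size from the figures quoted above (18 blocks, 8 query heads, 4 key-value heads, head dimension 256) at the 8K context length. The helper name `kv_cache_bytes`, the 2-byte (FP16/BF16) cache precision, and the multi-head baseline used for comparison are assumptions made for illustration, not details published by TII.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Figures quoted in the text: 18 blocks, 8 query heads, 4 KV heads, head dim 256.
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 18, 8, 4, 256
CONTEXT = 8192  # 8K context length

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical baseline: plain multi-head attention, one KV head per query head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

MIB = 1024 ** 2
print(f"GQA cache: {gqa / MIB:.0f} MiB")   # 576 MiB
print(f"MHA cache: {mha / MIB:.0f} MiB")   # 1152 MiB
print(f"Reduction: {mha / gqa:.1f}x")      # 2.0x
```

The estimate covers a single sequence and ignores framework overhead, so real usage will be somewhat higher; the ratio between the two configurations is the point of the comparison.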
The activation function used throughout the network is SwiGLU, combined with RMSNorm for normalization, contributing to stable training and performance. The model's design focuses on enabling robust language understanding and generation across multiple languages, including English, French, Spanish, and Portuguese. Its optimized architecture and relatively compact parameter size make it a candidate for deployment in environments with limited computational resources, such as edge devices, while still delivering strong performance for a range of language-based tasks.
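As a rough illustration of the two building blocks named above, here is a dependency-free sketch of RMSNorm and a SwiGLU feed-forward layer. It follows the standard textbook definitions; the function names, the `eps` value, and the toy weights are assumptions for illustration and do not come from the Falcon3 code or configuration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: hidden size 2, intermediate size 3, hand-picked weights.
x = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

After normalization the vector has a root mean square of roughly 1, which is the property that keeps activations at a stable scale from layer to layer.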
The original TII Falcon model family comprises causal decoder-only language models at 7B and 40B parameters. Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for faster attention computation. Those models were trained on the RefinedWeb dataset.
No evaluation benchmarks are available for Falcon3-1B.
Overall Rank: -
Coding Rank: -
Total Score: 67 / 100
Falcon3-1B-Instruct demonstrates a strong commitment to architectural transparency, providing clear details on its pruning-based origin and specific Transformer configurations. While the model's identity and basic technical specs are well-documented, it suffers from significant opacity regarding the specific composition of its training datasets and the total compute resources consumed. The use of a custom license and the lack of detailed reproduction code for benchmarks further limit its transparency profile to a moderate level.
Architectural Provenance
The Falcon3-1B-Instruct model is explicitly documented as a causal decoder-only Transformer with 18 decoder blocks. TII provides specific technical details including the use of Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads, a head dimension of 256, SwiGLU activation, and RMSNorm. Crucially, the model's provenance is described as being 'pruned and healed' from a larger 3B Falcon model using knowledge distillation, which is a more transparent disclosure of origin than many 'trained from scratch' claims. However, the full pretraining procedure for the parent 3B model is only partially detailed in the general Falcon 3 blog post.
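The text says only that the model was healed "using knowledge distillation" and does not give the loss that TII used. As a generic illustration of the idea, the sketch below computes a common textbook distillation objective: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The function names, the temperature, and the logits are all made up for this example and should not be read as TII's recipe.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]  # made-up logits standing in for the larger model
student = [1.5, 0.7, -0.5]  # made-up logits standing in for the pruned model
print(distillation_loss(student, teacher))  # positive: the distributions differ
print(distillation_loss(teacher, teacher))  # 0.0: identical logits
```

In a healing phase of this kind, a loss like this would be minimized over training tokens so that the pruned model's next-token distribution moves back toward that of its parent.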
Dataset Composition
TII discloses that the model was trained on 80 Gigatokens of data for the 'healing' phase and the instruct version was post-trained on 1.2 million samples. General categories are provided (web, code, STEM, multilingual), but specific percentage breakdowns of the 80GT or the 14T tokens used for the base 7B model are not publicly available. The data collection and filtering methodologies are mentioned in marketing terms ('curated', 'high-quality') without technical documentation or public access to the underlying datasets.
Tokenizer Integrity
The tokenizer is publicly accessible via Hugging Face and the vocabulary size is clearly stated as 131,072 tokens. It is documented as a BPE-based tokenizer with support for English, French, Spanish, and Portuguese. The alignment with the claimed language support is verifiable through the provided configuration files and the 'transformers' library integration, though detailed training data for the tokenizer itself is not explicitly separated from the general pretraining data.
Parameter Density
The model is a dense architecture with a clearly stated parameter count of approximately 1 billion. Unlike MoE models, there is no ambiguity between active and total parameters. The architectural breakdown (18 layers, specific head counts) is provided in the model card, allowing for a precise understanding of parameter distribution across the network.
Training Compute
TII discloses the hardware used (256 H100 GPUs) for the pruning and healing phase of the 1B model. However, the total GPU hours, training duration, and energy consumption or carbon footprint for the 1B variant are not provided. While the 7B base model's compute is mentioned (1024 H100s for 14T tokens), the lack of specific metrics for the 1B-Instruct variant's post-training and healing phases leaves significant gaps.
Benchmark Reproducibility
Benchmark results are reported for standard sets like MMLU, ARC, and GSM8K. TII specifies the use of the 'lm-evaluation-harness' and mentions the use of chat templates and few-shot settings. However, the exact prompts, few-shot examples, and specific evaluation code for reproducing the reported 'raw scores' are not fully public, and third-party verification is limited to leaderboard entries rather than independent audits.
Identity Consistency
The model consistently identifies itself as Falcon3 from TII in its system prompts and documentation. There is clear version tracking within the Falcon 3 family (1B, 3B, 7B, 10B) and a distinction between Base and Instruct variants. The model does not exhibit identity confusion with competitors like Llama or GPT in official documentation or standard testing environments.
License Clarity
The model is released under the 'TII Falcon-LLM License 2.0'. While based on Apache 2.0, it includes a custom 'Acceptable Use Policy' and specific terms that are not standard open-source. The license allows for commercial use but includes restrictions and requirements (such as attribution and compliance with the AUP) that make it a 'weights-available' license rather than a pure open-source license, leading to some ambiguity for enterprise users.
Hardware Footprint
VRAM requirements are well-documented by the community and partially by TII through the release of GGUF, AWQ, and GPTQ variants. The model card specifies the 8K context length for the 1B variant, and memory scaling for this context is predictable. However, official documentation lacks a comprehensive table of VRAM vs. batch size vs. quantization levels, relying instead on third-party implementations like Ollama and llama.cpp for this data.
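In the absence of an official table, a back-of-the-envelope estimate of weight memory at different quantization levels can be sketched as follows. It assumes the round figure of 1 billion parameters quoted above and nominal bit widths; the helper name `weight_memory_gib` is hypothetical, and real GGUF, AWQ, or GPTQ files add scales and metadata, so actual sizes will be somewhat larger.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024 ** 3


NUM_PARAMS = 1_000_000_000  # "approximately 1 billion" per the text, not an exact count

# Nominal bit widths; quantized formats carry extra scales and metadata.
for label, bits in [("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Total VRAM at inference time also includes the key-value cache and activations, which grow with context length and batch size, so these figures are a lower bound.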
Versioning Drift
The model has a clear release date (December 2024) and is part of a numbered family. However, there is no public changelog or semantic versioning for the weights themselves (e.g., v1.0 vs v1.1). While the initial release is well-documented, the infrastructure for tracking future silent updates or performance drift is not explicitly presented to the public.