Parameters: 1B
Context Length: 8,192 tokens
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release Date: 17 Dec 2024
Knowledge Cutoff: -
Attention
Attention Structure: Grouped-Query Attention
Attention Heads: 8
Key-Value Heads: 4
Attention Head Dimension: 256
Position Embedding: RoPE
RoPE Theta: 1,000,042
Sliding Window Attention: No
Sliding Window Size: -
Normalization: RMS Normalization
Activation Function: SwiGLU
Dimensions
Hidden Dimension Size: 2,048
Number of Layers: 18
FFN Intermediate Size (Dense): 8,192
Multi-Token Prediction Heads: -
Tokenizer
Vocabulary Size: 131,072
Falcon3-1B is a member of the Falcon 3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). The family emphasizes scientific, mathematical, and coding capabilities while keeping a strong focus on training efficiency. The 1B variant is designed to run on lightweight hardware such as laptops, which broadens access to the model. It supports four languages: English, French, Spanish, and Portuguese.
Architecturally, Falcon3-1B is a Transformer-based causal decoder-only model with 18 decoder blocks. It uses Grouped-Query Attention (GQA) with 8 query heads and 4 key-value heads, which makes inference more efficient by reducing the memory consumed by the key-value (KV) cache. The model uses SwiGLU activations and RMSNorm normalization. Positional information is handled by Rotary Position Embeddings (RoPE), which supports long-context understanding. The tokenizer has a vocabulary of 131,072 tokens, which aids data compression and downstream performance. The architecture also uses FlashAttention-2 for higher computational throughput.
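The KV-cache saving from GQA can be estimated with simple arithmetic. The sketch below is illustrative and is not TII's own tooling. It takes the layer count, key-value head count, head dimension, and context length from the specifications on this page, and it assumes 16-bit cache entries and a single sequence. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: every layer stores one tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


LAYERS, HEAD_DIM, CONTEXT = 18, 256, 8192

# Grouped-query attention as described above: 4 key-value heads.
gqa = kv_cache_bytes(LAYERS, num_kv_heads=4, head_dim=HEAD_DIM, seq_len=CONTEXT)

# Hypothetical baseline for comparison: one key-value head per query head (8).
mha = kv_cache_bytes(LAYERS, num_kv_heads=8, head_dim=HEAD_DIM, seq_len=CONTEXT)

print(f"GQA (4 KV heads): {gqa / 2**20:.0f} MiB")  # 576 MiB
print(f"MHA (8 KV heads): {mha / 2**20:.0f} MiB")  # 1152 MiB
```

Under these assumptions, sharing each key-value head between two query heads halves the cache at the full 8,192-token context.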
Falcon3-1B is designed for natural language processing tasks such as reasoning, language comprehension, instruction following, code generation, and mathematical problem-solving. It can be deployed in generative and conversational AI applications. Its small size and the availability of optimized variants, such as quantized versions, make it usable in resource-constrained environments.
The TII Falcon 3 model family comprises open-source, decoder-only language models (1B to 10B parameters) designed for efficiency. Key features include a context window of up to 32K tokens (8K for the 1B model), Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-1B.
Overall Rank: -
Coding Rank: -
Total Score: 68 / 100
Falcon3-1B exhibits strong transparency in its architectural design and hardware requirements, providing developers with clear specifications for local deployment. However, the model's data provenance remains somewhat opaque, relying on general category descriptions rather than detailed source disclosures. While technically accessible, the custom licensing terms and lack of reproducible evaluation artifacts represent significant hurdles for fully transparent auditing.
Architectural Provenance
The Falcon3-1B architecture is comprehensively documented as a causal decoder-only Transformer with 18 layers. TII explicitly details the use of Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads, SwiGLU activation, RMSNorm, and Rotary Position Embeddings (RoPE). The model's provenance is clearly linked to a 'pruning and healing' methodology derived from larger Falcon 3 models (3B and 7B), which is a significant level of methodological disclosure for a 1B variant.
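For readers unfamiliar with RoPE, the following is a minimal sketch of the standard rotary-embedding formula. It is not code taken from TII. It assumes the base (theta) of 1,000,042 and the head dimension of 256 listed in this page's specifications. The helper names `rope_inv_freq`, `apply_rope`, and `dot` are hypothetical, and the adjacent-channel pairing is only one common convention.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Inverse frequency for channel pair i: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec: list[float], position: int, inv_freq: list[float]) -> list[float]:
    """Rotate each adjacent channel pair (2i, 2i+1) by position * inv_freq[i].

    Some libraries pair channel i with channel i + head_dim/2 instead;
    the relative-position property demonstrated below holds either way.
    """
    out = []
    for i, freq in enumerate(inv_freq):
        angle = position * freq
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


HEAD_DIM, THETA = 256, 1_000_042.0
inv_freq = rope_inv_freq(HEAD_DIM, THETA)

# Arbitrary illustrative query and key vectors.
q = [math.sin(0.1 * j) for j in range(HEAD_DIM)]
k = [math.cos(0.2 * j) for j in range(HEAD_DIM)]

# Both position pairs are 7 tokens apart, so the attention scores should match.
score_a = dot(apply_rope(q, 10, inv_freq), apply_rope(k, 3, inv_freq))
score_b = dot(apply_rope(q, 1007, inv_freq), apply_rope(k, 1000, inv_freq))
print(math.isclose(score_a, score_b, abs_tol=1e-9))
```

The check illustrates the property that makes RoPE attractive: the query-key score depends only on the distance between the two positions, not on their absolute values.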
Dataset Composition
TII provides a high-level breakdown of the training data, stating that the model was 'healed' on 80 billion tokens of web, code, STEM, and multilingual data. While it names the RefinedWeb dataset as a primary source for the family, the specific proportions for the 1B variant's 80-billion-token healing set are not disclosed. The instruct version mentions 1.2 million samples for post-training, but detailed source lists and data filtering scripts for this specific variant are absent.
Tokenizer Integrity
The tokenizer is publicly available via Hugging Face and is fully integrated into the 'transformers' library. It features a large vocabulary of 131,072 tokens, supporting English, French, Spanish, and Portuguese. Technical details such as the use of Byte Pair Encoding (BPE) and specific special tokens are well-documented in the model card and configuration files.
Parameter Density
The model is a dense architecture with approximately 1B parameters (specifically cited as 1.67B total in some technical manifests like Ollama, though marketed as 1B). TII provides a detailed architectural breakdown including layer counts (18), head dimensions (256), and attention configurations, which allows for precise verification of parameter distribution across the model.
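The 1.67B figure can be cross-checked against the architectural breakdown with a back-of-the-envelope count. The sketch below is an estimate, not an official accounting. It assumes a hidden size of 2,048 (8 query heads × 256 head dimension), untied input and output embeddings, no bias terms, a gated FFN with three projection matrices, and two RMSNorm weight vectors per layer plus a final one. The function name `dense_decoder_params` is hypothetical.

```python
def dense_decoder_params(vocab: int, hidden: int, layers: int,
                         q_heads: int, kv_heads: int, head_dim: int,
                         ffn: int, tied_embeddings: bool = False) -> int:
    """Approximate parameter count of a bias-free GQA decoder-only Transformer."""
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden  # assumption: untied head

    attn = hidden * q_heads * head_dim        # query projection
    attn += 2 * hidden * kv_heads * head_dim  # key and value projections
    attn += q_heads * head_dim * hidden       # output projection

    mlp = 3 * hidden * ffn                    # gate, up, and down projections
    norms = 2 * hidden                        # two RMSNorm weight vectors per layer

    return embed + lm_head + layers * (attn + mlp + norms) + hidden  # + final norm


total = dense_decoder_params(
    vocab=131_072, hidden=2_048, layers=18,
    q_heads=8, kv_heads=4, head_dim=256, ffn=8_192,
)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # 1,669,408,768 (~1.67B)
```

Under these assumptions the two embedding matrices account for roughly a third of the total, which is consistent with the model being described as 1B while manifests report 1.67B.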
Training Compute
TII discloses that the 1B model was pruned and healed using 256 H100 GPUs. However, the total training duration (GPU hours) and the associated carbon footprint or energy consumption for this variant are not provided. The hardware type is named, but the lack of duration and environmental impact metrics prevents a higher score.
Benchmark Reproducibility
TII provides a table of internal benchmark results (MMLU, GSM8K, etc.) and mentions using the 'lm-evaluation-harness', but it does not release the specific evaluation code, exact prompts, or few-shot examples used to achieve these scores. This limits third parties' ability to replicate the reported figures. The score was also adjusted for external research indicating potential contamination risks in the model family.
Identity Consistency
The model consistently identifies as part of the Falcon 3 family from TII in its system prompts and documentation. It maintains a clear versioning distinction between 'Base' and 'Instruct' variants. There is no evidence of the model claiming to be a competitor's product or misrepresenting its fundamental nature as an AI.
License Clarity
The model is released under the 'TII Falcon-LLM License 2.0'. While the license is publicly accessible and based on Apache 2.0, it includes custom clauses and an Acceptable Use Policy. There is significant community debate regarding its 'open source' status due to commercial restrictions (royalty obligations for high-revenue entities), which creates ambiguity compared to standard OSI-approved licenses.
Hardware Footprint
Hardware requirements are exceptionally well-documented. TII and community partners provide specific VRAM requirements for various quantization levels (FP16, INT8, INT4) and context lengths. The model's efficiency on consumer hardware like laptops is a primary focus of its documentation, with clear guidance on deployment via tools like llama.cpp and Ollama.
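As a rough illustration of how weight memory scales with quantization level, the sketch below computes a weights-only estimate. It assumes the 1.67B parameter figure cited on this page and ignores the KV cache, activations, and the per-block scale metadata that real quantization formats add, so actual VRAM use will be higher. The function name `weight_memory_gib` is hypothetical.

```python
# Nominal bits per weight for each precision level (format overhead ignored).
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(num_params: float, bits: int) -> float:
    """Weights-only memory in GiB for a given parameter count and bit width."""
    return num_params * bits / 8 / 2**30


PARAMS = 1.67e9  # assumption: total parameter count cited in technical manifests

estimates = {name: weight_memory_gib(PARAMS, bits)
             for name, bits in BITS_PER_WEIGHT.items()}

for name, gib in estimates.items():
    print(f"{name}: {gib:.2f} GiB")
```

Each halving of the bit width halves the weight footprint, which is why 4-bit variants are the usual choice for laptop deployment.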
Versioning Drift
The model uses a clear naming convention (Falcon3-1B-Instruct), but a formal, public changelog tracking silent updates or weight drifts is not readily available. While the release date is clear (December 2024), there is no established infrastructure for users to track ongoing updates or access specific historical snapshots beyond the initial release.