Parameters
1B
Context Length
8,192 tokens
Modality
Text
Architecture
Dense
License
TII Falcon-LLM License 2.0
Release Date
17 Dec 2024
Knowledge Cutoff
-
Attention Structure
Grouped-Query Attention
Hidden Dimension Size
2048
Number of Layers
18
Attention Heads
8
Key-Value Heads
4
Activation Function
SwiGLU
Normalization
RMS Normalization
Position Embedding
RoPE
VRAM requirements for different quantization methods and context sizes
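As a rough illustration of how these figures combine, the sketch below estimates total VRAM as quantized weight memory plus the grouped-query-attention KV cache, using the hyperparameters listed above. The head dimension (256) and the fp16 cache precision are assumptions for illustration, not values taken from the table; treat the results as orders of magnitude only.

```python
# Back-of-envelope VRAM estimate for Falcon3-1B.
# Assumptions: the nominal 1B parameter count from the spec table, an assumed
# head dimension of 256, and an fp16 (2-byte) KV cache.

N_PARAMS = 1.0e9      # "Parameters: 1B" (nominal)
N_LAYERS = 18         # decoder blocks
N_KV_HEADS = 4        # key-value heads (GQA)
HEAD_DIM = 256        # assumed head dimension
KV_BYTES = 2          # fp16 cache entries

def weight_gib(bits_per_weight: float) -> float:
    """Memory needed for the quantized weights alone."""
    return N_PARAMS * bits_per_weight / 8 / 2**30

def kv_cache_gib(context_tokens: int) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_tokens * KV_BYTES / 2**30

for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    total = weight_gib(bits) + kv_cache_gib(8192)
    print(f"{label}: ~{total:.2f} GiB at the full 8,192-token context")
```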
The Falcon3-1B model is a member of the Falcon 3 family of decoder-only large language models, developed by the Technology Innovation Institute (TII). This family of models emphasizes enhancing capabilities in scientific, mathematical, and coding domains, while maintaining a strong focus on training efficiency. The Falcon3-1B variant is specifically engineered to operate effectively on lightweight computational infrastructures, including devices such as laptops, thereby broadening the accessibility of advanced AI capabilities. It supports multilingual applications, including English, French, Spanish, and Portuguese.
Architecturally, Falcon3-1B is a Transformer-based causal decoder-only model with 18 decoder blocks. It uses Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads, which enables efficient inference by reducing the memory footprint of the Key-Value (KV) cache. Activations use SwiGLU and normalization uses RMSNorm, while positional information is encoded with Rotary Position Embeddings (RoPE) to support effective long-context understanding. The tokenizer has a vocabulary of 131,072 tokens, which aids data compression and downstream performance, and the implementation incorporates FlashAttention-2 for higher computational throughput.
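To make the KV-cache saving concrete, here is a minimal NumPy sketch of GQA head sharing with the 8-query/4-KV split described above; the head dimension (256) and sequence length are illustrative assumptions, and the causal mask is omitted for brevity.

```python
import numpy as np

# Grouped-query attention: 8 query heads share 4 cached key-value heads.
n_q_heads, n_kv_heads, head_dim, seq_len = 8, 4, 256, 16
group = n_q_heads // n_kv_heads  # 2 query heads per K/V head

q = np.random.randn(n_q_heads, seq_len, head_dim)
k = np.random.randn(n_kv_heads, seq_len, head_dim)
v = np.random.randn(n_kv_heads, seq_len, head_dim)

# Expand K/V so each group of query heads attends over the same cached K/V.
k_exp = np.repeat(k, group, axis=0)   # (8, seq_len, head_dim)
v_exp = np.repeat(v, group, axis=0)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_exp                 # (8, seq_len, head_dim)

# Only the 4 K/V heads are cached, halving KV-cache memory vs. 8 full heads.
print(out.shape)
```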
Falcon3-1B is designed for a broad range of natural language processing tasks, including reasoning, language comprehension, instruction following, code generation, and mathematical problem-solving. It can be deployed in generative AI applications and conversational AI systems, and its efficiency, together with optimized variants such as quantized versions, makes it practical for resource-constrained environments and diverse real-world applications.
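For this kind of local deployment, a minimal sketch using the Hugging Face transformers text-generation pipeline is shown below; the hub id and the generation settings are assumptions for illustration (the Falcon 3 family is published under the "tiiuae" organization on the Hugging Face Hub), so swap in whichever Falcon3-1B checkpoint or quantized variant you actually use.

```python
from transformers import pipeline

# Assumed hub id for illustration; replace with your local or quantized checkpoint.
generator = pipeline("text-generation", model="tiiuae/Falcon3-1B-Instruct")

prompt = "Write a short Python function that checks whether a number is prime."
result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```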
The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended context window of up to 32K tokens (8K for the 1B variant), Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-1B, so its overall and coding ranks among local LLMs are not listed.