ApX logoApX logo

Falcon3-1B

Parameters

1B

Context Length

8.192K

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

4

Attention Head Dimension

256

Position Embedding

ROPE

RoPE Theta

1,000,042

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwigLU

Dimensions

Hidden Dimension Size

768

Number of Layers

18

FFN Intermediate Size (Dense)

8,192

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

131,072

Architecture Diagram

Input TokensToken EmbeddingPosition: RoPEHidden: 768 · Context: 8.2k · Vocab: 131.1kx 18 layersRMSNormPre-AttentionGrouped-Query Attention16Q / 4KV headsHead dim: 256+RMSNormPre-FFNFeed-Forward NetworkSwiGLUIntermediate: 8.2k+Final RMSNormOutput Logits

Falcon3-1B

The Falcon3-1B model is a member of the Falcon 3 family of decoder-only large language models, developed by the Technology Innovation Institute (TII). This family of models emphasizes enhancing capabilities in scientific, mathematical, and coding domains, while maintaining a strong focus on training efficiency. The Falcon3-1B variant is specifically engineered to operate effectively on lightweight computational infrastructures, including devices such as laptops, thereby broadening the accessibility of advanced AI capabilities. It supports multilingual applications, including English, French, Spanish, and Portuguese.

Architecturally, Falcon3-1B is built upon a Transformer-based causal decoder-only design, incorporating 18 decoder blocks. The model utilizes Grouped Query Attention (GQA), configured with 8 query heads and 4 key-value heads, which contributes to efficient inference by minimizing memory consumption for the Key-Value (KV) cache. For activation, the model employs SwiGLU, and for normalization, it integrates RMSNorm. Positional embeddings are handled via Rotary Position Embeddings (RoPE), facilitating effective long-context understanding. The tokenizer for Falcon3-1B supports an extensive vocabulary of 131,000 tokens, which aids in data compression and downstream performance. Furthermore, the architecture incorporates Flash Attention 2 for optimized computational throughput.

Falcon3-1B is designed for a variety of natural language processing tasks, including but not limited to reasoning, language comprehension, instruction following, code generation, and mathematical problem-solving. Its design allows for its deployment in generative AI applications and conversational AI systems. The model's efficiency and optimized variants, such as quantized versions, enable its use in environments with constrained resources, providing a practical solution for diverse real-world applications.

About Falcon 3

The TII Falcon 3 model family comprises open-source, decoder-only language models (1B-10B parameters) designed for efficiency. Key innovations include an extended 32K token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.


Other Falcon 3 Models

Evaluation Benchmarks

No evaluation benchmarks for Falcon3-1B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

68 / 100

Falcon3-1B Model Integrity Report

Total Score

68

/ 100

B

Audit Note

Falcon3-1B exhibits strong transparency in its architectural design and hardware requirements, providing developers with clear specifications for local deployment. However, the model's data provenance remains somewhat opaque, relying on general category descriptions rather than detailed source disclosures. While technically accessible, the custom licensing terms and lack of reproducible evaluation artifacts represent significant hurdles for fully transparent auditing.

Upstream

21.5 / 30

Architectural Provenance

8.0 / 10

The Falcon3-1B architecture is comprehensively documented as a causal decoder-only Transformer with 18 layers. TII explicitly details the use of Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads, SwiGLU activation, RMSNorm, and Rotary Position Embeddings (RoPE). The model's provenance is clearly linked to a 'pruning and healing' methodology derived from larger Falcon 3 models (3B and 7B), which is a significant level of methodological disclosure for a 1B variant.

Dataset Composition

4.5 / 10

TII provides a high-level breakdown of the training data, stating it was 'healed' on 80 billion tokens consisting of web, code, STEM, and multilingual data. While it mentions the RefinedWeb dataset as a primary source for the family, the specific proportions for the 1B variant's 80GT healing set are not disclosed. The instruct version mentions 1.2 million samples for post-training, but detailed source lists or data filtering scripts for this specific variant are absent.

Tokenizer Integrity

9.0 / 10

The tokenizer is publicly available via Hugging Face and is fully integrated into the 'transformers' library. It features a large vocabulary of 131,072 tokens, supporting English, French, Spanish, and Portuguese. Technical details such as the use of Byte Pair Encoding (BPE) and specific special tokens are well-documented in the model card and configuration files.

Model

26.5 / 40

Parameter Density

8.5 / 10

The model is a dense architecture with approximately 1B parameters (specifically cited as 1.67B total in some technical manifests like Ollama, though marketed as 1B). TII provides a detailed architectural breakdown including layer counts (18), head dimensions (256), and attention configurations, which allows for precise verification of parameter distribution across the model.

Training Compute

5.0 / 10

TII discloses that the 1B model was pruned and healed using 256 H100 GPU chips. However, the total training duration (GPU hours) and the associated carbon footprint or energy consumption for this specific variant are not explicitly provided. While hardware types are named, the lack of duration or environmental impact metrics prevents a higher score.

Benchmark Reproducibility

4.0 / 10

While TII provides a table of internal benchmark results (MMLU, GSM8K, etc.) and mentions using the 'lm-evaluation-harness', they do not release the specific evaluation code, exact prompts, or few-shot examples used to achieve these scores. This limits third-party ability to replicate the exact reported figures. (Score adjusted for discovered external research indicating potential contamination risks in the model family).

Identity Consistency

9.0 / 10

The model consistently identifies as part of the Falcon 3 family from TII in its system prompts and documentation. It maintains a clear versioning distinction between 'Base' and 'Instruct' variants. There is no evidence of the model claiming to be a competitor's product or misrepresenting its fundamental nature as an AI.

Downstream

19.5 / 30

License Clarity

6.0 / 10

The model is released under the 'TII Falcon-LLM License 2.0'. While the license is publicly accessible and based on Apache 2.0, it includes custom clauses and an Acceptable Use Policy. There is significant community debate regarding its 'open source' status due to commercial restrictions (royalty obligations for high-revenue entities), which creates ambiguity compared to standard OSI-approved licenses.

Hardware Footprint

8.5 / 10

Hardware requirements are exceptionally well-documented. TII and community partners provide specific VRAM requirements for various quantization levels (FP16, INT8, INT4) and context lengths. The model's efficiency on consumer hardware like laptops is a primary focus of its documentation, with clear guidance on deployment via tools like llama.cpp and Ollama.

Versioning Drift

5.0 / 10

The model uses a clear naming convention (Falcon3-1B-Instruct), but a formal, public changelog tracking silent updates or weight drifts is not readily available. While the release date is clear (December 2024), there is no established infrastructure for users to track ongoing updates or access specific historical snapshots beyond the initial release.

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

1k
4k
8k

VRAM Required:

Recommended GPUs