
Falcon-1B

Parameters

1B

Context Length

8,192 tokens

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped Query Attention

Attention Heads

8

Key-Value Heads

4

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

-

Sliding Window Attention

-

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

2048

Number of Layers

18

FFN Intermediate Size (Dense)

-

Multi-Token Prediction Heads

-

Tokenizer

Vocabulary Size

131,072

Architecture Diagram

Input Tokens → Token Embedding (Position: RoPE; Hidden: 2048; Context: 8.2k) → 18 layers × [RMSNorm (Pre-Attention) → Grouped Query Attention (8 Q / 4 KV heads; head dim: 256) → residual add → RMSNorm (Pre-FFN) → Feed-Forward Network (SwiGLU) → residual add] → Final RMSNorm → Output Logits

Falcon-1B

Falcon3-1B, developed by the Technology Innovation Institute (TII), is a member of the Falcon3 family of open foundation models and has roughly 1 billion parameters. The family targets scientific reasoning, mathematical problem-solving, and code understanding. Two variants are available: Falcon3-1B-Base is a raw pretrained model intended for fine-tuning on downstream natural language processing tasks, while Falcon3-1B-Instruct is further tuned for conversation and instruction following.

Architecturally, Falcon3-1B is a causal decoder-only Transformer with 18 decoder blocks. It uses Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads. Because each pair of query heads shares one key-value head, the key-value cache is half the size it would be under full multi-head attention, which reduces memory consumption and speeds up inference. The model also uses a comparatively wide head dimension of 256 and Rotary Position Embedding (RoPE) to support long-context understanding.
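The memory effect of sharing key-value heads can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `kv_cache_bytes` is hypothetical, and the 16-bit cache precision and full 8,192-token context are assumptions rather than figures published by TII. The layer count, head counts, and head dimension are taken from the description above.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Bytes needed to cache keys and values for a batch of sequences."""
    # Factor 2: every layer stores one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# From the description above: 18 blocks, 8 query heads, 4 KV heads, head dim 256.
# Assumptions: 16-bit cache entries (2 bytes) and a full 8,192-token context.
gqa = kv_cache_bytes(num_layers=18, num_kv_heads=4, head_dim=256, seq_len=8192)

# Hypothetical baseline: full multi-head attention, one KV head per query head.
mha = kv_cache_bytes(num_layers=18, num_kv_heads=8, head_dim=256, seq_len=8192)

print(f"GQA (4 KV heads): {gqa / 2**20:.0f} MiB")
print(f"MHA (8 KV heads): {mha / 2**20:.0f} MiB")
```

The cache size scales linearly with the number of key-value heads, so halving them halves the cache at any context length.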

The network uses the SwiGLU activation function together with RMSNorm normalization, a combination commonly chosen for training stability. The model supports multiple languages, including English, French, Spanish, and Portuguese. Its compact parameter count makes it a candidate for deployment where computational resources are limited, such as on edge devices, while still handling a range of language tasks.
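For readers unfamiliar with these two components, the following is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward block. It is not TII's implementation: the function names, the `eps` default, and the toy weight matrices are all hypothetical, and real models use tensor libraries rather than nested lists.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a learned per-feature weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example: hidden size 2, intermediate size 3 (hypothetical weights).
x = [1.0, -2.0]
w_gate = [[0.5, 0.1], [-0.3, 0.8], [0.2, 0.2]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_down = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

normed = rms_norm(x, weight=[1.0, 1.0])
print(swiglu_ffn(normed, w_gate, w_up, w_down))
```

In a pre-norm decoder block, `rms_norm` is applied before both the attention sublayer and the feed-forward sublayer, and each sublayer's output is added back to its input.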

About Falcon

The original TII Falcon models (7B, 40B) are causal decoder-only language models. Their architecture, adapted from GPT-3, uses rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for faster attention computation. They were trained on the RefinedWeb dataset.



Evaluation Benchmarks

No evaluation benchmarks for Falcon-1B available.

Rankings

Overall Rank

-

Coding Rank

-

Model Integrity

Total Score

B

67 / 100

Falcon-1B Model Integrity Report


Audit Note

Falcon3-1B-Instruct demonstrates a strong commitment to architectural transparency, providing clear details on its pruning-based origin and specific Transformer configurations. While the model's identity and basic technical specs are well-documented, it suffers from significant opacity regarding the specific composition of its training datasets and the total compute resources consumed. The use of a custom license and the lack of detailed reproduction code for benchmarks further limit its transparency profile to a moderate level.

Upstream

20.0 / 30

Architectural Provenance

7.5 / 10

The Falcon3-1B-Instruct model is explicitly documented as a causal decoder-only Transformer with 18 decoder blocks. TII provides specific technical details including the use of Grouped Query Attention (GQA) with 8 query heads and 4 key-value heads, a head dimension of 256, SwiGLU activation, and RMSNorm. Crucially, the model's provenance is described as being 'pruned and healed' from a larger 3B Falcon model using knowledge distillation, which is a more transparent disclosure of origin than many 'trained from scratch' claims. However, the full pretraining procedure for the parent 3B model is only partially detailed in the general Falcon 3 blog post.

Dataset Composition

4.5 / 10

TII discloses that the model was trained on 80 Gigatokens of data for the 'healing' phase and the instruct version was post-trained on 1.2 million samples. General categories are provided (web, code, STEM, multilingual), but specific percentage breakdowns of the 80GT or the 14T tokens used for the base 7B model are not publicly available. The data collection and filtering methodologies are mentioned in marketing terms ('curated', 'high-quality') without technical documentation or public access to the underlying datasets.

Tokenizer Integrity

8.0 / 10

The tokenizer is publicly accessible via Hugging Face and the vocabulary size is clearly stated as 131,072 tokens. It is documented as a BPE-based tokenizer with support for English, French, Spanish, and Portuguese. The alignment with the claimed language support is verifiable through the provided configuration files and the 'transformers' library integration, though detailed training data for the tokenizer itself is not explicitly separated from the general pretraining data.

Model

28.0 / 40

Parameter Density

9.0 / 10

The model is a dense architecture with a clearly stated parameter count of approximately 1 billion. Unlike MoE models, there is no ambiguity between active and total parameters. The architectural breakdown (18 layers, specific head counts) is provided in the model card, allowing for a precise understanding of parameter distribution across the network.
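How the parameters of a dense decoder are distributed can be estimated from its configuration. The sketch below is a rough estimate under stated assumptions, not an official breakdown: the function `dense_decoder_params` is hypothetical, the hidden size (2048) and FFN intermediate size (8192) are assumed values that this report does not state, and the formula assumes bias-free linear layers and untied input and output embeddings.

```python
def dense_decoder_params(num_layers, hidden, num_q_heads, num_kv_heads,
                         head_dim, ffn_dim, vocab, tied_embeddings=False):
    """Rough parameter count for a bias-free GQA + SwiGLU decoder."""
    attn = hidden * num_q_heads * head_dim           # query projection
    attn += 2 * hidden * num_kv_heads * head_dim     # key and value projections
    attn += num_q_heads * head_dim * hidden          # output projection
    ffn = 3 * hidden * ffn_dim                       # gate, up, and down matrices
    norms = 2 * hidden                               # two RMSNorm weights per block
    per_layer = attn + ffn + norms

    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    total = num_layers * per_layer + embed + lm_head + hidden  # + final RMSNorm
    return {"per_layer": per_layer, "embeddings": embed + lm_head, "total": total}


# Layers, heads, head dim, and vocabulary size come from this report.
# ASSUMED (not stated in the report): hidden=2048, ffn_dim=8192, untied embeddings.
counts = dense_decoder_params(num_layers=18, hidden=2048, num_q_heads=8,
                              num_kv_heads=4, head_dim=256, ffn_dim=8192,
                              vocab=131072)
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the input and output embedding matrices alone account for about 0.54 billion parameters, so the total depends heavily on whether the two are tied.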

Training Compute

4.0 / 10

TII discloses the hardware used (256 H100 GPU chips) for the pruning and healing phase of the 1B model. However, the total GPU hours, training duration, and specific energy consumption or carbon footprint for the 1B variant are not provided. While the 7B base model's compute is mentioned (1024 H100s for 14T tokens), the lack of specific metrics for the 1B-Instruct variant's post-training and healing phases leaves significant gaps.

Benchmark Reproducibility

6.0 / 10

Benchmark results are reported for standard sets like MMLU, ARC, and GSM8K. TII specifies the use of the 'lm-evaluation-harness' and mentions the use of chat templates and few-shot settings. However, the exact prompts, few-shot examples, and specific evaluation code for reproducing the reported 'raw scores' are not fully public, and third-party verification is limited to leaderboard entries rather than independent audits.

Identity Consistency

9.0 / 10

The model consistently identifies itself as Falcon3 from TII in its system prompts and documentation. There is clear version tracking within the Falcon 3 family (1B, 3B, 7B, 10B) and a distinction between Base and Instruct variants. The model does not exhibit identity confusion with competitors like Llama or GPT in official documentation or standard testing environments.

Downstream

18.5 / 30

License Clarity

6.5 / 10

The model is released under the 'TII Falcon-LLM License 2.0'. While based on Apache 2.0, it includes a custom 'Acceptable Use Policy' and specific terms that are not standard open-source. The license allows for commercial use but includes restrictions and requirements (such as attribution and compliance with the AUP) that make it a 'weights-available' license rather than a pure open-source license, leading to some ambiguity for enterprise users.

Hardware Footprint

7.0 / 10

VRAM requirements are well-documented by the community and partially by TII through the release of GGUF, AWQ, and GPTQ variants. The model card specifies the 8K context length for the 1B variant, and memory scaling for this context is predictable. However, official documentation lacks a comprehensive table of VRAM vs. batch size vs. quantization levels, relying instead on third-party implementations like Ollama and llama.cpp for this data.
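In the absence of an official table, a lower bound on weight memory at different precisions can be computed directly. This is a back-of-the-envelope sketch: the function `weight_memory_gib` is hypothetical, the parameter count is the report's approximate figure of 1 billion, and the bit widths are nominal values for each precision.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Memory for model weights alone, in GiB, at a given precision."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 1_000_000_000  # "approximately 1 billion", per this report

# Nominal bit widths; these are assumptions for illustration.
for name, bits in [("FP16/BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name:>9}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Actual usage will be higher: quantized formats such as GGUF, AWQ, and GPTQ store additional scale and zero-point data, and inference also needs memory for activations and the key-value cache, which grows with context length and batch size.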

Versioning Drift

5.0 / 10

The model has a clear release date (December 2024) and is part of a numbered family. However, there is no public changelog or semantic versioning for the weights themselves (e.g., v1.0 vs v1.1). While the initial release is well-documented, the infrastructure for tracking future silent updates or performance drift is not explicitly presented to the public.
