Falcon-1B: Specifications and GPU VRAM Requirements

Falcon-1B

Closed Source

Open Weights

Parameters

Approximately 1 billion

Context Length

8,192 tokens

Modality

Text

Architecture

Dense

License

TII Falcon-LLM License 2.0

Release Date

17 Dec 2024

Knowledge Cutoff

Technical Specifications

Attention Structure

Grouped Query Attention

Hidden Dimension Size

768

Number of Layers

18

Attention Heads

8

Key-Value Heads

4

Activation Function

SwiGLU

Normalization

RMS Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Falcon-1B

The Falcon3-1B model, developed by the Technology Innovation Institute (TII), is a member of the Falcon3 family of open foundation models, designed for efficient operation with a parameter count around 1 billion. This model aims to advance capabilities in scientific reasoning, mathematical problem-solving, and code understanding. Variants such as Falcon3-1B-Base provide a raw, pretrained foundation suitable for subsequent fine-tuning across diverse natural language processing applications, while Falcon3-1B-Instruct is further optimized for conversational interfaces and adherence to explicit instructions.

Architecturally, Falcon3-1B is a causal decoder-only Transformer with 18 decoder blocks. It uses Grouped Query Attention (GQA), configured with 8 query heads and 4 key-value heads, so each pair of query heads shares one key-value head; this shrinks the key-value cache and speeds up inference compared with full multi-head attention. The model also uses a wide head dimension of 256 and Rotary Position Embedding (RoPE) to support long-context understanding.
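
To make the memory effect of GQA concrete, the following is a minimal sketch that estimates the key-value cache size for one sequence from the figures quoted above (18 decoder blocks, 8 query heads, 4 key-value heads, head dimension 256). The function names and the 2-byte (FP16) cache element size are assumptions made for illustration; real inference runtimes may allocate the cache differently.

```python
# Figures taken from the architecture description; everything else is illustrative.
NUM_LAYERS = 18
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 4
HEAD_DIM = 256


def kv_head_for_query_head(query_head: int) -> int:
    """Return the key-value head that a given query head reads from under GQA."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return query_head // group_size


def kv_cache_bytes(context_tokens: int,
                   bytes_per_value: int = 2,  # assumption: FP16 cache entries
                   kv_heads: int = NUM_KV_HEADS) -> int:
    """Estimate bytes needed to cache keys and values for one sequence."""
    # Two tensors (K and V) per layer, each of shape
    # [kv_heads, context_tokens, HEAD_DIM].
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * context_tokens * bytes_per_value


if __name__ == "__main__":
    for tokens in (1024, 8192):
        gqa = kv_cache_bytes(tokens)
        mha = kv_cache_bytes(tokens, kv_heads=NUM_QUERY_HEADS)
        print(f"{tokens} tokens: GQA {gqa / 2**20:.0f} MiB, "
              f"full multi-head {mha / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with context length, and using 4 key-value heads instead of 8 halves it.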

The activation function used throughout the network is SwiGLU, combined with RMSNorm for normalization, contributing to stable training and performance. The model's design focuses on enabling robust language understanding and generation across multiple languages, including English, French, Spanish, and Portuguese. Its optimized architecture and relatively compact parameter size make it a candidate for deployment in environments with limited computational resources, such as edge devices, while still delivering strong performance for a range of language-based tasks.
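
As a rough guide to what "limited computational resources" means here, the sketch below estimates the memory taken by the model weights alone at a few common precisions. The parameter count of exactly 1 billion is an assumption based on the "around 1 billion" figure above, and the format names and bit widths are illustrative; quantization metadata, activations, the key-value cache, and framework overhead are not included, so actual VRAM use will be higher.

```python
# Assumption: the text only says "around 1 billion" parameters.
ASSUMED_PARAMS = 1_000_000_000

# Illustrative precisions; real quantization formats add some per-block overhead.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Estimate bytes needed to store the weights alone."""
    return num_params * bits_per_weight // 8


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        size_gib = weight_bytes(ASSUMED_PARAMS, bits) / 2**30
        print(f"{name}: about {size_gib:.2f} GiB for weights")
```

Each halving of the bit width halves the weight footprint, which is why lower-precision formats are the usual route to fitting a model on small GPUs or edge devices.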

About Falcon

The TII Falcon model family comprises causal decoder-only language models (7B, 40B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated operations. Models are trained on the RefinedWeb dataset.

Evaluation Benchmarks

No evaluation benchmarks for Falcon-1B available.
