
Falcon-1B

Parameters: 1B
Context Length: 8,192 tokens
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release Date: 17 Dec 2024
Knowledge Cutoff: -

Technical Specifications

Attention Structure: Multi-Query Attention
Hidden Dimension Size: 768
Number of Layers: 24
Attention Heads: 32
Key-Value Heads: 1
Activation Function: SwiGLU
Normalization: RMS Normalization
Position Embedding: RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes

Falcon-1B

The Falcon3-1B model, developed by the Technology Innovation Institute (TII), is a member of the Falcon3 family of open foundation models, designed for efficient operation with a parameter count around 1 billion. This model aims to advance capabilities in scientific reasoning, mathematical problem-solving, and code understanding. Variants such as Falcon3-1B-Base provide a raw, pretrained foundation suitable for subsequent fine-tuning across diverse natural language processing applications, while Falcon3-1B-Instruct is further optimized for conversational interfaces and adherence to explicit instructions.
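
To make the variant distinction concrete, here is a minimal sketch of loading and prompting the instruct variant with Hugging Face Transformers. The repository id tiiuae/Falcon3-1B-Instruct, the dtype, and the sample prompt are assumptions for illustration; check the TII organization on the Hugging Face Hub for the exact model names.

```python
# Minimal sketch: load the (assumed) instruct checkpoint and run a short chat turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-1B-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights keep the 1B model small
    device_map="auto",
)

# The instruct variant expects a chat template; the base model takes raw text instead.
messages = [{"role": "user", "content": "Explain grouped query attention in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```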

Architecturally, Falcon3-1B is a causal decoder-only Transformer. It incorporates 18 decoder blocks, a design choice contributing to its efficiency. A key innovation within its architecture is the implementation of Grouped Query Attention (GQA), configured with 8 query heads and 4 key-value heads. This GQA structure is engineered to enhance inference speed and reduce memory consumption. The model also employs a wider head dimension of 256 and utilizes Rotary Position Embedding (RoPE) to facilitate long context understanding.
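
The sketch below illustrates the grouped-query layout described above (8 query heads sharing 4 key-value heads, head dimension 256) in plain PyTorch. The hidden size of 2048 is an assumption inferred from 8 query heads × head dimension 256, and the code shows the mechanism only; rotary position embeddings, KV caching, and other details of the actual Falcon3 implementation are omitted.

```python
# Illustrative Grouped Query Attention: 8 query heads share 4 K/V heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, hidden_size=2048, n_q_heads=8, n_kv_heads=4, head_dim=256):
        super().__init__()
        self.n_q_heads, self.n_kv_heads, self.head_dim = n_q_heads, n_kv_heads, head_dim
        self.q_proj = nn.Linear(hidden_size, n_q_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_q_heads * head_dim, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_q_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each pair of query heads reuses one shared K/V head, so the KV cache is
        # half the size it would be with full multi-head attention.
        k = k.repeat_interleave(self.n_q_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_q_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, self.n_q_heads * self.head_dim)
        return self.o_proj(out)

attn = GroupedQueryAttention()
print(attn(torch.randn(1, 16, 2048)).shape)  # torch.Size([1, 16, 2048])
```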

The activation function used throughout the network is SwiGLU, combined with RMSNorm for normalization, contributing to stable training and performance. The model's design focuses on enabling robust language understanding and generation across multiple languages, including English, French, Spanish, and Portuguese. Its optimized architecture and relatively compact parameter size make it a candidate for deployment in environments with limited computational resources, such as edge devices, while still delivering strong performance for a range of language-based tasks.
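
For reference, here are compact, self-contained versions of the two components named here, RMSNorm and a SwiGLU feed-forward block. The dimensions are illustrative rather than taken from the model configuration.

```python
# Illustrative RMSNorm and SwiGLU blocks; sizes are arbitrary examples.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Rescales features by their root mean square; no mean-centering as in LayerNorm."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), followed by a down projection."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(1, 16, 2048)
print(SwiGLU(2048, 8192)(RMSNorm(2048)(x)).shape)  # torch.Size([1, 16, 2048])
```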

About Falcon

The TII Falcon model family comprises causal decoder-only language models (7B, 40B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated operations. Models are trained on the RefinedWeb dataset.



Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Falcon-1B.

Rankings

Rank: -
Coding Rank: -

GPU Requirements

Interactive VRAM calculator: select a quantization method for the model weights and a context size (1k, 4k, or 8k tokens) to see the required VRAM and a recommended GPU for that configuration.
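
No concrete figures survive from the calculator, so the sketch below shows how such an estimate is usually formed: weight memory scales with the parameter count and the quantization bit width, while the KV cache grows with context length, layer count, and the number of key-value heads. All defaults (layer count, head dimension, cache precision, overhead) are illustrative assumptions; substitute values from the actual model configuration, and remember that activations and framework overhead add to the total.

```python
# Rough VRAM estimate for a ~1B-parameter model under different weight
# quantizations and context sizes. Defaults are illustrative assumptions.
def estimate_vram_gb(params_b=1.0, bits_per_weight=4,
                     n_layers=24, n_kv_heads=1, head_dim=256,
                     context_tokens=1024, kv_bytes=2, overhead_gb=0.5):
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: K and V tensors per layer, per token, per key-value head.
    kv_cache_gb = 2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes / 1e9
    return weights_gb + kv_cache_gb + overhead_gb

for bits in (16, 8, 4):
    for ctx in (1024, 4096, 8192):
        print(f"{bits:2d}-bit weights, {ctx:5d}-token context: "
              f"~{estimate_vram_gb(bits_per_weight=bits, context_tokens=ctx):.2f} GB")
```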