Falcon-3B: Specifications and GPU VRAM Requirements

Falcon-3B

开源

开放权重

参数

上下文长度

32.768K

模态

Text

架构

Dense

许可证

TII Falcon-LLM License 2.0

发布日期

17 Dec 2024

训练数据截止日期

技术规格

注意力结构

Multi-Query Attention

隐藏维度大小

1536

层数

注意力头

键值头

激活函数

SwigLU

归一化

RMS Normalization

位置嵌入

ROPE

系统要求

不同量化方法和上下文大小的显存要求

Falcon-3B

Falcon-3B is a member of the Falcon 3 family of decoder-only large language models, developed by the Technology Innovation Institute (TII). This model variant, with 3 billion parameters, is engineered for efficient deployment on various hardware, including systems with limited resources such as laptops and single GPUs. Its primary purpose is to deliver robust performance across a spectrum of natural language processing tasks, focusing on reasoning, language understanding, instruction following, code generation, and mathematics. The Falcon-3B model also supports multilingual capabilities, specifically English, French, Spanish, and Portuguese.

The architectural foundation of Falcon-3B is a transformer-based causal decoder-only design. It incorporates several innovations to enhance efficiency and performance. Notably, it utilizes Grouped Query Attention (GQA), a mechanism that optimizes inference speed and reduces Key-Value (KV) cache memory consumption by sharing parameters among attention heads. The model employs SwiGLU as its activation function and RMSNorm for normalization, contributing to stable and effective learning. Positional embeddings are handled using Rotary Positional Embeddings (RoPE) to support extended context comprehension. Furthermore, the model leverages FlashAttention 2 for accelerated attention computations and features a high vocabulary size of 131,000 tokens, enabling improved compression and downstream performance.

Falcon-3B, along with its instruction-tuned counterpart, has been developed using techniques such as pruning and knowledge distillation from the larger Falcon3-7B-Base model, resulting in an efficient and performant compact model. The base variant supports a context length of 8,000 tokens, while the instruction-tuned variant extends this capability to 32,000 tokens, allowing it to process and generate responses for longer and more complex inputs. This design paradigm makes Falcon-3B a suitable choice for applications requiring advanced AI functionalities in environments where computational resources are a consideration.

关于 Falcon

The TII Falcon model family comprises causal decoder-only language models (7B, 40B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated operations. Models are trained on the RefinedWeb dataset.

其他 Falcon 模型

评估基准

排名适用于本地LLM。

没有可用的 Falcon-3B 评估基准。

排名

编程排名

GPU 要求

完整计算器

量化

选择模型权重的量化方法

上下文大小：1024 个令牌

16k

32k

所需显存:

资源

官方文档发布说明下载权重

Falcon-3B

技术规格

系统要求

Falcon-3B

关于 Falcon

其他 Falcon 模型

评估基准

排名

GPU 要求

所需显存:

推荐 GPU

资源