Parameters: 1B
Context length: 8,192 tokens (8K)
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release date: 17 Dec 2024
Knowledge cutoff: -
Attention structure: Multi-Query Attention
Hidden dimension size: 768
Layers: 24
Attention heads: 32
Key-value heads: 1
Activation function: SwiGLU
Normalization: RMS Normalization
Position embedding: RoPE
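As a minimal sketch of how the listed fields can be checked against the published model configuration (the Hugging Face Hub id "tiiuae/Falcon3-1B-Base" and the attribute names are assumptions; verify them against the actual config.json):

```python
# Minimal sketch: inspect the published configuration of Falcon3-1B-Base.
# Assumes the Hugging Face Hub id "tiiuae/Falcon3-1B-Base"; adjust if needed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Base")

# Attribute names follow the LLaMA-style config schema used by most recent
# decoder-only models; getattr keeps the sketch robust if a field is absent.
print("hidden size:       ", getattr(config, "hidden_size", None))
print("layers:            ", getattr(config, "num_hidden_layers", None))
print("attention heads:   ", getattr(config, "num_attention_heads", None))
print("key-value heads:   ", getattr(config, "num_key_value_heads", None))
print("max context length:", getattr(config, "max_position_embeddings", None))
```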
VRAM requirements for different quantization methods and context sizes
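A rough, hedged sketch of how such requirements are typically estimated: weight memory scales with the quantization bit-width, and the KV cache scales with context length, layer count, and key-value head size. The architecture values plugged in below are taken from the specification table above; activations, framework overhead, and fragmentation are ignored, so treat the result as a lower bound.

```python
# Rough VRAM estimate for a quantized decoder-only model: weights + KV cache.
# Architecture values are copied from the specification table above.

def estimate_vram_gb(n_params: float, weight_bits: int,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     context_len: int, kv_bytes: int = 2) -> float:
    weight_bytes = n_params * weight_bits / 8
    # KV cache: 2 tensors (K and V) per layer, one per key-value head.
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024**3

hidden, n_heads = 768, 32            # from the table above
head_dim = hidden // n_heads         # 24

for bits in (16, 8, 4):              # fp16 / int8 / 4-bit quantization
    gb = estimate_vram_gb(n_params=1e9, weight_bits=bits,
                          n_layers=24, n_kv_heads=1, head_dim=head_dim,
                          context_len=8192)
    print(f"{bits:>2}-bit weights, 8K context: ~{gb:.2f} GB")
```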
The Falcon3-1B model, developed by the Technology Innovation Institute (TII), is a member of the Falcon3 family of open foundation models, designed for efficient operation with a parameter count around 1 billion. This model aims to advance capabilities in scientific reasoning, mathematical problem-solving, and code understanding. Variants such as Falcon3-1B-Base provide a raw, pretrained foundation suitable for subsequent fine-tuning across diverse natural language processing applications, while Falcon3-1B-Instruct is further optimized for conversational interfaces and adherence to explicit instructions.
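A minimal usage sketch for the instruction-tuned variant (the Hub id "tiiuae/Falcon3-1B-Instruct" and the chat-message input format are assumptions based on common transformers usage, not details stated above):

```python
# Minimal sketch: chat-style generation with the instruction-tuned variant.
# Assumes the Hugging Face Hub id "tiiuae/Falcon3-1B-Instruct".
from transformers import pipeline

generator = pipeline("text-generation", model="tiiuae/Falcon3-1B-Instruct")

messages = [
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
# Recent transformers pipelines accept chat messages directly and apply the
# model's chat template; fall back to a plain prompt string on older versions.
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"])
```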
Architecturally, Falcon3-1B is a causal decoder-only Transformer. It incorporates 18 decoder blocks, a design choice that contributes to its efficiency. A notable architectural feature is Grouped Query Attention (GQA), configured with 8 query heads and 4 key-value heads; this structure speeds up inference and reduces memory consumption. The model also uses a wider head dimension of 256 and Rotary Position Embedding (RoPE) to support long-context understanding.
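To illustrate the grouped-query idea, here is a minimal, self-contained sketch (illustrative, not TII's implementation): several query heads share each key-value head, so the K/V projections, and hence the KV cache, are a fraction of the size used by standard multi-head attention. The head counts follow the paragraph above (8 query heads, 4 key-value heads, head dimension 256).

```python
# Minimal grouped query attention sketch (illustrative, not TII's code).
# 8 query heads share 4 key-value heads: each KV head serves 2 query heads.
import torch
import torch.nn.functional as F

n_q_heads, n_kv_heads, head_dim = 8, 4, 256
group = n_q_heads // n_kv_heads          # query heads per KV head
batch, seq = 1, 16

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand K and V so every query head attends with its group's shared KV head.
k = k.repeat_interleave(group, dim=1)    # (batch, 8, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(attn.shape)                        # torch.Size([1, 8, 16, 256])
```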
The activation function used throughout the network is SwiGLU, combined with RMSNorm for normalization, contributing to stable training and performance. The model's design focuses on enabling robust language understanding and generation across multiple languages, including English, French, Spanish, and Portuguese. Its optimized architecture and relatively compact parameter size make it a candidate for deployment in environments with limited computational resources, such as edge devices, while still delivering strong performance for a range of language-based tasks.
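The two components can be sketched as follows in the generic LLaMA-style formulation; the hidden width of the feed-forward block below is illustrative rather than Falcon3-1B's actual dimension.

```python
# Generic SwiGLU feed-forward block and RMSNorm, as commonly defined in
# LLaMA-style decoders; dimensions here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root-mean-square of the features.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU-gated linear unit followed by the down projection.
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 768)
print(SwiGLU(768, 2048)(RMSNorm(768)(x)).shape)   # torch.Size([2, 16, 768])
```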
The earlier TII Falcon family comprises causal decoder-only language models (Falcon-7B and Falcon-40B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated attention computation; these models were trained on the RefinedWeb dataset.
No evaluation benchmarks are available for Falcon3-1B.