Parameters: 10B
Context Length: 32K (32,768 tokens)
Modality: Text
Architecture: Dense
License: TII Falcon-LLM License 2.0
Release Date: 17 Dec 2024
Knowledge Cutoff: Nov 2024
Attention Structure: Grouped-Query Attention
Hidden Dimension Size: 3072
Number of Layers: 40
Attention Heads: 12
Key-Value Heads: 4
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
Falcon3-10B is a member of the Falcon3 family of decoder-only large language models developed by the Technology Innovation Institute (TII). This variant is designed to advance capabilities in scientific reasoning, mathematics, and code generation. It is available in both base and instruction-tuned versions, supporting applications ranging from general text generation to conversational AI. Thanks to its efficient design and optimized quantized versions, the model runs on a variety of infrastructures, including resource-limited devices such as laptops.
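As a quick illustration of running the model on constrained hardware, the sketch below loads the instruction-tuned variant in 4-bit precision with Hugging Face transformers and bitsandbytes. This is a minimal sketch, not an official recipe: the model ID tiiuae/Falcon3-10B-Instruct and the quantization settings are assumptions on top of the text above.

```python
# Minimal sketch: load Falcon3-10B-Instruct in 4-bit so it fits on a
# consumer GPU. Model ID and quantization settings are assumptions;
# requires transformers, accelerate, and bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/Falcon3-10B-Instruct"  # assumed Hugging Face model ID

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available devices automatically
)

inputs = tokenizer(
    "Explain Grouped Query Attention in one sentence.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```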
Architecturally, Falcon3-10B is a Transformer-based, causal decoder-only model with a deep stack of 40 decoder blocks. Its attention mechanism implements Grouped Query Attention (GQA) with 12 query heads and 4 key-value heads, which shrinks the key-value cache and speeds up inference. The model uses a wider head dimension of 256 and incorporates Rotary Position Embeddings (RoPE) to support extended context understanding. For non-linearity it employs the SwiGLU activation function, and its normalization scheme relies on RMSNorm. These architectural choices aim to balance performance with computational efficiency.
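To make the GQA layout concrete, here is a small self-contained PyTorch sketch using the dimensions quoted above (12 query heads, 4 key-value heads, head dimension 256, so hidden size 12 × 256 = 3072). It is simplified for illustration, omitting the causal mask and RoPE; all tensor names are hypothetical.

```python
# Illustrative GQA shapes for Falcon3-10B: 12 query heads share 4 KV heads.
import torch

batch, seq_len = 1, 8
n_q_heads, n_kv_heads, head_dim = 12, 4, 256
group = n_q_heads // n_kv_heads  # 3 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Each KV head is repeated for its group of query heads; the KV cache
# stays 3x smaller than full multi-head attention with 12 KV heads.
k = k.repeat_interleave(group, dim=1)  # -> (1, 12, 8, 256)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
attn = torch.softmax(scores, dim=-1) @ v
print(attn.shape)  # torch.Size([1, 12, 8, 256])
```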
Falcon3-10B was constructed by depth up-scaling the Falcon3-7B-Base model, followed by continued pre-training on 2 trillion tokens of high-quality data. The training corpus for the broader Falcon3 family comprised 14 trillion tokens, spanning web content, code, science, technology, engineering, and mathematics (STEM) data, and high-quality multilingual datasets. This extensive training enables the model to handle a context length of up to 32,768 tokens (32K), supporting detailed analysis of long inputs and coherent multi-turn interactions. It supports inference in multiple languages, including English, French, Spanish, and Portuguese.
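For multi-turn use, the sketch below formats a conversation with the tokenizer's chat template, assuming the instruct variant ships one (standard for Hugging Face chat models) and reusing the `model` and `tokenizer` objects from the loading example above; the prompt content is illustrative only.

```python
# Sketch of a single chat turn via the tokenizer's chat template;
# assumes `model` and `tokenizer` from the earlier loading example.
messages = [
    {"role": "user", "content": "Summarize the Falcon3 training corpus."},
]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

output = model.generate(prompt_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(
    output[0][prompt_ids.shape[-1]:], skip_special_tokens=True
)
print(reply)
```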
The TII Falcon3 model family comprises open-source, decoder-only language models (1B to 10B parameters) designed for efficiency. Key innovations include an extended 32K-token context window, Grouped-Query Attention (GQA), and specialized versions for scientific and code-oriented applications. Some variants integrate Mamba-based architectures.
No evaluation benchmarks are available for Falcon3-10B.