Parameters: 11B
Context Length: 8,192 tokens
Modality: Text
Architecture: Dense
License: TII Falcon License 2.0
Release Date: 20 Jul 2024
Knowledge Cutoff: -
Attention Structure: Grouped Query Attention
Hidden Dimension Size: 5632
Number of Layers: 40
Attention Heads: 44
Key-Value Heads: 8
Activation Function: -
Normalization: -
Position Embedding: RoPE
VRAM Requirements for Different Quantization Methods and Context Sizes
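No concrete figures are listed here, but weight memory can be roughly estimated from the parameter count and the bits per weight. The sketch below is a rule-of-thumb estimate, not measured data: the 11B parameter count, the overhead factor, and the bit-widths are assumptions, and actual requirements also depend on the runtime, the KV cache, and the context length.

```python
# Rough rule-of-thumb estimate of weight VRAM for an ~11B-parameter dense model
# under common quantization bit-widths. The 20% overhead factor is an assumption
# covering activations and runtime buffers; KV-cache memory (which grows with
# context length) is not included.
PARAMS = 11e9      # ~11 billion parameters
OVERHEAD = 1.2     # assumed margin for activations / runtime buffers

def weight_vram_gb(bits_per_weight: float) -> float:
    """VRAM in GiB for the weights alone, scaled by the overhead factor."""
    return PARAMS * bits_per_weight / 8 / 1024**3 * OVERHEAD

for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: ~{weight_vram_gb(bits):.0f} GB")
```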
Falcon 2 11B is an 11-billion-parameter large language model developed by the Technology Innovation Institute (TII). This causal decoder-only model is designed to serve as a foundation for a wide range of natural language processing applications. Its development emphasizes accessibility and inference efficiency, encouraging broader adoption and the creation of specialized downstream applications. The model supports multilingual understanding and generation, making it suitable for diverse linguistic contexts.
Architecturally, Falcon 2 11B is built upon the transformer framework, specifically employing a causal decoder-only configuration that operates on a next-token prediction objective. The model incorporates several key innovations adapted from the GPT-3 architecture, including the use of rotary positional embeddings for improved sequence length handling and FlashAttention-2 for optimized attention mechanisms. A notable feature is the implementation of Grouped Query Attention (GQA) with 8 key-value heads, which aims to balance efficiency and performance in attention computations. The decoder blocks utilize a parallel attention/MLP structure. The training regimen involved a four-stage process, progressively extending the effective context window to 8192 tokens. It was trained on an extensive dataset exceeding 5 trillion tokens, primarily derived from RefinedWeb, a high-quality filtered and deduplicated web corpus, augmented with curated data including code and conversational content.
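As a minimal illustration of the grouped-query attention idea mentioned above, the sketch below shares a small number of key-value heads among a larger group of query heads. All dimensions and projection shapes are toy values chosen for demonstration, not Falcon 2 11B's actual configuration.

```python
import torch

# Minimal grouped-query attention (GQA) sketch: n_kv_heads key/value heads are
# shared by n_q_heads query heads (here 8 KV heads serving 32 query heads).
def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    b, s, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, s, n_q_heads, head_dim).transpose(1, 2)   # (b, Hq,  s, hd)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, hd)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # each KV head serves `group` query heads
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
    causal = torch.triu(torch.full((s, s), float("-inf")), diagonal=1)
    attn = (scores + causal).softmax(dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, s, d)

d_model, n_q, n_kv = 512, 32, 8
x = torch.randn(2, 16, d_model)
wq = torch.randn(d_model, d_model) * d_model**-0.5
wk = torch.randn(d_model, d_model // n_q * n_kv) * d_model**-0.5
wv = torch.randn(d_model, d_model // n_q * n_kv) * d_model**-0.5
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # torch.Size([2, 16, 512])
```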
Falcon 2 11B is multilingual, trained on data spanning English, German, Spanish, French, Italian, Dutch, Polish, Portuguese, Czech, Romanian, and Swedish, which enables it to perform effectively across these languages. It serves as a base for tasks such as text generation, language translation, and summarization, and is intended as a versatile foundation model for fine-tuning to specific domain requirements and applications. Its optimized design supports faster processing, contributing to more efficient deployment across use cases.
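As a usage illustration of the base model for text generation, a minimal sketch with the Hugging Face transformers library is shown below. The repository id tiiuae/falcon-11B and the dtype, device, and generation settings are assumptions, not values taken from this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 roughly halves memory vs. fp32
    device_map="auto",           # place layers across available devices
)

prompt = "Falcon 2 11B can be fine-tuned for tasks such as"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```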
The Falcon 2 model family by TII encompasses the 11B language model and its Vision Language Model (VLM) counterpart. These open-source models, with 11 billion parameters, are trained on over five trillion tokens, providing multilingual support. The VLM variant integrates vision-to-language capabilities, enabling the processing of visual inputs for textual outputs.
Rankings apply to local LLMs.
No evaluation benchmarks are available for Falcon2-11B.