
Falcon-180B

Parameters

180B

Context Length

2,048 tokens

Modality

Text

Architecture

Dense

License

Falcon-180B TII License and Acceptable Use Policy

Release Date

23 Sept 2023

Knowledge Cutoff

Dec 2022

Technical Specifications

Attention Structure

Multi-Query Attention

Hidden Dimension Size

12288

Number of Layers

60

Attention Heads

96

Key-Value Heads

1

Activation Function

GELU

Normalization

Layer Normalization

Position Embedding

RoPE

System Requirements

VRAM requirements for different quantization methods and context sizes
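As a rough guide (a back-of-envelope sketch rather than this page's calculator), the memory needed just to hold the weights of a 180B-parameter model can be estimated from the bytes stored per parameter under common precisions; KV cache, activations, and runtime overhead add further to these figures.

```python
# Back-of-envelope estimate of weight memory for a 180B-parameter model.
# Ignores KV cache, activations, and framework overhead, so real VRAM
# requirements are higher than the figures printed here.
PARAMS = 180e9  # 180B parameters

BYTES_PER_PARAM = {
    "FP16/BF16": 2.0,   # half precision
    "INT8": 1.0,        # 8-bit quantization
    "4-bit": 0.5,       # e.g. GPTQ / Q4 formats
}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{precision:>10}: ~{gib:,.0f} GiB for weights alone")
```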

Falcon-180B

Falcon-180B, developed by the Technology Innovation Institute (TII), is a large-scale causal decoder-only language model designed for advanced natural language processing tasks. It is an evolution of the Falcon-40B model, scaled up significantly in parameter count. The model is intended as a foundation for applications that require sophisticated language understanding and generation, including text generation, conversational AI, and summarization. The base model is engineered to facilitate further fine-tuning for specialized use cases, and a separate chat-optimized variant, fine-tuned on instruction datasets, is also available.
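For illustration, below is a minimal generation sketch using the Hugging Face Transformers library; the tiiuae/falcon-180B checkpoint is gated behind the TII license on the Hub, multiple high-memory GPUs are required, and the prompt and sampling settings are placeholders.

```python
# Minimal text-generation sketch with Hugging Face Transformers.
# device_map="auto" shards the bfloat16 weights (~360 GB) across all
# available GPUs via Accelerate; access to the gated repo must be granted first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a short summary of what multi-query attention does:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```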

Architecturally, Falcon-180B implements an optimized transformer design, drawing inspiration from the GPT-3 framework while incorporating several key innovations. A notable feature is the adoption of Multi-Query Attention (MQA), which improves scalability and inference performance by having all attention heads share a single key and value projection. The model also uses Rotary Position Embeddings (RoPE) to encode positional information within sequences and incorporates FlashAttention for efficient attention computation. Its decoder blocks employ a parallel attention/multilayer perceptron (MLP) structure with two layer norms, contributing to its processing efficiency. Training was conducted on a dataset of 3.5 trillion tokens, primarily derived from TII's RefinedWeb dataset (approximately 85%), supplemented by curated corpora including technical papers, conversations, and code. This pretraining, which involved up to 4,096 A100 GPUs and accumulated around 7,000,000 GPU hours, used a custom distributed training codebase named Gigatron, employing a 3D parallelism strategy combined with ZeRO optimization.
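To make the Multi-Query Attention scheme concrete, here is an illustrative PyTorch sketch (not TII's implementation): all query heads attend against a single shared key/value projection, which is what shrinks the key/value cache and speeds up inference. RoPE, FlashAttention, KV caching, and the parallel attention/MLP block structure are omitted for clarity.

```python
# Illustrative multi-query attention (MQA): many query heads share one
# key/value head. Simplified educational sketch, not production code.
import torch
from torch import nn


class MultiQueryAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # A single key head and a single value head, shared by all query heads.
        self.k_proj = nn.Linear(hidden_size, self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, self.head_dim, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = x.shape
        # Queries: (bsz, num_heads, seq_len, head_dim)
        q = self.q_proj(x).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Shared key/value: (bsz, 1, seq_len, head_dim), broadcast over query heads.
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)

        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))
        out = scores.softmax(dim=-1) @ v              # (bsz, num_heads, seq_len, head_dim)
        out = out.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)


attn = MultiQueryAttention(hidden_size=1024, num_heads=16)
x = torch.randn(2, 8, 1024)  # (batch, sequence, hidden)
print(attn(x).shape)         # torch.Size([2, 8, 1024])
```

Because the key and value projections collapse to a single head, the per-token KV cache during generation stores one head_dim-sized key and value instead of one per attention head, which is the main inference-time saving MQA provides.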

Falcon-180B is engineered for robust performance across a spectrum of language-based tasks. Its design supports work that requires deep understanding and logical reasoning, such as complex research assistance, code generation, and knowledge-based querying. Extensive training on a diverse corpus enables the model to store and retrieve information effectively, making it suitable for question answering systems and for summarizing complex topics. Its versatility allows it to adapt to and perform well across a wide array of domains.

About Falcon

The TII Falcon model family comprises causal decoder-only language models (7B, 40B, and 180B). Their architecture, adapted from GPT-3, integrates rotary positional embeddings, Multi-Query Attention for inference efficiency, and FlashAttention for accelerated operations. The models are trained primarily on the RefinedWeb dataset.
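As a sketch of the rotary positional embedding idea mentioned above (an illustrative, pair-interleaved variant, not the exact Falcon implementation), each pair of feature dimensions in a query or key vector is rotated by a position-dependent angle, so relative positions are reflected directly in the attention dot products.

```python
# Minimal rotary position embedding (RoPE) sketch: rotate each pair of
# feature dimensions by an angle that grows with the token position.
import torch


def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (..., seq_len, head_dim) with an even head_dim."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    # One rotation frequency per feature pair, one angle per (position, pair).
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq, head_dim)
print(rotary_embed(q).shape)    # torch.Size([1, 8, 128, 64])
```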


Other Falcon Models

Evaluation Benchmarks

Rankings apply to local LLMs.

No evaluation benchmarks are available for Falcon-180B.

Rankings

Rank

-

Coding Rank

-

GPU Requirements

Required VRAM depends on the selected weight quantization method and on the context size (1K-2K tokens).

Recommended GPUs
