Data Preparation: Tokenization

References

Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, Alexandra Birch, 2016. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). DOI: 10.18653/v1/P16-1162 - This foundational paper brought byte pair encoding (BPE) into natural language processing to handle rare and out-of-vocabulary words in neural machine translation, and BPE has since become a common subword tokenization method.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.18653/v1/N19-1423 - This seminal paper introduces BERT, which relies on WordPiece tokenization, and demonstrates the practical use and importance of subword tokenization for language understanding in large Transformer models.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Taku Kudo, John Richardson, 2018. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. DOI: 10.18653/v1/D18-2012 - This paper introduces SentencePiece, a language-independent subword tokenizer that operates directly on raw text, whitespace included, which makes it well suited to multilingual use and to consistent tokenization and detokenization.
tokenizers: Fast State-of-the-Art Tokenizers, Hugging Face, 2023 (Hugging Face) - Official documentation for the Hugging Face tokenizers library, which provides optimized implementations of BPE, WordPiece, and other subword tokenization algorithms used with Transformer models.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, Thomas Wolf, 2022 (O'Reilly Media) - A practical guide with detailed explanations and code examples covering tokenization methods, special tokens, and their use in the Hugging Face Transformers ecosystem.

© 2025 ApX Machine Learning. Made with care.