Review of Gradient Descent Variants (SGD, Momentum)
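The two update rules this section reviews can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the section's own code. It follows the momentum update as written in the torch.optim.SGD documentation, with dampening and weight decay set to zero. It also uses the full gradient of a toy quadratic where real SGD would use a minibatch estimate, and the names `sgd_momentum` and `quad_grad` are hypothetical. Setting `momentum=0` recovers plain gradient descent, and `nesterov=True` switches to the Nesterov variant.

```python
from typing import Callable, List

Vector = List[float]


def sgd_momentum(
    grad_fn: Callable[[Vector], Vector],
    theta: Vector,
    lr: float = 0.05,
    momentum: float = 0.9,
    nesterov: bool = False,
    steps: int = 200,
) -> Vector:
    """Gradient descent with optional (Nesterov) momentum; hypothetical helper.

    Assumed update, as in the torch.optim.SGD docs with dampening = 0
    and weight_decay = 0:
        v     <- momentum * v + g
        step  <- g + momentum * v   if nesterov else v
        theta <- theta - lr * step
    momentum = 0 recovers plain gradient descent.
    """
    theta = list(theta)
    v = [0.0] * len(theta)  # velocity buffer, starts at zero
    for _ in range(steps):
        g = grad_fn(theta)  # in true SGD this would be a minibatch gradient estimate
        v = [momentum * vi + gi for vi, gi in zip(v, g)]
        if nesterov:
            step = [gi + momentum * vi for gi, vi in zip(g, v)]
        else:
            step = v
        theta = [ti - lr * si for ti, si in zip(theta, step)]
    return theta


# Toy ill-conditioned quadratic f(x, y) = 0.5 * (x**2 + 10 * y**2), minimum at (0, 0).
def quad_grad(theta: Vector) -> Vector:
    x, y = theta
    return [x, 10.0 * y]


if __name__ == "__main__":
    for name, kwargs in [
        ("plain GD", {"momentum": 0.0}),
        ("momentum", {"momentum": 0.9}),
        ("Nesterov", {"momentum": 0.9, "nesterov": True}),
    ]:
        print(name, sgd_momentum(quad_grad, [1.0, 1.0], **kwargs))
```

Running the script prints the final parameters for each variant; with these step sizes all three end close to the minimum at (0, 0). The velocity buffer `v` is what separates momentum from plain SGD: it accumulates past gradients, so steps along a consistently sloped direction build up while oscillating components partly cancel.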

References

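Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering both the theory and practice of deep learning; Chapter 8 explains gradient descent, SGD, and momentum in detail.
On the importance of initialization and momentum in deep learning, Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, 2013 (Proceedings of the 30th International Conference on Machine Learning (ICML), PMLR Vol. 28) DOI: 10.55982/annals.v28i1.9213 - A foundational paper demonstrating the effectiveness of momentum, in particular Nesterov accelerated gradient, for training deep neural networks.
torch.optim.SGD, PyTorch Core Team, 2024 - The official PyTorch documentation for the stochastic gradient descent optimizer, detailing its parameters (including momentum) and how to use them.
Lecture Notes: Optimization Algorithms, Stanford University, 2023 - A high-quality educational resource with accessible explanations of gradient descent, SGD, and momentum in the context of deep learning.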