Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, Yongqiang Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Ye Jia, Patrick Nguyen, Heiga Zen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu, 2018. Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (Neural Information Processing Systems Foundation, Inc. (NeurIPS)) - A foundational paper showing that speaker embeddings from a pretrained speaker verification model can be used to condition a Tacotron-based multispeaker TTS system, enabling zero-shot voice cloning.
FiLM: Visual Reasoning with a General Conditioning Layer, Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville, 2018. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32(1) (Association for the Advancement of Artificial Intelligence). DOI: 10.1609/aaai.v32i1.11671 - Introduces feature-wise linear modulation (FiLM), a general-purpose conditioning method widely used in neural networks, including for integrating speaker embeddings into advanced speech synthesis models.