Neural Discrete Representation Learning, Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.5555/3295222.3295240 - Introduces the Vector Quantized Variational Autoencoder (VQ-VAE) architecture, detailing its components, training, and the use of a discrete latent space for high-quality generation.
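The core operation this paper introduces, vector quantization of encoder outputs against a learned codebook, can be illustrated with a minimal NumPy sketch. This is an illustrative snippet, not the paper's released code; the function and variable names (`vector_quantize`, `z_e`, `codebook`) are chosen here for clarity, and the codebook is random rather than learned.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e: (N, D) array of encoder outputs.
    codebook: (K, D) embedding table (learned in the real model).
    Returns the quantized vectors z_q and the chosen discrete indices.
    """
    # Squared Euclidean distance from every z_e row to every codebook row.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)   # discrete latent codes (the "tokens")
    z_q = codebook[idx]      # quantized continuous vectors fed to the decoder
    return z_q, idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 codes of dimension D=4
z_e = rng.normal(size=(5, 4))        # five encoder output vectors
z_q, idx = vector_quantize(z_e, codebook)
```

In the full model the argmin is non-differentiable, so training copies gradients from `z_q` straight through to `z_e` and adds codebook and commitment losses, as described in the paper.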
Generating Diverse High-Fidelity Images with VQ-VAE-2, Ali Razavi, Aaron van den Oord, Oriol Vinyals, 2019. Advances in Neural Information Processing Systems (NeurIPS), Vol. 32. DOI: 10.48550/arXiv.1906.00446 - Presents VQ-VAE-2, an extension that significantly improves image generation quality by combining a hierarchical VQ-VAE with a powerful autoregressive prior, crucial for the two-stage generation process.
Zero-Shot Text-to-Image Generation, Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever, 2021. Proceedings of the 38th International Conference on Machine Learning, Vol. 139 (PMLR). DOI: 10.32473/pmlr.v139.ramesh21a - Demonstrates a prominent application of VQ-VAE within DALL-E, highlighting its role in learning discrete visual tokens that are then modeled by a Transformer for text-to-image generation.