Inference, Hugging Face, 2024 (Hugging Face) - This documentation provides practical guidance and code examples for running inference with various LLMs using the Transformers library, including discussions of hardware considerations, optimization techniques, and deployment strategies.
Generative AI with Large Language Models, DeepLearning.AI and Amazon Web Services, 2024 (Coursera) - This online course provides an overview of LLMs, covering their architecture, training, and fine-tuning, as well as practical aspects of deployment and inference, including hardware considerations.