Datasets: A Hugging Face library for accessing and sharing datasets, Hugging Face, 2024 (Hugging Face) - Official documentation for the Hugging Face datasets library, which provides tools for efficiently loading, processing, and sharing datasets for machine learning, including those used for fine-tuning large language models.
A Survey on Data-Centric AI: From Data to Model, Hongzhi Wang, Peiran Ma, Yanli Li, Donghui Lin, Mengyuan Zhang, Jiancheng Li, 2023 IEEE Transactions on Knowledge and Data Engineering, Vol. 35 (IEEE) DOI: 10.1109/TKDE.2023.3320623 - This survey provides a comprehensive review of data-centric AI, including discussions on data quality, data processing, and their impact on model performance, highly relevant to selecting and preparing datasets.
About The Licenses, Creative Commons, 2024 (Creative Commons) - Official guide explaining the different Creative Commons licenses and their implications, essential for understanding usage rights of publicly available datasets in commercial and research contexts.