Effective data science work starts with obtaining and preparing the data. This chapter focuses on the practical steps required to gather data from a variety of sources and refine it into a format suitable for analysis.
You will learn techniques for accessing data stored in SQL databases and data warehouses, retrieving information from web APIs, and extracting structured content from websites using web scraping methods. We will also cover advanced strategies for data cleaning, handling missing values beyond simple imputation, applying necessary transformations like scaling, $x_{\text{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)}$, and normalization, and integrating data from multiple sources through merging and joining operations. By the end of this chapter, you will have practiced applying these methods to prepare datasets for the subsequent stages of feature engineering and model building.
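As a quick preview of the scaling transformation mentioned above, here is a minimal sketch in plain Python. The function name `min_max_scale` and the sample values are hypothetical, chosen only for illustration. Returning zeros when every value is identical is one possible convention, assumed here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so max - min is zero and the
        # formula is undefined. Returning zeros is an assumed convention.
        return [0.0 for _ in values]
    # Apply (x - min) / (max - min) to each element.
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data for illustration only.
ages = [10, 20, 25, 40]
scaled = min_max_scale(ages)
print(scaled)  # smallest value maps to 0.0, largest to 1.0
```

In practice a library implementation would typically be applied column by column to a full dataset; the sketch only shows the arithmetic on a single list.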
1.1 Connecting to Databases and Data Warehouses
1.2 Working with Web APIs for Data Retrieval
1.3 Techniques for Web Scraping Structured Data
1.4 Advanced Data Cleaning Methods
1.5 Strategies for Handling Missing Values
1.6 Data Transformation and Normalization Techniques
1.7 Merging and Joining Diverse Datasets
1.8 Hands-on: Data Acquisition and Wrangling Practice
© 2025 ApX Machine Learning