Loading a dataset and taking a first look at its contents are fundamental steps in data science. This process ensures data availability and provides an initial understanding of its structure and characteristics. Consider this initial examination as opening a package to confirm everything is present and to gain a general idea of the contents before further use.
For this exercise, imagine we have a simple dataset stored in a common format, like a Comma Separated Values (CSV) file. CSV files are just plain text files where data is organized in rows, and the values within each row are separated by commas. Let's say our file is named simple_sales.csv and contains basic information about product sales.
Here's what the raw data inside simple_sales.csv might look like:
Product,Category,Price,QuantitySold
Apple,Fruit,0.50,150
Banana,Fruit,0.30,250
Carrot,Vegetable,0.20,180
Broccoli,Vegetable,1.50,90
Orange,Fruit,0.60,120
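This is a typical structure:
- The first line is a header row that names the columns (Product, Category, Price, QuantitySold).
- Each following line is one record, with its values separated by commas in the same order as the header.

The first action is to "load" or "import" this data into whatever environment you use for analysis. This could be a spreadsheet program (like Microsoft Excel or Google Sheets) or a data analysis tool or library (like pandas in Python, though the steps described here do not depend on any specific code).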
The process involves pointing your chosen tool at the simple_sales.csv file so that it can read the text and split it into rows and columns.

After this step, the data is no longer just text in a file; it's structured within your analysis environment, ready for inspection.
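The walkthrough is tool-agnostic, but a minimal Python sketch can make the loading step concrete. It uses only the standard library's `csv` module. As an assumption for illustration, it reads the sample data from an inline string instead of from disk, and the names `RAW_CSV` and `load_rows` are hypothetical.

```python
import csv
import io

# Assumption: this string mirrors the contents of simple_sales.csv shown earlier.
# With a real file you would pass open("simple_sales.csv", newline="") instead.
RAW_CSV = """Product,Category,Price,QuantitySold
Apple,Fruit,0.50,150
Banana,Fruit,0.30,250
Carrot,Vegetable,0.20,180
Broccoli,Vegetable,1.50,90
Orange,Fruit,0.60,120
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    reader = csv.DictReader(io.StringIO(text))  # first line becomes the keys
    return list(reader)


rows = load_rows(RAW_CSV)
print(rows[0]["Product"], rows[0]["Price"])
```

Note that every value is still a string at this point (for example `"0.50"`). Libraries such as pandas, via `pd.read_csv`, typically convert numeric columns for you.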
Once the data is loaded, the immediate next step is to perform some basic checks. This helps confirm that the data loaded correctly and gives you a first feel for its content.
Most tools provide a way to look at the beginning, or "head", of the dataset. This usually shows the first 5 or 10 rows.
Looking at the head of our simple_sales data would show something like:
| Product | Category | Price | QuantitySold |
|---|---|---|---|
| Apple | Fruit | 0.50 | 150 |
| Banana | Fruit | 0.30 | 250 |
| Carrot | Vegetable | 0.20 | 180 |
| Broccoli | Vegetable | 1.50 | 90 |
| Orange | Fruit | 0.60 | 120 |
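Why do this? A quick look at the first rows confirms that the file loaded correctly: the column names appear as headers, and the values sit under the columns you expect.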
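As a rough sketch of what a "head" operation does, the snippet below returns only the first `n` records of the sample data. The helper name `head` and the inline `RAW_CSV` string are assumptions made for illustration; in pandas the corresponding call would be `df.head()`.

```python
import csv
import io
from itertools import islice

# Assumption: inline copy of the simple_sales.csv sample from this section.
RAW_CSV = """Product,Category,Price,QuantitySold
Apple,Fruit,0.50,150
Banana,Fruit,0.30,250
Carrot,Vegetable,0.20,180
Broccoli,Vegetable,1.50,90
Orange,Fruit,0.60,120
"""


def head(text, n=5):
    """Return the first n data rows (the header is consumed as field names)."""
    reader = csv.DictReader(io.StringIO(text))
    return list(islice(reader, n))  # stops early instead of reading everything


first_two = head(RAW_CSV, n=2)
for row in first_two:
    print(row["Product"], row["Category"], row["Price"], row["QuantitySold"])
```

Using `islice` means only the requested rows are read, which matters little here but helps with large files.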
It's useful to know the size of your dataset: how many rows and how many columns it has. For our tiny example, there are 5 rows (not counting the header) and 4 columns.
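Why do this? Knowing the dimensions tells you how much data you are working with and lets you check that no rows or columns went missing during loading.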
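Counting rows and columns can be sketched in a few lines. This example again assumes an inline copy of the sample data (the `RAW_CSV` name is hypothetical) and treats the first line as the header. In pandas, the same information is available as `df.shape`.

```python
import csv
import io

# Assumption: inline copy of the simple_sales.csv sample from this section.
RAW_CSV = """Product,Category,Price,QuantitySold
Apple,Fruit,0.50,150
Banana,Fruit,0.30,250
Carrot,Vegetable,0.20,180
Broccoli,Vegetable,1.50,90
Orange,Fruit,0.60,120
"""

reader = csv.reader(io.StringIO(RAW_CSV))
header = next(reader)       # first line: the column names
data_rows = list(reader)    # remaining lines: the records

n_rows, n_cols = len(data_rows), len(header)
print(f"{n_rows} rows x {n_cols} columns")  # prints "5 rows x 4 columns"
```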
Look closely at the column headers and the data within them.
- Product: Contains text strings (names of products). This is qualitative data.
- Category: Contains text strings (types of products). Also qualitative.
- Price: Contains numbers with decimals (currency values). This is quantitative (specifically, continuous) data.
- QuantitySold: Contains whole numbers (counts). This is quantitative (specifically, discrete) data.

Why do this? Knowing what kind of data each column holds tells you which operations make sense for it. For example, you can average Price, but not Product.
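One simple way to check column types programmatically is to try converting each column's values to a whole number, then to a decimal number, and fall back to text if both fail. The sketch below is an illustrative heuristic only, not how any particular library does it; `infer_type` and `RAW_CSV` are hypothetical names. In pandas, `df.dtypes` reports the types it inferred on load.

```python
import csv
import io

# Assumption: inline copy of the simple_sales.csv sample from this section.
RAW_CSV = """Product,Category,Price,QuantitySold
Apple,Fruit,0.50,150
Banana,Fruit,0.30,250
Carrot,Vegetable,0.20,180
Broccoli,Vegetable,1.50,90
Orange,Fruit,0.60,120
"""


def infer_type(values):
    """Return 'int', 'float', or 'text' for a column of string values."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)   # raises ValueError if the value does not fit
            return name         # every value converted cleanly
        except ValueError:
            continue            # try the next, more permissive type
    return "text"


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
column_types = {col: infer_type([row[col] for row in rows]) for col in rows[0]}
print(column_types)
# {'Product': 'text', 'Category': 'text', 'Price': 'float', 'QuantitySold': 'int'}
```

Here `text` corresponds to the qualitative columns, `float` to continuous quantitative data, and `int` to discrete counts.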
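By performing these simple loading and inspection steps, we've brought the data into an analysis environment, previewed its first rows, checked its dimensions, and identified the type of data in each column.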
This practical step is the gateway to data preparation. Having loaded and initially inspected the data, you're now better equipped to move on to the next stages discussed in this chapter, such as handling missing values (though our simple example has none) or identifying outliers, which are necessary before performing any meaningful analysis.