Having loaded the dataset and performed initial checks on it, the next step is to examine the characteristics of individual variables. This chapter focuses on univariate analysis, the practice of analyzing one variable at a time to understand its distribution, central tendency, and spread.
You will learn how to calculate and interpret descriptive statistics for numerical variables, including measures of central tendency (mean, median, mode) and dispersion (variance, standard deviation σ, range, and interquartile range, or IQR). We will use visualizations like histograms and box plots created with Matplotlib and Seaborn to represent these distributions graphically and identify potential outliers.
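As a quick preview of these measures, here is a minimal sketch using Pandas. The `values` Series is made-up illustrative data, not part of the course dataset, and the variance and standard deviation shown are the sample versions that Pandas computes by default (`ddof=1`).

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
values = pd.Series([2, 4, 4, 5, 7, 9, 11], name="value")

# Measures of central tendency
mean = values.mean()
median = values.median()
mode = values.mode().tolist()  # mode() returns a Series because ties are possible

# Measures of dispersion
variance = values.var()        # sample variance (ddof=1 is the Pandas default)
std_dev = values.std()         # sample standard deviation
value_range = values.max() - values.min()
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std={std_dev:.2f}, range={value_range}, IQR={iqr}")
```

Calling `values.describe()` returns several of these statistics at once, which may be a convenient first look before computing any of them individually.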
For categorical variables, the focus shifts to understanding frequency counts and proportions. You will learn to compute these summaries using Pandas and visualize them effectively using bar charts. We will also cover basic statistical methods for identifying potential outliers within numerical data, such as using the Z-score, calculated as $Z = \frac{x - \mu}{\sigma}$, or applying the IQR rule.
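The sketch below previews both ideas on made-up data. The `colors` and `values` Series are hypothetical, and the cutoffs (an absolute Z-score above 3, and 1.5 times the IQR beyond the quartiles) are common conventions assumed here rather than fixed rules; the Z-score uses the population standard deviation (`ddof=0`).

```python
import pandas as pd

# Hypothetical categorical data, used only for illustration
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])
counts = colors.value_counts()                     # frequency of each category
proportions = colors.value_counts(normalize=True)  # share of each category

# Hypothetical numerical data with one unusually large value
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 50])

# Z-score: distance from the mean in units of standard deviation
z_scores = (values - values.mean()) / values.std(ddof=0)
z_outliers = values[z_scores.abs() > 3]            # assumed cutoff of 3

# IQR rule: flag points far outside the middle 50% of the data
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # assumed multiplier of 1.5
iqr_outliers = values[(values < lower) | (values > upper)]

print(counts)
print(proportions)
print("Z-score outliers:", z_outliers.tolist())
print("IQR outliers:", iqr_outliers.tolist())
```

Both methods flag the same point in this small example, but they can disagree on real data: an extreme value inflates the mean and standard deviation that the Z-score depends on, while the IQR rule relies on quartiles, which are less sensitive to extremes.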
By completing this chapter, you will be able to systematically summarize and visualize the properties of individual variables, a fundamental step in any data exploration process.
3.1 Analyzing Numerical Variables: Central Tendency
3.2 Analyzing Numerical Variables: Dispersion
3.3 Visualizing Numerical Variables: Histograms
3.4 Visualizing Numerical Variables: Box Plots
3.5 Analyzing Categorical Variables: Frequency Counts
3.6 Visualizing Categorical Variables: Bar Charts
3.7 Identifying Outliers using Statistical Methods
3.8 Practice: Univariate Exploration
© 2025 ApX Machine Learning