Now that you understand the basic anatomy of a Matplotlib plot, let's focus on creating some of the most frequently used plot types. These visualizations form the backbone of exploratory data analysis, allowing you to quickly grasp trends, relationships, and distributions within your datasets. We'll use matplotlib.pyplot, conventionally imported as plt, along with NumPy for generating sample data.
import matplotlib.pyplot as plt
import numpy as np
# Ensure plots are displayed properly in environments like Jupyter notebooks
# %matplotlib inline # Uncomment if using Jupyter
Line plots are excellent for visualizing trends over a continuous interval or sequence, such as time series data or the output of a mathematical function. Matplotlib's plt.plot() function connects consecutive data points with straight line segments.
Let's plot a simple quadratic function:
# Sample data: x values and their squares
x = np.arange(0, 10, 0.5) # Values from 0 to 9.5 with a step of 0.5
y = x**2
# Create the plot
plt.figure(figsize=(8, 4)) # Optional: set the figure size as (width, height) in inches
plt.plot(x, y)
# Add basic labels for clarity
plt.xlabel("X value")
plt.ylabel("Y value (X squared)")
plt.title("Simple Line Plot")
# Display the plot
plt.show()
The plt.plot(x, y) command takes the x-coordinates and y-coordinates and draws straight line segments between consecutive points. This immediately shows the upward-curving trend of the quadratic function.
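plt.plot() does not evaluate a function. It only joins the points you pass it, in order. The minimal sketch below makes that visible. It switches to a sine wave, which is an illustrative choice not taken from the example above, because its curvature makes the straight segments easy to see. It draws the same interval with 8 samples (with markers) and with 200 samples.

```python
import matplotlib.pyplot as plt
import numpy as np

# Coarse sampling: 8 evenly spaced points across one period of a sine wave
x_coarse = np.linspace(0, 2 * np.pi, 8)
# Dense sampling: 200 evenly spaced points across the same interval
x_fine = np.linspace(0, 2 * np.pi, 200)

plt.figure(figsize=(8, 4))
# With many points, each straight segment is too short to notice, so the curve looks smooth
(fine_line,) = plt.plot(x_fine, np.sin(x_fine), label="200 samples")
# With few points, markers show the actual data and the straight segments between them are obvious
(coarse_line,) = plt.plot(x_coarse, np.sin(x_coarse), marker="o", linestyle="--", label="8 samples")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("plt.plot() Draws Straight Segments Between the Points It Is Given")
plt.legend()
plt.show()
```

If you pass a single array, as in plt.plot(y), Matplotlib uses the indices 0, 1, ..., N-1 as the x-coordinates.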
Scatter plots are used to visualize the relationship between two numerical variables. Each point on the plot represents an observation, positioned according to its values on the x and y axes. They are particularly useful for identifying correlations, clusters, or outliers. Use the plt.scatter() function.
Let's visualize two sets of random numbers to see if there's any apparent relationship:
# Sample data: 50 random points for x and y
np.random.seed(42) # for reproducibility
x = np.random.rand(50)
y = np.random.rand(50)
# Create the scatter plot
plt.figure(figsize=(8, 5))
plt.scatter(x, y)
# Add basic labels
plt.xlabel("Random Variable X")
plt.ylabel("Random Variable Y")
plt.title("Scatter Plot of Two Random Variables")
# Display the plot
plt.show()
In this case, because x and y are drawn independently at random, plt.scatter(x, y) produces a cloud of points with no discernible pattern, consistent with the two variables being uncorrelated. If there were a strong linear relationship, the points would tend to fall along a straight line.
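For contrast, here is a minimal sketch of what a linear relationship looks like. The data is made up for illustration: y is assumed to be 2x plus a small amount of Gaussian noise, where the slope 2 and the noise scale 0.1 are arbitrary choices. It also uses NumPy's np.corrcoef to put a number on the pattern. The Pearson coefficient r is close to 1 for a strong positive linear relationship and close to 0 when there is none.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # for reproducibility
x = np.random.rand(50)
# Illustrative assumption: y depends linearly on x (slope 2) plus small Gaussian noise
y = 2 * x + np.random.normal(loc=0.0, scale=0.1, size=50)

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]

plt.figure(figsize=(8, 5))
plt.scatter(x, y)
plt.xlabel("Variable X")
plt.ylabel("Variable Y (2X + noise)")
plt.title(f"Scatter Plot of Linearly Related Variables (r = {r:.2f})")
plt.show()
```

The points now hug a rising line, and r is close to 1. Applying np.corrcoef to the independent random x and y from the previous example typically gives a value much closer to 0.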
Bar charts are ideal for comparing quantities across different discrete categories. Each bar's length represents the magnitude of the value for that category. Matplotlib provides plt.bar() for vertical bars and plt.barh() for horizontal bars.
Imagine comparing the counts of different types of fruits:
# Sample data: Categories and their counts
categories = ['Apples', 'Oranges', 'Bananas', 'Grapes']
counts = [23, 17, 31, 15]
# Create the bar chart
plt.figure(figsize=(7, 5))
plt.bar(categories, counts, color=['red', 'orange', 'yellow', 'purple']) # Optional: specify colors
# Add basic labels
plt.xlabel("Fruit Type")
plt.ylabel("Quantity")
plt.title("Quantity of Different Fruits")
# Display the plot
plt.show()
The plt.bar(categories, counts) function creates one bar per category: the category determines the bar's position on the x-axis, and the count determines its height on the y-axis. We used the color argument here to demonstrate a simple customization, passing one color per bar.
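The horizontal variant, plt.barh(), takes the same two sequences, but the categories go on the y-axis and the values set each bar's width. A minimal sketch reusing the fruit data above:

```python
import matplotlib.pyplot as plt

categories = ['Apples', 'Oranges', 'Bananas', 'Grapes']
counts = [23, 17, 31, 15]

plt.figure(figsize=(7, 5))
# barh places categories along the y-axis and uses the counts as bar widths
bars = plt.barh(categories, counts, color=['red', 'orange', 'yellow', 'purple'])
# The axis labels swap roles compared with the vertical version
plt.xlabel("Quantity")
plt.ylabel("Fruit Type")
plt.title("Quantity of Different Fruits")
plt.show()
```

Horizontal bars are often easier to read when category labels are long. Note that barh places the first category at the bottom of the axis.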
Histograms help visualize the distribution of a single numerical variable. They group data into "bins" (intervals) and use bars to display the frequency (count) of observations falling into each bin. This reveals the shape of the distribution, such as whether it is symmetric, skewed, or multimodal. Use the plt.hist() function.
Let's look at the distribution of 1000 random numbers drawn from a standard normal distribution:
# Sample data: 1000 points from a standard normal distribution
data = np.random.randn(1000)
# Create the histogram
plt.figure(figsize=(8, 5))
# bins=30 divides the data range into 30 equal-width intervals
plt.hist(data, bins=30, edgecolor='black') # edgecolor outlines each bar so adjacent bins are easy to tell apart
# Add basic labels
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Normally Distributed Data")
# Display the plot
plt.show()
The call plt.hist(data, bins=30) automatically determines the range of the data (from its minimum to its maximum), divides it into 30 equal-width intervals (bins), counts how many data points fall into each bin, and draws a bar whose height represents that count. The resulting shape approximates the bell curve characteristic of a normal distribution. The bins argument controls the granularity of the view. Fewer bins give a coarser, smoother outline, while more bins reveal finer detail but can look noisy.
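You can check the binning step directly, because plt.hist() also returns what it computed: the bin counts, the bin edges, and the drawn bar patches. The minimal sketch below regenerates the same kind of data, with the seed set to 42 to mirror the earlier scatter example. It compares the returned values with np.histogram(), which computes counts and edges without drawing anything.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)  # for reproducibility
data = np.random.randn(1000)

plt.figure(figsize=(8, 5))
# plt.hist returns the bin counts, the bin edges, and the drawn bar patches
counts, edges, patches = plt.hist(data, bins=30, edgecolor='black')
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Normally Distributed Data")
plt.show()

# 30 bins are delimited by 31 edges, running from the data's minimum to its maximum
print(len(edges), edges[0] == data.min(), edges[-1] == data.max())
# Every observation lands in exactly one bin, so the counts add up to the sample size
print(counts.sum())
# NumPy's histogram performs the same binning without plotting
np_counts, np_edges = np.histogram(data, bins=30)
print(np.array_equal(counts, np_counts), np.allclose(edges, np_edges))
```

Changing bins=30 to a smaller or larger number and re-running is a quick way to see the granularity trade-off for yourself.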
These four plot types (line, scatter, bar, and histogram) provide a fundamental toolkit for initial data exploration. While these examples included only basic labels, the next section covers how to customize the appearance of your plots extensively to make them more informative and visually appealing.
© 2025 ApX Machine Learning