While line plots are excellent for showing trends in data where the order of points matters, such as values changing over time, often we want to understand the relationship between two distinct variables. Does one variable tend to increase when another increases? Is there no apparent connection? For visualizing these kinds of relationships, scatter plots are the standard tool.
A scatter plot uses individual dots or markers to represent the values obtained for two different numerical variables. One variable determines the position on the horizontal axis (x-axis), and the other determines the position on the vertical axis (y-axis). Unlike line plots, the points are generally not connected by lines because the focus is on the pattern formed by the distribution of points, not a sequential progression.
scatter()
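Matplotlib provides the plt.scatter() function to create scatter plots easily; the same method is available on an Axes object as ax.scatter(), which is the form used in the examples here. It takes two primary arguments: an array or list of x-coordinates and a corresponding array or list of y-coordinates.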
Let's consider an example. Suppose we have data representing the temperature (in Celsius) and the number of ice cream scoops sold on different days. We can use a scatter plot to see if there's a relationship.
import matplotlib.pyplot as plt
import numpy as np
# Sample data: Temperature (Celsius) and Scoops Sold
temperatures = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
scoops_sold = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])
# Create the figure and axes objects
fig, ax = plt.subplots()
# Create the scatter plot
ax.scatter(temperatures, scoops_sold)
# Add labels and title (as learned previously)
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Ice Cream Scoops Sold")
ax.set_title("Temperature vs. Ice Cream Sales")
# Display the plot
plt.show()
Executing this script will generate a scatter plot. Each point on the plot represents a single day, positioned according to its temperature and the number of scoops sold.
Each marker shows the sales for a specific recorded temperature.
By examining the plot, you can observe the general pattern. In this case, as the temperature increases (moving right along the x-axis), the number of scoops sold also tends to increase (moving up along the y-axis). This suggests a positive relationship or positive correlation between temperature and ice cream sales. If higher temperatures corresponded to lower sales, we would see a downward trend, indicating a negative relationship. If the points were scattered randomly with no discernible pattern, it would suggest little to no relationship between the variables.
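To put a number on the visual impression, one option is the Pearson correlation coefficient. The sketch below is an addition to the walkthrough rather than part of the original example: it applies NumPy's np.corrcoef to the same sample data, and the variable name r is only an illustrative choice.

```python
import numpy as np

# Same sample data as in the scatter plot example
temperatures = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
scoops_sold = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two input variables.
r = np.corrcoef(temperatures, scoops_sold)[0, 1]

print(f"Pearson correlation: {r:.3f}")
```

A value close to +1 matches the upward trend seen in the plot. Values near -1 or near 0 would correspond to the negative and no-relationship cases described above.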
Just like line plots, scatter plots can be customized to improve clarity or visual appeal. The ax.scatter() method (or the plt.scatter() function) accepts several arguments for this purpose:
c: Sets the color of the markers. You can use color names ('red'), hex codes ('#FF5733'), or abbreviations ('g' for green).
s: Sets the size of the markers. It takes a numerical value representing the marker area in points squared.
marker: Specifies the shape of the markers. Common options include 'o' (circle, default), 's' (square), '^' (triangle up), 'D' (diamond), '*' (star), and '+' (plus).
alpha: Controls the transparency of the markers, ranging from 0 (completely transparent) to 1 (completely opaque). This is very useful when many points overlap, allowing you to see the density of points in different regions.
Let's modify our previous example to use larger, semi-transparent, square markers in an orange color.
import matplotlib.pyplot as plt
import numpy as np
# Sample data (same as before)
temperatures = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
scoops_sold = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])
# Create the figure and axes objects
fig, ax = plt.subplots()
# Create the scatter plot with customizations
ax.scatter(temperatures, scoops_sold,
           c='#fd7e14',  # Orange color
           s=60,         # Larger marker size
           marker='s',   # Square marker
           alpha=0.7)    # Slight transparency
# Add labels and title
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Ice Cream Scoops Sold")
ax.set_title("Temperature vs. Ice Cream Sales (Customized)")
# Display the plot
plt.show()
This code produces a plot with visually distinct markers.
Customized scatter plot with orange square markers, larger size, and transparency.
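The benefit of alpha is easier to see with many overlapping points than with twelve. The following sketch uses synthetic data, randomly generated here purely for illustration and not part of the ice cream example, with a low alpha so that dense regions appear darker.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration): 2,000 points clustered around the origin
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=0.0, scale=1.0, size=2000)
y = rng.normal(loc=0.0, scale=1.0, size=2000)

fig, ax = plt.subplots()

# With a low alpha, overlapping markers accumulate color,
# so crowded regions look darker than sparse ones.
points = ax.scatter(x, y, s=20, alpha=0.2)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Overlapping Points with alpha=0.2")

# Call plt.show() here to display the figure when running as a script.
```

With alpha left at its default of fully opaque markers, the center of the cloud would render as a solid blob and the difference in density would be hard to judge.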
Remember that you can also apply other customizations learned earlier, such as setting axis limits using ax.set_xlim() and ax.set_ylim(), and adding grid lines with ax.grid().
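As a brief sketch of those customizations applied to the scatter plot, the example below sets axis limits and adds a grid. The specific limit values and grid styling are assumptions picked by hand for this dataset, not recommendations.

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data (same as before)
temperatures = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
scoops_sold = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])

fig, ax = plt.subplots()
ax.scatter(temperatures, scoops_sold)

# Axis limits chosen by hand so every point has some margin (assumed values)
ax.set_xlim(10, 28)
ax.set_ylim(0, 700)

# Light dashed grid to make values easier to read off the axes
ax.grid(True, linestyle="--", alpha=0.5)

ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Ice Cream Scoops Sold")
ax.set_title("Temperature vs. Ice Cream Sales (Limits and Grid)")

# Call plt.show() here to display the figure when running as a script.
```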
It's important to choose the right plot type for your data.
Use a line plot (plot()) when you want to visualize a trend or progression over a continuous interval or sequence, like tracking a stock price over days or measuring a sensor value over time. The connection between points implies continuity or order.
Use a scatter plot (scatter()) when you want to examine the relationship between two distinct numerical variables. Each point represents an independent observation, and connecting them with lines is usually meaningless. Plotting our temperature vs. sales data with ax.plot() would connect data points in the order they appear in the arrays, creating a confusing zig-zag line that doesn't represent a meaningful trend.
Scatter plots are fundamental tools for exploring potential correlations and patterns between variables before applying more complex statistical analysis or machine learning models. They provide a direct visual check of how two quantities interact.
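To see the difference between the two plot types on the temperature and sales data, the following sketch draws both side by side. The two-panel layout, figure size, and the names ax_line and ax_scatter are illustrative choices, not part of the earlier examples.

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data (same as before); note the temperatures are not sorted
temperatures = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
scoops_sold = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])

# One row, two columns of axes
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Left: ax.plot() joins the points in array order, producing a zig-zag
ax_line.plot(temperatures, scoops_sold)
ax_line.set_title("ax.plot(): points joined in array order")

# Right: ax.scatter() draws each observation as an independent marker
ax_scatter.scatter(temperatures, scoops_sold)
ax_scatter.set_title("ax.scatter(): independent observations")

for ax in (ax_line, ax_scatter):
    ax.set_xlabel("Temperature (°C)")
    ax.set_ylabel("Ice Cream Scoops Sold")

fig.tight_layout()

# Call plt.show() here to display the figure when running as a script.
```

Because the days were not recorded in order of temperature, the line in the left panel doubles back on itself, while the right panel shows the same upward pattern without implying any sequence.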
© 2025 ApX Machine Learning