Bivariate analysis examines the connections between pairs of variables. This approach is essential for identifying patterns that involve more than one feature in a dataset.
One of the most common and informative scenarios in bivariate analysis involves examining the relationship between two numerical variables. For instance, how does engine displacement relate to fuel efficiency? Or how does advertising spend correlate with sales? The primary visual tool for investigating such relationships is the scatter plot.
A scatter plot displays individual data points on a two-dimensional graph. Each point's position is determined by the values of two selected numerical variables: one variable dictates the position on the horizontal axis (x-axis), and the other dictates the position on the vertical axis (y-axis). This visualization allows us to directly observe the structure, direction, and strength of the association between the two variables.
Python libraries like Matplotlib and Seaborn provide convenient functions for generating scatter plots. Seaborn's scatterplot function is particularly useful as it integrates smoothly with Pandas DataFrames.
Let's start with a simple example. We'll generate some synthetic data where two variables have a roughly linear relationship and then plot them using Seaborn.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Generate sample data
np.random.seed(42) # for reproducibility
x_data = np.random.rand(100) * 10
y_data = 2.5 * x_data + np.random.randn(100) * 5 # y = 2.5x + noise
# Create a DataFrame
df = pd.DataFrame({'Variable_X': x_data, 'Variable_Y': y_data})
# Create the scatter plot
plt.figure(figsize=(8, 5)) # Set the figure size
sns.scatterplot(data=df, x='Variable_X', y='Variable_Y')
# Add labels and title for clarity
plt.xlabel("Variable X")
plt.ylabel("Variable Y")
plt.title("Scatter Plot of Variable Y vs. Variable X")
# Display the plot
plt.show()
This code first creates two NumPy arrays, x_data and y_data, where y_data is linearly dependent on x_data with some added random noise. These are then put into a Pandas DataFrame. Finally, sns.scatterplot is called with the DataFrame and the column names for the x and y axes. We also add labels and a title using Matplotlib functions for better interpretation.
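For comparison, the same plot can be sketched with Matplotlib alone, using the Axes-level scatter method instead of Seaborn. This is a minimal sketch that regenerates the same synthetic data; the fig and ax names are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Regenerate the same synthetic data as in the Seaborn example
np.random.seed(42)  # for reproducibility
x_data = np.random.rand(100) * 10
y_data = 2.5 * x_data + np.random.randn(100) * 5  # y = 2.5x + noise

# Draw the scatter plot directly on a Matplotlib Axes
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x_data, y_data)

# Label the axes and add a title
ax.set_xlabel("Variable X")
ax.set_ylabel("Variable Y")
ax.set_title("Scatter Plot of Variable Y vs. Variable X")

plt.show()
```

Matplotlib takes the arrays directly, so no DataFrame is needed here. The Seaborn version is usually more convenient once the data already lives in a DataFrame.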
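When examining a scatter plot, look for these important characteristics: the direction of the association (do the points trend upward or downward?), its form (is the pattern linear or not?), and its strength (how tightly do the points follow the pattern?).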
Here are visual examples of different patterns:
A clear upward trend indicates a positive linear association.
A clear downward trend indicates a negative linear association.
Points are scattered randomly, suggesting little to no linear association between X and Y.
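Patterns like these can be reproduced with synthetic data. The sketch below is one possible way to do it: the slopes, noise levels, sample size, and variable names (y_pos, y_neg, y_none) are assumptions chosen for illustration, not values from the original figures.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded generator for reproducibility
n = 200
x = rng.uniform(0, 10, n)

# Three illustrative relationships between X and Y (assumed parameters)
y_pos = 2.0 * x + rng.normal(0, 2, n)    # upward trend plus noise
y_neg = -2.0 * x + rng.normal(0, 2, n)   # downward trend plus noise
y_none = rng.normal(0, 5, n)             # generated independently of x

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
panels = [
    (y_pos, "Positive linear association"),
    (y_neg, "Negative linear association"),
    (y_none, "Little to no linear association"),
]
for ax, (y, title) in zip(axes, panels):
    ax.scatter(x, y)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title(title)

fig.tight_layout()
plt.show()
```

Changing the noise scale in y_pos or y_neg is a quick way to see how a weaker association looks: the points spread further from the underlying line while the direction stays the same.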
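A few practical points help when creating scatter plots:
Overplotting: when many points overlap, dense regions become hard to read. Reducing the marker size can help (the s parameter in scatterplot or scatter).
Transparency: making points semi-transparent shows where they pile up. Use the alpha parameter (e.g., alpha=0.5).
Labels: always label your axes (plt.xlabel, plt.ylabel) and provide an informative title (plt.title). This is fundamental for communicating your findings.
Third variable: sometimes you want to show how a third variable relates to the two on the axes. scatterplot allows this using the hue (color), size, or style parameters to differentiate points based on the third variable's values.
Scatter plots provide an invaluable first look at the potential relationship between two numerical variables. They visually summarize the association's direction, form, and strength, guiding further quantitative analysis, such as calculating correlation coefficients, which we will discuss next.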
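The s, alpha, and hue parameters mentioned above can be combined in a single scatterplot call. The sketch below assumes a hypothetical categorical column named Group with two made-up categories, A and B; the data and column are invented purely for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # seeded generator for reproducibility
n = 500

# Hypothetical data: two groups that follow different linear trends
x = rng.uniform(0, 10, n)
group = rng.choice(["A", "B"], size=n)
y = np.where(group == "A", 2.5 * x, 1.0 * x + 10) + rng.normal(0, 3, n)
df = pd.DataFrame({"Variable_X": x, "Variable_Y": y, "Group": group})

fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(
    data=df,
    x="Variable_X",
    y="Variable_Y",
    hue="Group",  # color points by the third variable
    alpha=0.5,    # semi-transparent points reveal dense regions
    s=25,         # smaller markers reduce overlap
    ax=ax,
)
ax.set_xlabel("Variable X")
ax.set_ylabel("Variable Y")
ax.set_title("Variable Y vs. Variable X, Colored by Group")

plt.show()
```

With 500 points, fully opaque default-size markers would hide much of the overlap. Lowering alpha and s keeps both group trends visible.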