Imagine you have a spreadsheet containing information about customer orders. For every single order, you might list the product ordered, the quantity, the price, the order date, the customer's name, their full shipping address, and their email address. Now, what happens if a customer places 100 orders? You'd have their name, address, and email repeated 100 times in your spreadsheet!
This approach presents several problems:
Relational databases solve these problems by breaking down information into multiple, related tables. This process is often related to the concept of database normalization. Instead of one giant table, you might have:
Customers table: Contains information specific to each customer (like CustomerID, Name, Address, Email). Each customer appears only once.Orders table: Contains information specific to each order (like OrderID, OrderDate, CustomerID, TotalAmount). Notice it includes CustomerID, which links an order back to the customer who placed it.Products table: Contains information about each product (like ProductID, ProductName, Price).OrderDetails table: Links orders to products, specifying which products were in which order and the quantity (e.g., OrderDetailID, OrderID, ProductID, Quantity).This separation offers significant advantages:
Customers table. Product details are stored only once in the Products table.Customers table. This change automatically reflects across all their past and future orders when the data is combined for viewing. Similarly, if a product's price changes, you update it once in the Products table.However, for data analysis, you often need to see the combined picture. You might need to answer questions like:
Looking only at the Orders table won't give you customer names or addresses. Looking only at the Customers table won't give you order details. To answer these analytical questions, you must bring the relevant information together from the separate tables.
Consider our simple Customers and Orders tables:
A simple database structure with
CustomersandOrderstables, linked by theCustomerIDcolumn. PK denotes Primary Key, FK denotes Foreign Key.
To get a report showing customer names alongside their order amounts, you need to combine rows from Customers and Orders where the CustomerID matches. This is precisely why SQL provides mechanisms like JOIN clauses. They allow you to define how related tables should be connected based on common columns (like CustomerID here), enabling you to query the combined data as if it were in a single, unified view for your analysis.
Understanding how to combine data is therefore a fundamental skill in SQL for data science, as it unlocks the ability to analyze relationships scattered across a well-structured database. The following sections will teach you the specific SQL commands, starting with INNER JOIN, to achieve this.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with