Imagine you have a large collection of information, perhaps about customers and their orders, students and their courses, or products and their inventory levels. How do you store this information so that it's organized, easy to search, and reliable? You could use spreadsheets, but as the data grows and becomes more connected, managing it becomes complex and error-prone. This is where databases come in, and specifically, a very common type called a relational database.
At its core, a database is simply an organized collection of structured information, or data, typically stored electronically in a computer system. It allows you to efficiently store, retrieve, modify, and manage that data.
So, what makes a database "relational"? The term comes from the way data is structured. In a relational database, information is organized into tables. You can think of a table much like a spreadsheet grid: it has columns (representing different attributes or pieces of information, like Name or Price) and rows (representing individual records or items, like a specific customer or product).
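To make the column and row idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the Products table name, and the sample values are assumptions made for illustration; only the Name and Price columns come from the text above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Columns define the attributes that every record has.
# "Products" is a hypothetical table name chosen for this sketch.
conn.execute("CREATE TABLE Products (Name TEXT, Price REAL)")

# Each row is one record (illustrative values only).
conn.executemany(
    "INSERT INTO Products (Name, Price) VALUES (?, ?)",
    [("Notebook", 4.50), ("Pen", 1.25)],
)

# Read the rows back, sorted by the Name column.
rows = conn.execute(
    "SELECT Name, Price FROM Products ORDER BY Name"
).fetchall()

for name, price in rows:
    print(name, price)
```

Each tuple returned by fetchall() corresponds to one row, and each position in the tuple corresponds to one column.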
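The ability to relate tables is what makes this structure so useful. Tables are not isolated islands of data. Instead, they can be linked or related to each other based on shared pieces of information, which allows you to logically connect different types of data. (Strictly speaking, the name comes from "relation", the mathematical term for a table in the relational model, rather than from the links between tables.)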
For instance, you might have one table storing customer information and another table storing order information.
Customers table: columns CustomerID, FirstName, LastName, Email. Each row represents a unique customer.
Orders table: columns OrderID, OrderDate, CustomerID, TotalAmount. Each row represents a specific order.
Notice that both tables contain a CustomerID. This common column acts as the link or relationship between the two tables. Using this shared CustomerID, you can easily find all the orders placed by a specific customer, or look up the customer details associated with a particular order.
A simple visualization showing how a Customers table and an Orders table can be related using a common CustomerID column.
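The two lookups described above can be sketched with Python's built-in sqlite3 module. The table and column names follow the text; the use of SQLite, the column types, and all sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create both tables and load a few made-up rows.
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    FirstName  TEXT,
    LastName   TEXT,
    Email      TEXT
);
CREATE TABLE Orders (
    OrderID     INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    CustomerID  INTEGER,
    TotalAmount REAL
);
INSERT INTO Customers VALUES (1, 'Ada', 'Lopez', 'ada@example.com');
INSERT INTO Customers VALUES (2, 'Ben', 'Singh', 'ben@example.com');
INSERT INTO Orders VALUES (101, '2024-01-05', 1, 25.00);
INSERT INTO Orders VALUES (102, '2024-01-09', 2, 40.00);
INSERT INTO Orders VALUES (103, '2024-02-01', 1, 15.50);
""")

# All orders placed by the customer whose CustomerID is 1.
orders_for_customer = conn.execute(
    "SELECT OrderID, TotalAmount FROM Orders "
    "WHERE CustomerID = ? ORDER BY OrderID",
    (1,),
).fetchall()

# Customer details for order 102, found by matching CustomerID
# across the two tables.
customer_for_order = conn.execute(
    """
    SELECT c.FirstName, c.LastName, c.Email
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
    WHERE o.OrderID = ?
    """,
    (102,),
).fetchone()

print(orders_for_customer)
print(customer_for_order)
```

The JOIN clause is where the shared CustomerID column does its work: it pairs each order row with the customer row that carries the same value.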
This relational structure, based on the foundational ideas of the relational model proposed by Edgar F. Codd in 1970, offers several advantages:
You can define rules that keep the data consistent, for example that an OrderID is always unique or that every CustomerID in the Orders table actually exists in the Customers table.
Customer details are stored once in the Customers table and simply referenced using the CustomerID in the Orders table. This saves space and makes updates easier (if a customer changes their email, you only need to update it in one place).
In summary, a relational database is a system for storing data in structured tables that can be linked together based on common information. This model provides an organized, efficient, and reliable foundation for managing data, which is essential for many tasks in data science, from basic reporting to building machine learning models. In the following sections, we'll look more closely at the components of these tables: columns, rows, and the types of data they hold.
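As a closing illustration of the advantages listed above (unique OrderID values, CustomerID values that must exist in Customers, and updating an email in one place), here is a minimal sketch using Python's built-in sqlite3 module. SQLite itself, the trimmed-down column sets, the try_insert helper, and the sample values are all assumptions for this example. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Email      TEXT
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,  -- primary key values must be unique
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
);
INSERT INTO Customers VALUES (1, 'ada@example.com');
INSERT INTO Orders VALUES (101, 1);
""")


def try_insert(sql, params):
    """Hypothetical helper: run one INSERT and report the outcome."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# OrderID 101 already exists, so the uniqueness rule blocks it.
duplicate_order = try_insert("INSERT INTO Orders VALUES (?, ?)", (101, 1))
# CustomerID 99 is not in Customers, so the foreign key blocks it.
unknown_customer = try_insert("INSERT INTO Orders VALUES (?, ?)", (102, 99))
# A new OrderID that points at an existing customer is fine.
valid_order = try_insert("INSERT INTO Orders VALUES (?, ?)", (103, 1))

# The email lives only in Customers, so one UPDATE is enough.
with conn:
    conn.execute(
        "UPDATE Customers SET Email = ? WHERE CustomerID = ?",
        ("ada@new.example", 1),
    )

print(duplicate_order)
print(unknown_customer)
print(valid_order)
```

Because Orders stores only the CustomerID, every order for that customer picks up the new email through the link without any change to the Orders rows.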
© 2025 ApX Machine Learning