Relational databases are a cornerstone of data storage, providing a structured way to organize and access information. Think of them like highly organized digital filing cabinets where data is stored in tables. These tables look similar to spreadsheets, with rows representing individual records (like a specific customer or product) and columns representing attributes or characteristics of those records (like a customer's name or a product's price).
The "relational" part comes from the ability to link data between tables. For instance, you might have one table for Customers
and another for Orders
. Instead of duplicating customer information in the Orders
table, you can create a relationship between them, typically using unique identifiers. This structured approach ensures data consistency and reduces redundancy.
The standard language used to communicate with relational databases is SQL (Structured Query Language). Almost every relational database system (like PostgreSQL, MySQL, SQL Server, Oracle) understands SQL. As a data engineer, having a solid grasp of SQL is essential for extracting, transforming, and loading data stored in these systems.
At the heart of a relational database is the table.
Products
table, columns might include product_id
(number), name
(text), and price
(number).Products
table might represent a single item like ('101', 'Laptop', 1200.00).Table: Products
+------------+----------+--------+
| product_id | name | price | <- Columns (Attributes)
+------------+----------+--------+
| 101 | Laptop | 1200.00| <- Row (Record 1)
| 102 | Keyboard | 75.00| <- Row (Record 2)
| 103 | Monitor | 300.00| <- Row (Record 3)
+------------+----------+--------+
To manage relationships and ensure uniqueness, relational databases use keys:
Products
table, product_id
would likely be the primary key. No two products can have the same product_id
.Orders
table might have a product_id
column that refers back to the product_id
in the Products
table, linking each order line item to a specific product.Diagram illustrating the relationship between a
Products
table and anOrders
table via a foreign key.
SQL allows you to perform various operations on a database. The commands are generally grouped into categories:
CREATE TABLE
: Creates a new table.ALTER TABLE
: Modifies an existing table (e.g., add a column).DROP TABLE
: Deletes a table.INSERT
: Adds new rows (records) into a table.UPDATE
: Modifies existing rows.DELETE
: Removes rows.SELECT
: Fetches data from one or more tables.Let's look at some fundamental SQL commands using our Products
table example.
Creating a Table You define the table name and its columns along with their data types.
CREATE TABLE Products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10, 2)
);
This command creates the Products
table with three columns: product_id
as an integer and the primary key, name
as text (up to 100 characters), and price
as a decimal number.
Inserting Data To add records into the table:
INSERT INTO Products (product_id, name, price)
VALUES (101, 'Laptop', 1200.00);
INSERT INTO Products (product_id, name, price)
VALUES (102, 'Keyboard', 75.00);
Querying Data (SELECT
)
This is perhaps the most frequently used SQL command. It retrieves data.
To get all columns for all products:
SELECT * FROM Products;
The *
symbol is shorthand for "all columns".
To get only the name and price of all products:
SELECT name, price FROM Products;
Filtering Data (WHERE
)
You can specify conditions to retrieve only the rows that match.
To find products cheaper than $100:
SELECT name, price
FROM Products
WHERE price < 100.00;
This would return the 'Keyboard'.
Updating Data (UPDATE
)
To modify existing records. It's very important to use a WHERE
clause to specify which rows to update, otherwise you might update all rows in the table!
To change the price of the 'Laptop':
UPDATE Products
SET price = 1150.00
WHERE product_id = 101;
Deleting Data (DELETE
)
To remove rows from a table. Again, the WHERE
clause is significant to target specific rows. Without it, you risk deleting everything!
To delete the 'Keyboard' product:
DELETE FROM Products
WHERE product_id = 102;
Relational databases are popular for several reasons:
You'll find relational databases powering:
While powerful, relational databases have limitations:
Understanding how to interact with relational databases using SQL is a fundamental skill for data engineers. They are often the source systems from which data is extracted for pipelines, or the target systems where cleaned and transformed data is loaded for analysis. In the practice section later, you'll get hands-on experience creating a simple table yourself.
© 2025 ApX Machine Learning