Aggregate functions like COUNT(), SUM(), and AVG() compute summary statistics across all rows returned by a query. While calculating a single number, such as the total number of orders or the average product price across an entire catalog, is useful, you often need more granular summaries. For instance, instead of the overall average product price, you might want the average price for each product category. Or, instead of the total number of customers, you might want the number of customers in each city.
This is where the GROUP BY clause comes in. It allows you to divide the rows of your table into smaller groups based on the values in one or more columns. The aggregate functions are then applied independently to each of these groups.
Think of GROUP BY as a process that reorganizes your data before aggregation happens:
FROM and WHERE clauses into groups. All rows within a group have the same value in the column(s) specified in the GROUP BY clause.SELECT clause (like COUNT(), AVG(), SUM(), MIN(), MAX()) are then applied to each group separately.The basic syntax looks like this:
SELECT
column_to_group_by,
aggregate_function(column_to_aggregate)
FROM
table_name
WHERE
-- Optional: filter rows BEFORE grouping
condition
GROUP BY
column_to_group_by
ORDER BY
-- Optional: sort the grouped results
column_to_group_by; -- or aggregate_function result
Notice the order: GROUP BY comes after FROM and WHERE, but before ORDER BY.
Let's use a Orders table containing information about customer orders:
| order_id | customer_id | order_date | order_total |
|---|---|---|---|
| 101 | 1 | 2023-10-01 | 50.00 |
| 102 | 2 | 2023-10-01 | 120.50 |
| 103 | 1 | 2023-10-05 | 75.25 |
| 104 | 3 | 2023-10-06 | 30.00 |
| 105 | 2 | 2023-10-08 | 80.00 |
| 106 | 1 | 2023-10-10 | 45.75 |
Suppose we want to find the total amount spent by each customer. We need to group the rows by customer_id and then apply the SUM() function to the order_total for each group.
SELECT
customer_id,
SUM(order_total) AS total_spent
FROM
Orders
GROUP BY
customer_id;
Here's how SQL processes this:
Orders table.customer_id:
customer_id = 1 (orders 101, 103, 106)customer_id = 2 (orders 102, 105)customer_id = 3 (order 104)SUM(order_total) for each group:
| customer_id | total_spent |
|---|---|
| 1 | 171.00 |
| 2 | 200.50 |
| 3 | 30.00 |
We could also find the number of orders per customer using COUNT(*):
SELECT
customer_id,
COUNT(*) AS number_of_orders
FROM
Orders
GROUP BY
customer_id;
This would return:
| customer_id | number_of_orders |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
SELECT List Rule with GROUP BYThere's an important rule when using GROUP BY: Any column in the SELECT list that is not an aggregate function must be included in the GROUP BY clause.
Why? Consider the query SELECT customer_id, SUM(order_total) FROM Orders GROUP BY customer_id;. This works because for each group (defined by a single customer_id), there's a single customer_id value and a single SUM(order_total) result.
Now, imagine you tried this:
-- This query is usually INVALID in standard SQL
SELECT
customer_id,
order_date, -- Not aggregated, not in GROUP BY
SUM(order_total) AS total_spent
FROM
Orders
GROUP BY
customer_id;
For customer_id 1, there are multiple order_date values ('2023-10-01', '2023-10-05', '2023-10-10'). Since the query collapses these three rows into a single output row for customer_id 1, SQL doesn't know which order_date to display. To avoid this ambiguity, SQL requires any non-aggregated column in the SELECT list to also be in the GROUP BY clause. If you put order_date in the GROUP BY, you'd be grouping by the combination of customer_id and order_date, which is a different calculation.
You are not limited to grouping by just one column. You can specify multiple columns in the GROUP BY clause to create finer-grained groups based on the unique combination of values in those columns.
Let's add a product_category column to our Orders table (assuming each order is for one category for simplicity):
| order_id | customer_id | product_category | order_total |
|---|---|---|---|
| 101 | 1 | Books | 50.00 |
| 102 | 2 | Electronics | 120.50 |
| 103 | 1 | Groceries | 75.25 |
| 104 | 3 | Books | 30.00 |
| 105 | 2 | Groceries | 80.00 |
| 106 | 1 | Books | 45.75 |
Now, let's find the total spent by each customer within each product category:
SELECT
customer_id,
product_category,
SUM(order_total) AS category_total_spent
FROM
Orders
GROUP BY
customer_id, product_category -- Group by the combination
ORDER BY
customer_id, product_category; -- Optional sorting
SQL now forms groups based on unique pairs of (customer_id, product_category):
The result would be:
| customer_id | product_category | category_total_spent |
|---|---|---|
| 1 | Books | 95.75 |
| 1 | Groceries | 75.25 |
| 2 | Electronics | 120.50 |
| 2 | Groceries | 80.00 |
| 3 | Books | 30.00 |
The diagram illustrates how
GROUP BY customer_id, product_categorypartitions the original rows based on unique combinations of customer and category, allowingSUM(order_total)to be calculated for each distinct pair.
WHERERemember that the WHERE clause filters individual rows before the GROUP BY operation takes place. If you want to calculate aggregates only for a subset of your original data, use WHERE. For example, to find the total spent per customer, but only considering orders placed after '2023-10-04':
SELECT
customer_id,
SUM(order_total) AS total_spent_recent
FROM
Orders
WHERE
order_date > '2023-10-04' -- Filter rows first
GROUP BY
customer_id; -- Group remaining rows
This query first removes orders 101 and 102, and then groups the remaining rows (103, 104, 105, 106) by customer_id before summing the totals.
The GROUP BY clause is fundamental for creating meaningful summaries from your data. It works hand-in-hand with aggregate functions to collapse multiple rows into informative summary statistics for distinct groups within your dataset. Next, we'll look at how to filter these grouped results using the HAVING clause.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with