We've seen how GROUP BY
lets us aggregate data into summary rows. For instance, we can find the total number of orders placed by each customer or the average rating for each product category. But what if we only want to see the results for groups that meet certain criteria? For example, maybe we only care about customers who have placed more than 10 orders, or product categories with an average rating above 4.0.
You might initially think of using the WHERE
clause, as we did for filtering individual rows in the previous chapter. However, WHERE
has a limitation: it filters rows before the aggregation step happens. It operates on the raw data in the table, not on the summarized results produced by GROUP BY
and aggregate functions like COUNT()
, AVG()
, or SUM()
.
This is where the HAVING
clause comes in.
The HAVING
clause is specifically designed to filter results after the GROUP BY
clause has done its work and the aggregate functions have been calculated. It allows you to apply conditions to the summarized data for each group.
The basic structure of a query including HAVING
looks like this:
SELECT
column_name(s),
aggregate_function(column_name)
FROM
table_name
WHERE
condition -- Optional: Filters rows BEFORE aggregation
GROUP BY
column_name(s)
HAVING
aggregate_condition -- Filters groups AFTER aggregation
ORDER BY
column_name(s); -- Optional: Sorts the final results
Notice the order: WHERE
comes before GROUP BY
, and HAVING
comes after GROUP BY
.
Understanding when to use WHERE
and when to use HAVING
is important for writing correct SQL queries involving aggregation.
WHERE
Clause: Filters individual rows before they are grouped and aggregated. You use conditions on the original columns of the table here.HAVING
Clause: Filters entire groups after they have been created by GROUP BY
and summarized by aggregate functions. You use conditions on the results of aggregate functions (like COUNT(*)
, AVG(price)
) or the grouping columns themselves.Think of it like this: WHERE
decides which ingredients go into the bowls for mixing, while HAVING
decides which finished bowls (groups) are interesting enough to keep.
Let's illustrate with an example. Imagine an Orders
table:
OrderID | CustomerID | OrderTotal |
---|---|---|
1 | 101 | 50.00 |
2 | 102 | 120.00 |
3 | 101 | 75.00 |
4 | 103 | 30.00 |
5 | 102 | 80.00 |
6 | 101 | 60.00 |
7 | 102 | 200.00 |
Scenario 1: Find customers who have placed more than 2 orders.
We need to count orders per customer and then filter based on that count.
SELECT
CustomerID,
COUNT(OrderID) AS NumberOfOrders
FROM
Orders
GROUP BY
CustomerID
HAVING
COUNT(OrderID) > 2; -- Filter groups based on the aggregate count
Result:
CustomerID | NumberOfOrders |
---|---|
101 | 3 |
102 | 3 |
Here, GROUP BY CustomerID
first groups the rows by customer. Then COUNT(OrderID)
calculates the number of orders for each customer (101: 3 orders, 102: 3 orders, 103: 1 order). Finally, HAVING COUNT(OrderID) > 2
filters these grouped results, keeping only those customers with more than 2 orders. Using WHERE COUNT(OrderID) > 2
would result in an error because COUNT(OrderID)
isn't calculated until after the WHERE
clause is processed.
Scenario 2: Find customers whose total spending is greater than $150.
SELECT
CustomerID,
SUM(OrderTotal) AS TotalSpending
FROM
Orders
GROUP BY
CustomerID
HAVING
SUM(OrderTotal) > 150.00; -- Filter based on the aggregate sum
Result:
CustomerID | TotalSpending |
---|---|
101 | 185.00 |
102 | 400.00 |
Again, HAVING
filters the results after SUM(OrderTotal)
has been calculated for each CustomerID
group.
Scenario 3: Find customers who have placed more than 1 order, but only consider orders with an OrderTotal
greater than $50.
Here, we need both WHERE
and HAVING
.
SELECT
CustomerID,
COUNT(OrderID) AS NumberOfQualifyingOrders,
SUM(OrderTotal) AS TotalQualifyingSpending
FROM
Orders
WHERE
OrderTotal > 50.00 -- Filter individual orders BEFORE grouping
GROUP BY
CustomerID
HAVING
COUNT(OrderID) > 1; -- Filter groups AFTER aggregation
Let's trace this:
WHERE OrderTotal > 50.00
: Rows with OrderID 1, 2, 3, 5, 6, 7 are kept. Row 4 (OrderTotal = 30.00) is discarded.GROUP BY CustomerID
: The remaining rows are grouped:
COUNT(OrderID)
and SUM(OrderTotal)
are calculated for each group:
HAVING COUNT(OrderID) > 1
: Both groups satisfy this condition (3 > 1).Result:
CustomerID | NumberOfQualifyingOrders | TotalQualifyingSpending |
---|---|---|
101 | 3 | 185.00 |
102 | 3 | 400.00 |
To solidify the difference, consider the sequence in which SQL processes these clauses:
A simplified view of the logical processing order of SQL clauses.
WHERE
acts early on individual rows, whileHAVING
acts later on aggregated groups.
As the diagram shows, WHERE
operates before GROUP BY
and the aggregate functions are computed, while HAVING
operates afterwards.
In summary, use HAVING
when you need to filter your results based on the outcome of an aggregate function (COUNT
, SUM
, AVG
, MIN
, MAX
) applied to groups created by GROUP BY
. If you need to filter based on the values in the original rows before aggregation, use WHERE
. Sometimes, as seen in the third example, you might need to use both WHERE
and HAVING
in the same query to achieve the desired filtering at different stages.
© 2025 ApX Machine Learning