Friday, April 08, 2005

Yonah's SQL Tip of the Week #2 - Using CUBE and ROLLUP

SQL has two great operators to let you produce 'pretty' reports from aggregate queries. Basically, when you perform an aggreate query like, SUM or AVG and use group by to aggregate the results, you can then use either the WITH CUBE or WITH ROLLUP to produce sum totals for all of your groups.

For example, let's say you have an orders table for an e-Bookstore that has the following three fields - Date,Title,Price_Paid - The first two are the date of purchase and the book title, and the third field is the actual price paid by the consumer. Since the site runs promotions all of the time, the price paid by each individual varies. You have been asked to provide a report that shows the average price paid title - simple enough, your query will probably look like this:

SELECT Title,AVG(price_paid)
from ORDERS
Group by Title


Simple enough, no? So now management comes back and says, hey this is great, but we'd like to see it with monthly totals. So you re-write your query as follows (changes are bold):

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)


Essentially the output from this query has two fields- the month and date and the average price paid per field. It is also important to note that the order in which you specify the group by fields does make a difference - if you keep the order above, you will notice that the records will first be sorted by title, and then by date. So all of the records for title 1 will be displayed in chronological order before the results for title 2. If you reverse the order of the arguments where date comes first in the group by you will notice that all of the records will be date ordered and then ordered by the titles. Although we could alter this by adding an Order by clause, it's nice to know that this is the default behavior. This ordering also comes in handy with the CUBE and ROLLUP operators.

For those of you not familiar the cube and rollup operators can be used in conjunction with the group by attribute to produce aggregates over a greater group. So, say for example we wanted our query to not only compute the average price paid per month, but also include the average price overall for a specific title. Or, the average price for all titles in a specific month? That's where cube and rollup come in.

Here is our query again, with one small modification:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH ROLLUP


Executing this new query now gives you almost the same results as the previous one, with one difference. At the very end of each title group, there is an extra row with 'NULL' for the month. This is the overall average price paid for the book. If we reverse the order of the grouping - putting the month column before the title one, we will instead now have a row with a 'NULL' title value, which is the average price for all titles for that month. Now let's make one more change to our query:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH CUBE


If you look at the results for this query, you will see that it combines the outputs of the two rollup queries. Essentially the general rule of thumb is that rollup provides aggregates for all group columns except for the last one, and cube provides aggregate for all group columns. So if we split up year and month into separate columns, and then used GROUP BY Year,Month,Title in our rollup query we would get rows representing:
  1. The average price paid per title in a given month and year
  2. The average price paid for a given year
  3. The average price paid for a specific month across all years.


If we changed it to cube, the only thing that would change is that we'd also have an average value per title for all months and years.

As you can see, these are powerful tools to add to your SQL repetoire - next week, we will provide three additional tips for making them work better.

No comments: