Tuesday, April 19, 2005

Yonah's SQL Tip #3 - 2 tricks with CUBE and ROLLUP

Last week we talked about the CUBE and ROLLUP operators for SQL. Of course, they are very versatile, but are sometimes hard to work with. In this week's SQL Tip, we are going to focus on two things - providing custom group names for aggregates and altering the sort order so that aggregates come last.

First Let's Start with this query aginst the pubs database that ships with SQL server:

select type,title,avg(price)
from titles
group by type,title
with rollup


The results of course, will show the type, title, and average price for each entry in the titles table (of course avg is meaningless since each title only has one price, but you get the point). We will also see that for each ‘type’ there is one additional entry that has a NULL value in the title field. This entry is the average price for all of the titles of a specific type. Of course, the NULL is unsightly. It would be much more user-friendly, if the title field contained a useful phrase like Average for all titles or Average for all type titles where type is the specific type.

This is very easy to do, using the isNull() function:

select type,isNull(title,’Average for all ‘+type+’ titles’) as title,avg(price)
from titles
group by type,title
with rollup


With this small change we now have a pretty name for our ‘rolled-up’ field. However, we still have a problem with sorting – because if we sort by title – the Averages will come up close to the top of the list. So now we need a little trick to get them to sort properly. That’s where we can make use of a handy little case statement:


select type,isNull(title,'Average for all '+type+' titles') as title,avg(price),
case when title is null then 1 else 0 end as sort_order
from titles
group by type,title
with rollup
order by type,sort_order


This will effectively sort the totals for each type to appear below all of its titles. Of course, lets say we want to put all of the totals at the bottom, we can simply change the last line to read:

order by sort_order, type

By reversing the order of the columns, we now ensure that all specific titles appear first, before the type summaries.

Friday, April 08, 2005

Yonah's SQL Tip of the Week #2 - Using CUBE and ROLLUP

SQL has two great operators to let you produce 'pretty' reports from aggregate queries. Basically, when you perform an aggreate query like, SUM or AVG and use group by to aggregate the results, you can then use either the WITH CUBE or WITH ROLLUP to produce sum totals for all of your groups.

For example, let's say you have an orders table for an e-Bookstore that has the following three fields - Date,Title,Price_Paid - The first two are the date of purchase and the book title, and the third field is the actual price paid by the consumer. Since the site runs promotions all of the time, the price paid by each individual varies. You have been asked to provide a report that shows the average price paid title - simple enough, your query will probably look like this:

SELECT Title,AVG(price_paid)
from ORDERS
Group by Title


Simple enough, no? So now management comes back and says, hey this is great, but we'd like to see it with monthly totals. So you re-write your query as follows (changes are bold):

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)


Essentially the output from this query has two fields- the month and date and the average price paid per field. It is also important to note that the order in which you specify the group by fields does make a difference - if you keep the order above, you will notice that the records will first be sorted by title, and then by date. So all of the records for title 1 will be displayed in chronological order before the results for title 2. If you reverse the order of the arguments where date comes first in the group by you will notice that all of the records will be date ordered and then ordered by the titles. Although we could alter this by adding an Order by clause, it's nice to know that this is the default behavior. This ordering also comes in handy with the CUBE and ROLLUP operators.

For those of you not familiar the cube and rollup operators can be used in conjunction with the group by attribute to produce aggregates over a greater group. So, say for example we wanted our query to not only compute the average price paid per month, but also include the average price overall for a specific title. Or, the average price for all titles in a specific month? That's where cube and rollup come in.

Here is our query again, with one small modification:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH ROLLUP


Executing this new query now gives you almost the same results as the previous one, with one difference. At the very end of each title group, there is an extra row with 'NULL' for the month. This is the overall average price paid for the book. If we reverse the order of the grouping - putting the month column before the title one, we will instead now have a row with a 'NULL' title value, which is the average price for all titles for that month. Now let's make one more change to our query:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH CUBE


If you look at the results for this query, you will see that it combines the outputs of the two rollup queries. Essentially the general rule of thumb is that rollup provides aggregates for all group columns except for the last one, and cube provides aggregate for all group columns. So if we split up year and month into separate columns, and then used GROUP BY Year,Month,Title in our rollup query we would get rows representing:
  1. The average price paid per title in a given month and year
  2. The average price paid for a given year
  3. The average price paid for a specific month across all years.


If we changed it to cube, the only thing that would change is that we'd also have an average value per title for all months and years.

As you can see, these are powerful tools to add to your SQL repetoire - next week, we will provide three additional tips for making them work better.