Tuesday, April 19, 2005

Yonah's SQL Tip #3 - 2 tricks with CUBE and ROLLUP

Last week we talked about the CUBE and ROLLUP operators for SQL. Of course, they are very versatile, but are sometimes hard to work with. In this week's SQL Tip, we are going to focus on two things - providing custom group names for aggregates and altering the sort order so that aggregates come last.

First Let's Start with this query aginst the pubs database that ships with SQL server:

select type,title,avg(price)
from titles
group by type,title
with rollup


The results of course, will show the type, title, and average price for each entry in the titles table (of course avg is meaningless since each title only has one price, but you get the point). We will also see that for each ‘type’ there is one additional entry that has a NULL value in the title field. This entry is the average price for all of the titles of a specific type. Of course, the NULL is unsightly. It would be much more user-friendly, if the title field contained a useful phrase like Average for all titles or Average for all type titles where type is the specific type.

This is very easy to do, using the isNull() function:

select type,isNull(title,’Average for all ‘+type+’ titles’) as title,avg(price)
from titles
group by type,title
with rollup


With this small change we now have a pretty name for our ‘rolled-up’ field. However, we still have a problem with sorting – because if we sort by title – the Averages will come up close to the top of the list. So now we need a little trick to get them to sort properly. That’s where we can make use of a handy little case statement:


select type,isNull(title,'Average for all '+type+' titles') as title,avg(price),
case when title is null then 1 else 0 end as sort_order
from titles
group by type,title
with rollup
order by type,sort_order


This will effectively sort the totals for each type to appear below all of its titles. Of course, lets say we want to put all of the totals at the bottom, we can simply change the last line to read:

order by sort_order, type

By reversing the order of the columns, we now ensure that all specific titles appear first, before the type summaries.

Friday, April 08, 2005

Yonah's SQL Tip of the Week #2 - Using CUBE and ROLLUP

SQL has two great operators to let you produce 'pretty' reports from aggregate queries. Basically, when you perform an aggreate query like, SUM or AVG and use group by to aggregate the results, you can then use either the WITH CUBE or WITH ROLLUP to produce sum totals for all of your groups.

For example, let's say you have an orders table for an e-Bookstore that has the following three fields - Date,Title,Price_Paid - The first two are the date of purchase and the book title, and the third field is the actual price paid by the consumer. Since the site runs promotions all of the time, the price paid by each individual varies. You have been asked to provide a report that shows the average price paid title - simple enough, your query will probably look like this:

SELECT Title,AVG(price_paid)
from ORDERS
Group by Title


Simple enough, no? So now management comes back and says, hey this is great, but we'd like to see it with monthly totals. So you re-write your query as follows (changes are bold):

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)


Essentially the output from this query has two fields- the month and date and the average price paid per field. It is also important to note that the order in which you specify the group by fields does make a difference - if you keep the order above, you will notice that the records will first be sorted by title, and then by date. So all of the records for title 1 will be displayed in chronological order before the results for title 2. If you reverse the order of the arguments where date comes first in the group by you will notice that all of the records will be date ordered and then ordered by the titles. Although we could alter this by adding an Order by clause, it's nice to know that this is the default behavior. This ordering also comes in handy with the CUBE and ROLLUP operators.

For those of you not familiar the cube and rollup operators can be used in conjunction with the group by attribute to produce aggregates over a greater group. So, say for example we wanted our query to not only compute the average price paid per month, but also include the average price overall for a specific title. Or, the average price for all titles in a specific month? That's where cube and rollup come in.

Here is our query again, with one small modification:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH ROLLUP


Executing this new query now gives you almost the same results as the previous one, with one difference. At the very end of each title group, there is an extra row with 'NULL' for the month. This is the overall average price paid for the book. If we reverse the order of the grouping - putting the month column before the title one, we will instead now have a row with a 'NULL' title value, which is the average price for all titles for that month. Now let's make one more change to our query:

SELECT Title,DateName(m,DATE)+' '+DateName(y,DATE) as Month,AVG(price_paid)
from ORDERS
Group by Title,DateName(m,DATE)+' '+DateName(y,DATE)
WITH CUBE


If you look at the results for this query, you will see that it combines the outputs of the two rollup queries. Essentially the general rule of thumb is that rollup provides aggregates for all group columns except for the last one, and cube provides aggregate for all group columns. So if we split up year and month into separate columns, and then used GROUP BY Year,Month,Title in our rollup query we would get rows representing:
  1. The average price paid per title in a given month and year
  2. The average price paid for a given year
  3. The average price paid for a specific month across all years.


If we changed it to cube, the only thing that would change is that we'd also have an average value per title for all months and years.

As you can see, these are powerful tools to add to your SQL repetoire - next week, we will provide three additional tips for making them work better.

Monday, March 28, 2005

Having fun with the Google APIs

A while back, I downloaded Google's APIs. Essentially they will provide access to their search database for those who want to try experimenting with their site and search tools.

Among the many ideas that I toyed with was the opportunity to find the rank of a specific site for a given keyword. In other words, let's say I sold Widgets - it would be pretty useful to find out on what page in Google's results my site is on when I search for the term Widgets.

So using the google apis, I managed to develop a tool that does just that. It simply iterates through the search results and tells you what number page your result shows up on.

Of course, the Google apis are limited to 1000 hits a day, which can get used up very fast. Instead, I am currently revamping this to search Google, Yahoo, MSN, and A9 without having to use the APIs. When I finish, I will be more than glad to post the code.

Another thing that I think will be useful - coming up with a unique enough term that will almost guarantee that your site will be found when looking for it. When I figure that one out, I will change my blog name, and then let you know :)


Yonah's SQL Tip of the Week #1 - Top and Sorting Views

I am sure that you've stumbled across the Top command. You might have seen it by looking at the SQL statement when browsing tables in Enterprise Manager. You may have seen it in the Syntax for SELECT. No matter what the Top command is a very useful command in more ways than one.

Essentially Top lives up to its name. The most Common syntax is:
SELECT TOP n * from table order by sales_count DESC

The query above will return the first n rows of the result set. This type of query can be useful in many situations. For example, when you are outputting the top ten selling products on your website. Notice that the ORDER BY clause is set. Order by essentially determines the order in which the rows are returned, and since we are filtering out the top 10, it also controls which records get displayed. Without Order By, Top would only return the first ten records in the result set.

Top also has another mode - percent. Instead of returning a fixed number of rows, percent returns the top n percent of a result set. This can be good if you want to find, say, the top ten percent of the most frequently sold items:

Select top 10 percent * from products, order by sales_count desc


The Percent mode will work with any percentage, up to an including 100 Percent. Why would you want to put top 100 percent in a query if not specifying top at all gives the same result? Well I could definitely think of one main reason for this - Creating a View with a Default Sort. In SQL, views cannot include an order by clause, unless the TOP keyword is specified. Obviously, this needs to be there in order to display the appropriate records. But let's say that you want to pre-sort an entire result set in a view without specifying ORDER BY in your query? Since you can't put an Order By clause into the without specifying TOP, simply put the following SQL into your view:

Create View myView as
Select TOP 100 percent from table, order by sales_count DESC

Now you have a view that is pre-sorted.

Yonah's SQL Tip of the Week - Introduction

So one of the new features I want for this blog is a weekly SQL tip. I have worked and chatted with hundreds of developers over the years, and while even the most junior of my colleagues have great experience with the basic SQL commands, very few have mastered even a handful of the less-commonly used functions and features.

I can't begin to tell you how many e-mails and message board posts I've seen where people have created intricate SQL statements where, had they known about some of these features, they could have used only one or two extra words in their query instead of lines.

That's why I am putting these solutions into a weekly tip.