|
YOUR FEEDBACK
|
TOP MICROSOFT .NET LINKS SQL Server 2005 A Point of Order: Organize Your Data with SQL Ranking Functions
Sooner or later, you'll find a process that requires you to generate a series of sequential numbers
By: Jerry Dixon
Feb. 19, 2006 03:45 PM
Sequential numbers. Sooner or later, you'll find a process that requires you to generate a series of sequential numbers. As a matter of fact, most database engines provide at least one method of producing such numbers. Microsoft Access, for example, has AutoNumber columns, while SQL Server has IDENTITY columns. However, these are features of tables. There are times when you need some sequential numbers, but creating a table is inconvenient or inefficient. So how does one proceed? How does one obtain a list of sequential numbers without creating tables?
IDENTITY Columns The answer is: you don't. You use a table-valued variable instead. Variables are almost always faster and more efficient than actual tables. Refer to the code in Listing 1. This code creates a table-valued variable called @OrderedRows that has three columns of actual data. The fourth column is an IDENTITY column, which is used to assign an order to that data. An example of the output from such a table-valued variable can be seen in Figure 1. Each row receives a number, starting with the seed number and increasing each time by the specified increment. This technique has been available for years, and is still available in SQL Server 2005. Now however, we have more options available.
Ranking Functions The PARTITION clause provides us with a very interesting capability. Without it, the ranking functions will create sequential numbers across an entire result set. With it, however, we can specify groups or categories that should be numbered. The numbering will then restart with each group. In addition, when the PARTITION clause is used to create these groups, the ORDER BY clause applies to each. In order words, each resulting group will be individually sorted. Using these two clauses together, we can devise numbering schemes that are far more complex than the IDENTITY clause could ever produce. So, what does each individual ranking function do? The RANK function assigns a rank number to each row of the result set or group. The DENSE_RANK function does this too. The difference is how the two functions handle ties. Remember that the rows of the result set or group are sorted according to the specified ORDER BY clause. It is quite possible for multiple rows to have identical values in the sort column(s). When this occurs, the tied rows will receive equal rank numbers - but how does the numbering continue afterward? In Listing 3 I've created a query that lists some products and their associated categories. The rows inside each category are sorted by ascending list price. The rows are also assigned a rank via the RANK function. This can be seen in Figure 2. Notice that there are three products with the lowest price, therefore there are three rows with a rank of one. The next row has a rank of four, not two. Sometimes this is not what we want. We might prefer that the numbering resume with the next rank, which in this case would be two. The code in Listing 4 does just this. It uses the DENSE_RANK function, which produces the results shown in Figure 3. The ROW_NUMBER function can be used when we want consecutive numbers and we don't care about ties. In these cases, no numbers should be repeated, and no numbers should be skipped. Listing 5 shows an example query, with the results shown in Figure 4. You can see that, within each category, the numbering is consecutive with no repeating values. The number does start over when a new category is reached, which is one of the main reasons that this function is different from an IDENTITY column. Remember that the PARTITION clause is optional, though, so we can use the ROW_NUMBER function to mimic an IDENTITY column. We would simply leave out the PARTITION clause. The advantage of this function is that you don't have to have a table in order to obtain your sequential numbers. The disadvantage is that the numbering will always be consecutive - the numbers will always start with one and will always increment by one. IDENTITY columns can have an increment greater than one and can start with a number other than one. (Pay special attention to this ranking function though, because we will use it in a real-world example later on.) The final ranking function, NTILE, is a special case. The other three ranking functions produce sequential numbers across the entire result set or group. This function creates a specific number of subgroups, giving each row in the subgroup the same rank number. This can best be seen with an example. The query in Listing 6 produces the same product list that we've been using all along. This time, however, a parameter of 3 has been passed to the ranking function. The parameter specifies the number of subgroups that should be created. Examine the results in Figure 5. Notice that each product category has been divided into three subgroups and that each row of a subgroup has been given the same rank. Also notice that not all subgroups have the same number of rows. The NTILE function will divide the result set or group as evenly as possible, but some variance will occur when the number of rows in a group is not an exact multiple of the NTILE parameter.
Paging a Result Set The key to this paging solution is the assignment of sequential numbers to the rows of the result set. These numbers can then be used to calculate which rows need to be included on a specific page. With SQL Server 2000, these numbers are typically created via an IDENTITY column. With SQL Server 2005, I prefer to use the ROW_NUMBER ranking function. This allows me to create sequential numbers without having to resort to the use of a table-valued variable. Refer to Listing 7. I've created a stored procedure that accepts two parameters: a page number and a page size. This information will be used later to filter the rows that get sent back to the caller. The first SELECT statement returns the starting row number for the specified page, the ending row number for the specified page, and the total number of rows in the Product table. The application can use this information to determine if it is displaying the first page, the last page, or a page somewhere in the middle. Next, notice that I've used a nonrecursive Common Table Expression (CTE) to simplify the code. (Take special note of the semicolon at the end of the previous SELECT statement. When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon. For more on CTEs, refer to the December edition of this column in DNDJ Vol. 3, iss. 10.) This CTE returns a couple of columns from the Product table, along with a row number created by the ROW_NUMBER ranking function. I have omitted the PARTITION clause, because I want a single set of numbers to be created across the entire Product table. The next SELECT statement returns the Product information from this CTE, but filters out those rows that do not correspond to the requested page. This ensures that only the desired number of rows is returned to the caller. This solution returns two result sets: one that contains paging information and one that contains one page of actual data. Programmers using ADO.NET can access the second result set via a call to the NEXTRESULT method. If you don't wish to return the paging information, you can simply eliminate the first SELECT statement. In that case, the only result set that will be returned will be the one containing the Product data rows. Now, we should test the stored procedure and verify the results. Listing 8 shows how to obtain the first page with a page size of 10. Figure 6 shows the output. Similarly, Listing 9 shows how to pull the second page, while Figure 7 shows that output. Finally, just to prove that the paging works as expected, the code in Listing 10 returns a 20-row page that contains the data from the previous two examples. Examine the results in Figure 8, and you can see that the paging is working properly.
Summary YOUR FEEDBACK
MICROSOFT .NET LATEST STORIES
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
|
SYS-CON FEATURED WHITEPAPERS MOST READ THIS WEEK BREAKING NEWS FROM THE WIRES
|
|||||||||||||||||||||||||||||||||||||