Sunday, 4 November 2012

Rownumber in sql server


How to use ROW_NUMBER() to enumerate and partition records in SQL Server

By 31 Dec 2011
 
I had a situation recently where I had a table full of people records, where the people were divided into families. The business logic that needed to be followed was that I had to assign a “Twin Code” to each record. This meant that for each family in the database, if two or more members were born on the same day they should be treated as twins. The twins should be assigned a number enumerating them in order of birth. If the member was not a twin, they should just receive the twin code of 1.
Here’s an example table:
PersonIDFamilyIDFirstNameLastNameDateOfBirth
11JoeJohnson2000-10-23 13:00:00
21JimJohnson2001-12-15 05:45:00
32KarlyMatthews2000-05-20 04:00:00
42KacyMatthews2000-05-20 04:02:00
52TomMatthews2001-09-15 11:52:00
There are lots of ways to achieve the desired result, but the simplest is to just use a simple SELECT statement combined with the ROW_NUMBER() function with a couple parameters as to how to number the rows!
ROW_NUMBER() provides you with the number of a row in a given recordset, where you provide details on how to number the records. For example, if I just had to number the records above based solely upon the date of birth (ignoring families) then I would use this query:
SELECT
     [PersonID]
    ,[FamilyID]
    ,[FirstName]
    ,[LastName]
    ,[DateOfBirth]
    ,ROW_NUMBER() over (ORDER BY DateOfBirth) AS Number
FROM
 People
ORDER BY
 PersonID
This just tells the ROW_NUMBER() function to order its numbering ascending by DateOfBirth. Notice that I apply an order myself later on in the query, which is different than the row_number() order. I would get these results:
PersonIDFamilyIDFirstNameLastNameDateOfBirthNumber
11JoeJohnson2000-10-23 13:00:003
21JimJohnson2001-12-15 05:45:005
32KarlyMatthews2000-05-20 04:00:001
42KacyMatthews2000-05-20 04:02:002
52TomMatthews2001-09-15 11:52:004
The number field that is assigned to each record is in the order of DateOfBirth.
Ordering my numbering by DateOfBirth is just half of the picture. I also need to “group” the records by the FamilyID. This is where a clause in T-, SQL that you might not be very familiar with comes into play: “PARTITION BY”. The PARTITION BY clause allows us to group the results within the call to ROW_NUMBER()without grouping them ourselves via a GROUP BY. It just tells the ROW_NUMBERR what groupings to use when it does its counting.
Here is our final SQL statement, which achieves the business logic we wanted to implement.
SELECT
       [PersonID]
     [FamilyID]
      ,[FirstName]
      ,[LastName]
      ,[DateOfBirth]
      ,ROW_NUMBER() over(PARTITION BY FamilyID,
                         CONVERT(NVARCHAR(25), DateOfBirth, 111)
                         ORDER BY DateOfBirth ASC) TwinCode

  FROM [People]
ORDER BY PersonID
IIn the ROW_NUMBER function above, I am doing several things. I’m grouping on FamilyID, and also grouping on a converted DateOfBirth. I convert the DateOfBirth to an nvarchar using the 111 conversion code, because that gets results like ‘2009/10/11′ and ‘2009/10/12′ which can easily be grouped by to achieve distinct dates.
Grouping on the Family, DateOfBirth, and then sorting by DateOfBirth ascending achieves the desired result for the ROW_NUMBER. Here are the results of the query:
PersonIDFamilyIDFirstNameLastNameDateOfBirthTwinCode
11JoeJohnson2000-10-23 13:00:001
21JimJohnson2001-12-15 05:45:001
32KarlyMatthews2000-05-20 04:00:001
42KacyMatthews2000-05-20 04:02:002
52TomMatthews2001-09-15 11:52:001
As you can see, the two people who qualify as twins above (Karly and Kacy) are enumerated correctly, with Karly receiving a 1 and Kacy receiving a 2. All the records that are not twins properly receive a 1.









Using ROW_NUMBER() to paginate your data with SQL Server 2005 and ASP.NET

By 20 Nov 2005
 
Sample Image - row_number.gif
View Demo

Introduction

One of the great weaknesses of SQL Server 2000 was its inability to handle data pagination well. The solution to the problem was always to select all of your results and programmatically hide the results you did not want to show. As the user clicked Next or Previous, you would again select all of the rows and only display what the user asked for. Though the desired interface is achieved through this method, it is terribly inefficient. Why couldn't we select only the data that we wanted? If you have ever had to write a search that used "like" against thousands of records, you know how terribly slow SQL Server 2000 could perform.
During my SQL Server 2000 days, I would search endlessly for a solution to this problem. I tried implementations of using a select top where my last row was greater than a parameter. This works only in some cases, like when ordering by a primary key or by date. Otherwise, this failed because of the existence of duplicate data. I also tried building stored procedures that used crazy for loops to try and accomplish this. In every instance, I would always hit a brick wall. The client would request a feature that I could not support with my method and I would always default back to the poor performance of selecting all of the rows (or many of them) and handling the paging scheme programmatically.
Throughout this process, I often theorized of a SQL Server function that could add a sequential row number to my result set and allow me to use a where clause against that row the only selects what rows I needed. After a bit of research, I found out that this function did in fact exist. The only problem was, it existed only in Oracle! I was enraged, how could something so useful be simply left out of SQL Server 2000?
A few years pass by and Microsoft releases .NET which offers a partial solution to the problem. ASP.NET offers you the ability to output-cache the results of your web control. So essentially, you can select all of the rows once and as you page through the results, pull each subsequent set from the cached results. This seems to partially solve the performance problem though you are still faced with making the initial selection. But what if you want to view live changing data? As you decrease your cache time, your performance gets worse, as you increase it, your data gets old. Eventually, you fall back on your tired old method again.
With the release of SQL Server 2005, Microsoft introduces the long overdue ROW_NUMBER() function to solve this problem. In this article, we will walk through a C# implementation of pagination using the ROW_NUMBER()method.

The Code

The first step is writing your stored procedure. The SQL code for using ROW_NUMBER() is not as intuitive as you might think. When I originally attempted to do this, I tried to simply use the ROW_NUMBER() function like I wouldnewid(). I quickly found out, that was not going to work. After some research, I came up with the stored procedure below. Though I would have rather seen a more intuitive syntax than what is below when you think about it, it does make sense. I suppose they did not want to hide logic from the programmer and ask him to accept that something magical simply happens. In the following project, I will use a database of all zip codes in the United States.
CREATE PROCEDURE [dbo].[sp_getzipcodes] 
     -- Add the parameters for the stored procedure here
     @start int = 0

AS
BEGIN
     -- SET NOCOUNT ON added to prevent extra result sets from
     -- interfering with SELECT statements.
     Set NOCOUNT ON
     SELECT TOP 20 * FROM 
     (
          SELECT zip,city,state,latitude,longitude,timezone,dst,
          ROW_NUMBER() OVER (ORDER BY zip) AS num
          FROM dbo.zipcode
     ) AS a
     WHERE num > @start
END
























No comments:

Post a Comment