I had a situation recently where I had a table full of people records, where the people were divided into families. The business logic that needed to be followed was that I had to assign a “Twin Code” to each record. This meant that for each family in the database, if two or more members were born on the same day they should be treated as twins. The twins should be assigned a number enumerating them in order of birth. If the member was not a twin, they should just receive the twin code of 1.
Here’s an example table:
PersonID | FamilyID | FirstName | LastName | DateOfBirth |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 |
There are lots of ways to achieve the desired result, but the simplest is to just use a simple
SELECT
statement combined with the
ROW_NUMBER()
function with a couple parameters as to how to number the rows!
ROW_NUMBER()
provides you with the number of a row in a given recordset, where you provide details on how to number the records. For example, if I just had to number the records above based solely upon the date of birth (ignoring families) then I would use this query:
Collapse | Copy Code
SELECT
[PersonID]
,[FamilyID]
,[FirstName]
,[LastName]
,[DateOfBirth]
,ROW_NUMBER() over (ORDER BY DateOfBirth) AS Number
FROM
People
ORDER BY
PersonID
This just tells the
ROW_NUMBER()
function to order its numbering ascending by DateOfBirth. Notice that I apply an order myself later on in the query, which is different than the
row_number()
order. I would get these results:
PersonID | FamilyID | FirstName | LastName | DateOfBirth | Number |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 | 3 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 | 5 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 | 1 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 | 2 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 | 4 |
The number field that is assigned to each record is in the order of DateOfBirth.
Ordering my numbering by DateOfBirth is just half of the picture. I also need to “group” the records by the FamilyID. This is where a clause in T-,
SQL that you might not be very familiar with comes into play: “
PARTITION BY
”. The PARTITION BY clause allows us to group the results within the call to
ROW_NUMBER()
without grouping them ourselves via a
GROUP BY
. It just tells the
ROW_NUMBERR
what groupings to use when it does its counting.
Here is our final
SQL statement, which achieves the business logic we wanted to implement.
Collapse | Copy Code
SELECT
[PersonID]
[FamilyID]
,[FirstName]
,[LastName]
,[DateOfBirth]
,ROW_NUMBER() over(PARTITION BY FamilyID,
CONVERT(NVARCHAR(25), DateOfBirth, 111)
ORDER BY DateOfBirth ASC) TwinCode
FROM [People]
ORDER BY PersonID
IIn the
ROW_NUMBER
function above, I am doing several things. I’m grouping on FamilyID, and also grouping on a converted DateOfBirth. I convert the DateOfBirth to an
nvarchar
using the 111 conversion code, because that gets results like ‘2009/10/11′ and ‘2009/10/12′ which can easily be grouped by to achieve distinct dates.
Grouping on the Family, DateOfBirth, and then sorting by DateOfBirth ascending achieves the desired result for the
ROW_NUMBER
. Here are the results of the query:
PersonID | FamilyID | FirstName | LastName | DateOfBirth | TwinCode |
1 | 1 | Joe | Johnson | 2000-10-23 13:00:00 | 1 |
2 | 1 | Jim | Johnson | 2001-12-15 05:45:00 | 1 |
3 | 2 | Karly | Matthews | 2000-05-20 04:00:00 | 1 |
4 | 2 | Kacy | Matthews | 2000-05-20 04:02:00 | 2 |
5 | 2 | Tom | Matthews | 2001-09-15 11:52:00 | 1 |
As you can see, the two people who qualify as twins above (Karly and Kacy) are enumerated correctly, with Karly receiving a 1 and Kacy receiving a 2. All the records that are not twins properly receive a 1.
No comments:
Post a Comment