Quantcast
Channel: Sql And Me » Common Table Expressions
Viewing all articles
Browse latest Browse all 6

SQL Server – Identifying unique and duplicate rows in a table

$
0
0

Finding duplicate rows in a table can be done easily by using ROW_NUMBER() function. You can number each similar rows by using PARTITION BY clause.

For example, consider the below table:

EmployeeID  FirstName      MiddleName     LastName

———– ———-     ———–    ———

1           Ken            J              Sánchez

2           Terri          Lee            Duffy

3           Gail           A              Erickson

4           Ken            J              Sánchez

5           Terri          Lee            Duffy

6           Gail           A              Erickson

 

(6 row(s) affected)

Selecting using DISTINCT does not help here, as we have an identity column [EmployeeID] which id unique for all rows. To find duplicates here we need to compare only common columns [FirstName], [MiddleName] and [LastName]. We can partition data based on these columns for row numbering as below:

SELECT      EmployeeID, FirstName, MiddleName, LastName,

            ROW_NUMBER() OVER

            (PARTITION BY FirstName, MiddleName, LastName

             ORDER BY EmployeeID)

            AS 'Row Number'

FROM        dbo.Table_Employees

Result Set:

EmployeeID  FirstName      MiddleName     LastName     Row Number

———– ———-     ———–    ———    ———–

3           Gail           A              Erickson     1

6           Gail           A              Erickson     2

1           Ken            J              Sánchez      1

4           Ken            J              Sánchez      2

2           Terri          Lee            Duffy        1

5           Terri          Lee            Duffy        2

 

(6 row(s) affected)

Now we have both unique and duplicate rows from dbo.Table_Employees. All rows with [Row Number] = 1 are unique, and all other rows are duplicate rows. Still we cannot identify that if a row is duplicate, which is a corresponding unique row for the same (except by looking at it :) , which may not be feasible if there are a large number of rows).

 

To easily identify a unique row corresponding to a duplicate row, we would like to add another column which gives us the [EmployeeID] of the unique row. In the above example, row with [EmployeeID] = 6 is duplicate row, and we would like to add a column which contains value 3 ([EmployeeID] of corresponding unique row). So the row should look like below:

EmployeeID  FirstName      MiddleName     LastName     Row Number     Duplicate Of

———– ———-     ———–    ———    ———–    ————-

6           Gail           A              Erickson     2              3

 

This can be achieved by:

 

1. Move all unique rows to TableA (all rows where [RowNumber] = 1),

2. Move all duplicate rows to TableB (all rows where [RowNumber] <> 1),

3. JOIN TableA with TableB to get value for [Duplicate Of].

 

We don’t need to create actual separate tables here, we can use CTE instead. Final query will be as follows:

 

WITH  CTE_Employees

      (EmployeeID, FirstName, MiddleName, LastName, RowNumber)

AS

(

      SELECT      EmployeeID, FirstName, MiddleName, LastName,

                  ROW_NUMBER() OVER

                  (PARTITION BY FirstName, MiddleName, LastName

                  ORDER BY EmployeeID)

      FROM        Table_Employees

)

SELECT      TableB.EmployeeID, TableB.FirstName, TableB.MiddleName,

            TableB.LastName, TableA.EmployeeID AS 'Duplicate Of'

FROM        CTE_Employees TableB

INNER JOIN CTE_Employees TableA

            ON    TableB.FirstName  = TableA.FirstName

            AND   TableB.MiddleName = TableA.MiddleName

            AND   TableB.LastName   = TableA.LastName

WHERE       TableB.RowNumber > 1 – Duplicate rows

AND         TableA.RowNumber = 1 – Unique rows

Result Set:

EmployeeID  FirstName      MiddleName     LastName      Duplicate Of

———– ———-     ———–    ———     ————-

6           Gail           A              Erickson      3

4           Ken            J              Sánchez       1

5           Terri          Lee            Duffy         2

 

(3 row(s) affected)

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe


Filed under: Common Table Expressions, SQLServer

Viewing all articles
Browse latest Browse all 6

Latest Images

Trending Articles





Latest Images