Selecting Non-Duplicate Rows from a Table Using ROW_NUMBER in SQL Server

Understanding and Implementing ROW_NUMBER to Select Non-Duplicate Rows from a Table

In this article, we will explore how to use the ROW_NUMBER function in SQL Server to select non-duplicate rows from a table. We will also discuss the "Operand type clash" error that occurs when the integer result of DATEDIFF is stored in a column declared as DATE.

Introduction

The ROW_NUMBER function assigns a sequential number, starting at 1, to each row within a partition of a result set. Combined with the PARTITION BY clause, it can identify duplicates: rows that share the same values in the partitioning columns fall into the same partition, so every row numbered 2 or higher is a repeat of the row numbered 1.
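As a small illustration of how the numbering restarts in each partition, the following sketch numbers three inline rows by employee name. The inline VALUES rows and the derived-table alias v are made up for this example and are not part of the article's tables.

```sql
-- Illustrative only: three inline rows, partitioned by emp_Name.
-- The numbering restarts at 1 whenever emp_Name changes.
SELECT
      v.emp_Name
    , v.joining_date
    , ROW_NUMBER() OVER (PARTITION BY v.emp_Name
                         ORDER BY v.joining_date) AS RowNum
FROM (VALUES
        ('John', CAST('2016-01-28' AS DATE)),
        ('John', CAST('2017-01-28' AS DATE)),
        ('Mike', CAST('2018-01-28' AS DATE))
     ) AS v (emp_Name, joining_date);
```

The two John rows receive RowNum 1 and 2 (earliest joining_date first), while the single Mike row receives RowNum 1.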

In this article, we will use an example table named Emp_demo3 which contains employee information. The goal is to select non-duplicate rows from this table based on specific columns and then calculate the date difference between two dates.

Understanding the Problem

When we try to store the difference in days between two dates, we encounter an error. The cause is not the DATE columns themselves: DATEDIFF accepts two DATE values without complaint. The error arises because DATEDIFF returns an INT, and we attempt to write that INT into a column declared as DATE. SQL Server does not implicitly convert int to date.

The solution is to give the target column a type that matches the result of the calculation, namely INT.

Creating the Table and Data

First, let’s create the Emp_demo3 table and insert some sample data.

CREATE TABLE Emp_demo3 (
    emp_ID INT,
    emp_Name NVARCHAR (50),
    emp_sal_K INT,
    emp_manager INT,
    joining_date DATE,
    last_time DATE) 

INSERT INTO Emp_demo3 VALUES (1,'Ali', 200,2,'2010-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (2,'Zaid', 770,4,'2008-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (3,'Mohd', 1140,2,'2007-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (4,'LILY', 770,Null,'2013-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (5,'John', 1240,6,'2016-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (6,'Mike', 1140,4,'2018-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (5,'John', 1240,6,'2017-01-28','2015-05-09')
INSERT INTO Emp_demo3 VALUES (3,'Mohd', 1140,2,'2010-01-28','2015-05-09')

Calculating Date Difference

We will use the DATEDIFF function to calculate the difference in days between joining_date and last_time. A first attempt adds a date_diff column and fills it with the result:

ALTER TABLE Emp_demo3 
add date_diff DATE

UPDATE Emp_demo3 
SET date_diff = DATEDIFF(DAY, joining_date, last_time)

However, when we try to execute this code, we get an error message: “Operand type clash: int is incompatible with date”.

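This is because the DATEDIFF function returns an integer value representing the number of days between two dates, while the date_diff column was declared as DATE. SQL Server will not implicitly convert an int to a date, so the assignment fails.

To solve this problem, the date_diff column must be declared as INT rather than DATE. The joining_date and last_time columns need no conversion.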
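One way to resolve the type clash is sketched below. It assumes the earlier ALTER TABLE succeeded and left behind a date_diff column of type DATE, so that column is dropped and re-created as INT before the UPDATE runs. GO is the batch separator understood by SSMS and sqlcmd rather than a T-SQL statement; it is used here so that each step is compiled after the previous schema change has taken effect.

```sql
-- Remove the DATE-typed column if the earlier attempt created it.
IF COL_LENGTH('Emp_demo3', 'date_diff') IS NOT NULL
    ALTER TABLE Emp_demo3 DROP COLUMN date_diff;
GO

-- Re-create the column with a type that matches DATEDIFF's result.
ALTER TABLE Emp_demo3
ADD date_diff INT;
GO

-- DATEDIFF(DAY, start, end) returns an INT, which now fits the column.
UPDATE Emp_demo3
SET date_diff = DATEDIFF(DAY, joining_date, last_time);
```

Note that several sample rows have a joining_date later than last_time; for those rows DATEDIFF returns a negative number of days.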

Using ROW_NUMBER

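We can use the ROW_NUMBER function in combination with the PARTITION BY clause to find duplicates: rows with the same values in the partitioning columns are numbered 1, 2, 3, and so on within their group, and keeping only the rows numbered 1 leaves one row per group.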

Here’s how we can do it:

-- Table variable holding the sample rows, including two duplicates
DECLARE @Emp_demo2 TABLE (
            emp_ID INT,
            emp_Name NVARCHAR (50),
            emp_sal_K INT,
            emp_manager INT)

INSERT INTO @Emp_demo2 VALUES (1,'Ali', 200,2)
INSERT INTO @Emp_demo2 VALUES (2,'Zaid', 770,4)
INSERT INTO @Emp_demo2 VALUES (3,'Mohd', 1140,2)
INSERT INTO @Emp_demo2 VALUES (4,'LILY', 770,Null)
INSERT INTO @Emp_demo2 VALUES (5,'John', 1240,6)
INSERT INTO @Emp_demo2 VALUES (6,'Mike', 1140,4)
INSERT INTO @Emp_demo2 VALUES (5,'John', 1240,6)  -- duplicate of John
INSERT INTO @Emp_demo2 VALUES (3,'Mohd', 1140,2)  -- duplicate of Mohd

-- Number the rows inside each group of identical values,
-- then keep only the first row of every group
SELECT * FROM 
(
    SELECT 
     t.emp_ID
    , t.emp_Name
    , t.emp_sal_K
    , t.emp_manager
    , ROW_NUMBER() OVER (PARTITION BY t.emp_Name, t.emp_sal_K, t.emp_manager
        ORDER BY t.emp_Name) AS RowNum
    FROM @Emp_demo2 AS t
)q
WHERE q.RowNum = 1
ORDER BY q.emp_ID

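In this code snippet, the PARTITION BY clause groups rows that have the same emp_Name, emp_sal_K, and emp_manager values, and ROW_NUMBER numbers the rows inside each group starting at 1. The outer query keeps only the rows where RowNum = 1, so each group contributes exactly one row. With the sample data, the two repeated rows (John and Mohd) are filtered out and six rows remain. Because the rows within a group are identical in this table, it does not matter which of them is numbered first.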
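The same pattern can be applied to the Emp_demo3 table to meet the goal stated at the start of the article: select the non-duplicate rows and compute the date difference for them. The sketch below is one possible way to do this, not the only one. It assumes that, among duplicates, the row with the earliest joining_date is the one to keep, and it computes the difference inline under the illustrative alias days_between instead of relying on a stored column.

```sql
-- Deduplicate Emp_demo3 and compute the day difference in one query.
-- Assumption: within each group of duplicates, keep the earliest joining_date.
SELECT
      q.emp_ID
    , q.emp_Name
    , q.emp_sal_K
    , q.emp_manager
    , q.joining_date
    , q.last_time
    , DATEDIFF(DAY, q.joining_date, q.last_time) AS days_between
FROM
(
    SELECT
          t.emp_ID
        , t.emp_Name
        , t.emp_sal_K
        , t.emp_manager
        , t.joining_date
        , t.last_time
        , ROW_NUMBER() OVER (PARTITION BY t.emp_Name, t.emp_sal_K, t.emp_manager
                             ORDER BY t.joining_date) AS RowNum
    FROM Emp_demo3 AS t
) AS q
WHERE q.RowNum = 1
ORDER BY q.emp_ID;
```

Ordering by joining_date inside the window makes the choice of surviving row deterministic here, because the duplicated employees in Emp_demo3 differ only in that column.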


Last modified on 2024-08-04