How to Delete Duplicate Records from a SQL Server Table – A Step-by-Step 4 Methods

Duplicate records in a SQL Server table can cause performance issues, inaccurate reports, and data integrity problems. Whether the duplicates were introduced due to application bugs, ETL processes, or human error, cleaning them up is essential for maintaining a healthy database.

In this guide, we will explore multiple methods to identify and delete duplicate records in SQL Server, ensuring your data remains clean and efficient.

Why Are Duplicate Records a Problem?

Before diving into the solutions, let’s understand why duplicates are harmful:

Wasted Storage: Duplicate rows consume unnecessary disk space.

Performance Degradation: Queries run slower when scanning redundant data.

Inaccurate Analytics: Reports and aggregations may produce incorrect results.

Data Integrity Issues: Updates or deletions may affect only some copies, leading to inconsistencies.

Now, let’s explore how to find and remove duplicates efficiently.

Method 1: Using GROUP BY and HAVING to Identify and Delete Duplicate Records

The simplest way to detect duplicates is by using GROUP BY with HAVING COUNT(*) > 1.

Step 1: Find Duplicate Records

Delete Dublicate Records
SELECT 
    column1, 
    column2, 
    COUNT(*) AS duplicate_count
FROM 
    YourTableName
GROUP BY 
    column1, column2
HAVING 
    COUNT(*) > 1;

This query groups rows by specified columns and returns only those with duplicates.

Step 2: Delete Duplicates Records Using a Common Table Expression (CTE)

To safely delete duplicates, we will use a CTE with ROW_NUMBER()

WITH CTE AS (
    SELECT 
        column1, 
        column2,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2 
            ORDER BY (SELECT NULL)
        ) AS row_num
    FROM 
        YourTableName
)
DELETE FROM CTE 
WHERE row_num > 1;

This keeps one copy of each duplicate and removes the rest.

Method 2: Using ROW_NUMBER() to Delete Duplicates Records

This method is efficient for large tables.

Step 1: Assign Row Numbers to Each Record

WITH CTE AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2 
            ORDER BY ID  -- Use a unique column (e.g., ID) to determine which row to keep
        ) AS row_num
    FROM 
        YourTableName
)
SELECT * FROM CTE 
WHERE row_num > 1;  -- Check which rows will be deleted

Step 2: Delete Duplicates Records


WITH CTE AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2 
            ORDER BY ID
        ) AS row_num
    FROM 
        YourTableName
)
DELETE FROM CTE 
WHERE row_num > 1;

This ensures only the first occurrence (based on ORDER BY) is retained.

Method 3: Using DISTINCT with a Temporary Table to indentify and Delete Duplicate Records

If the table doesn’t have a unique key, you can use a temporary table.

Step 1: Copy Distinct Records to a Temp Table

SELECT DISTINCT * INTO #TempTable 
FROM YourTableName;

Step 2: Truncate the Original Table

TRUNCATE TABLE YourTableName;

Step 3: Reinsert Distinct Records

INSERT INTO YourTableName
SELECT * FROM #TempTable;

Step 4: Drop the Temp Table

DROP TABLE #TempTable;

This method is simple but not transaction-safe use it with caution in production.

Method 4: Using DELETE with Self-Join

For more control, use a self-join to delete duplicates.

Step 1: Identify Duplicates with a Self-Join

DELETE t1
FROM YourTableName t1
INNER JOIN YourTableName t2 
ON t1.column1 = t2.column1 
AND t1.column2 = t2.column2
AND t1.ID > t2.ID;  -- Keeps the record with the smallest ID

This query deletes duplicates while preserving the oldest record (assuming ID is incremental).

Best Practices to Avoid Future Duplicates

Use UNIQUE Constraints

ALTER TABLE YourTableName 
ADD CONSTRAINT UQ_Column UNIQUE (column1, column2);

Implement Proper Indexing

CREATE INDEX IX_YourTableName_Columns ON YourTableName (column1, column2);

Use MERGE Instead of INSERT for Upserts

MERGE INTO YourTableName AS target
USING SourceTable AS source
ON target.key_column = source.key_column
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...;

Regular Data Cleanup Jobs

Schedule periodic checks to remove duplicates automatically.

Conclusion Delete Duplicate Records

Deleting duplicate records in SQL Server is crucial for database efficiency. Depending on your table structure, you can use:

1. GROUP BY + HAVING for quick identification
2. ROW_NUMBER() + CTE for precise deletion
3. DISTINCT + Temp Table for simple cleanup
4. Self-Join DELETE for conditional removal

By following these methods and applying preventive measures, you can ensure clean, duplicate-free data in your SQL Server tables.

FAQs

Q1: Will these methods work on large tables?
Yes, but ROW_NUMBER() is the most efficient for big datasets.

Q2: How can I backup before deleting duplicates?

SELECT * INTO Backup_YourTableName FROM YourTableName;

Q3: Can I automate duplicate removal?
Yes, create a stored procedure and schedule it via SQL Agent.

Delete Duplicate Records

Leave a Comment