Duplicate records in a SQL Server table can cause performance issues, inaccurate reports, and data integrity problems. Whether the duplicates were introduced due to application bugs, ETL processes, or human error, cleaning them up is essential for maintaining a healthy database.
In this guide, we will explore multiple methods to identify and delete duplicate records in SQL Server, ensuring your data remains clean and efficient.
Table of Contents for Delete Duplicate Records
Why Are Duplicate Records a Problem?
Before diving into the solutions, let’s understand why duplicates are harmful:
Wasted Storage: Duplicate rows consume unnecessary disk space.
Performance Degradation: Queries run slower when scanning redundant data.
Inaccurate Analytics: Reports and aggregations may produce incorrect results.
Data Integrity Issues: Updates or deletions may affect only some copies, leading to inconsistencies.
Now, let’s explore how to find and remove duplicates efficiently.
Method 1: Using GROUP BY and HAVING to Identify and Delete Duplicate Records
The simplest way to detect duplicates is by using GROUP BY with HAVING COUNT(*) > 1.
Step 1: Find Duplicate Records

SELECT
column1,
column2,
COUNT(*) AS duplicate_count
FROM
YourTableName
GROUP BY
column1, column2
HAVING
COUNT(*) > 1;
This query groups rows by specified columns and returns only those with duplicates.
Step 2: Delete Duplicates Records Using a Common Table Expression (CTE)
To safely delete duplicates, we will use a CTE with ROW_NUMBER()
WITH CTE AS (
SELECT
column1,
column2,
ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY (SELECT NULL)
) AS row_num
FROM
YourTableName
)
DELETE FROM CTE
WHERE row_num > 1;
This keeps one copy of each duplicate and removes the rest.
Method 2: Using ROW_NUMBER() to Delete Duplicates Records
This method is efficient for large tables.
Step 1: Assign Row Numbers to Each Record
WITH CTE AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY ID -- Use a unique column (e.g., ID) to determine which row to keep
) AS row_num
FROM
YourTableName
)
SELECT * FROM CTE
WHERE row_num > 1; -- Check which rows will be deleted
Step 2: Delete Duplicates Records
WITH CTE AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY ID
) AS row_num
FROM
YourTableName
)
DELETE FROM CTE
WHERE row_num > 1;
This ensures only the first occurrence (based on ORDER BY) is retained.
Method 3: Using DISTINCT with a Temporary Table to indentify and Delete Duplicate Records
If the table doesn’t have a unique key, you can use a temporary table.
Step 1: Copy Distinct Records to a Temp Table
SELECT DISTINCT * INTO #TempTable
FROM YourTableName;
Step 2: Truncate the Original Table
TRUNCATE TABLE YourTableName;
Step 3: Reinsert Distinct Records
INSERT INTO YourTableName
SELECT * FROM #TempTable;
Step 4: Drop the Temp Table
DROP TABLE #TempTable;
This method is simple but not transaction-safe use it with caution in production.
Method 4: Using DELETE with Self-Join
For more control, use a self-join to delete duplicates.
Step 1: Identify Duplicates with a Self-Join
DELETE t1
FROM YourTableName t1
INNER JOIN YourTableName t2
ON t1.column1 = t2.column1
AND t1.column2 = t2.column2
AND t1.ID > t2.ID; -- Keeps the record with the smallest ID
This query deletes duplicates while preserving the oldest record (assuming ID is incremental).
Best Practices to Avoid Future Duplicates
Use UNIQUE Constraints
ALTER TABLE YourTableName
ADD CONSTRAINT UQ_Column UNIQUE (column1, column2);
Implement Proper Indexing
CREATE INDEX IX_YourTableName_Columns ON YourTableName (column1, column2);
Use MERGE Instead of INSERT for Upserts
MERGE INTO YourTableName AS target
USING SourceTable AS source
ON target.key_column = source.key_column
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...;
Regular Data Cleanup Jobs
Schedule periodic checks to remove duplicates automatically.
Conclusion Delete Duplicate Records
Deleting duplicate records in SQL Server is crucial for database efficiency. Depending on your table structure, you can use:
1. GROUP BY + HAVING for quick identification
2. ROW_NUMBER() + CTE for precise deletion
3. DISTINCT + Temp Table for simple cleanup
4. Self-Join DELETE for conditional removal
By following these methods and applying preventive measures, you can ensure clean, duplicate-free data in your SQL Server tables.
FAQs
Q1: Will these methods work on large tables?
Yes, but ROW_NUMBER() is the most efficient for big datasets.
Q2: How can I backup before deleting duplicates?
SELECT * INTO Backup_YourTableName FROM YourTableName;
Q3: Can I automate duplicate removal?
Yes, create a stored procedure and schedule it via SQL Agent.
