Microsoft Excel is widely used, especially in business analysis, but removing duplicate data in it can be a little confusing. Removing duplicates is a common task for people working with large datasets. When you combine different tables, or when many people have access to the same document, you can end up with repeated entries in your spreadsheet, which makes the data redundant. The larger the dataset, the higher the chance of encountering duplicate records, and they can cause problems if they are not identified and handled correctly.
We will illustrate how to remove duplicates in Excel using a Sports dataset. This dataset contains information about Olympic medalists for the year 2012.
Working with data in Excel often means dealing with duplicates. Duplicates can clutter up your data and skew analysis, so knowing how to find and remove duplicate values is an essential Excel skill.
In this comprehensive guide, we’ll explore several methods to identify and eliminate duplicate entries in Excel.
What are Duplicate Values in Excel?
A duplicate value is when the same data point occurs more than once in a dataset. For example, a list of customer names may unintentionally contain “John Smith” numerous times. Or a product list could have duplicate entries for “Blue Shirt” if data wasn’t cleaned properly.
Duplicate values can cause major issues:
- Inaccurate data analysis and metrics
- Formulas counting or summing duplicates incorrectly
- Difficulty sorting and filtering data
- Misleading data visualizations and reports
Why Remove Duplicate Values in Excel?
Eliminating duplicates leads to:
- Cleaner data for more accurate analysis
- Formulas that work as intended
- Easier filtering and sorting
- More reliable PivotTables and data visualizations
- Avoiding double-counting records
Deduplicating data ensures you have an accurate set of unique values for calculations, reporting, and presentations.
How to Find Duplicates in Excel
Let’s go through several methods to identify duplicate entries in Excel.
Use Conditional Formatting
Conditional formatting allows highlighting duplicate values for easy identification.
- Select the dataset
- Go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values
- Pick a format like color fill or font color to highlight duplicates
This visually indicates which values occur more than once. You can then filter by color to show only the highlighted duplicates.
Add a Frequency Column
You can also use a helper column to flag duplicates:
- Add a column that combines the values in each row into a single key
- Use COUNTIF or similar to count occurrences
- Values counted more than once are duplicates
Filter the helper column to view duplicates.
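The helper-column steps above can be sketched outside Excel to show the logic. This is a minimal illustration in Python, not an Excel feature: the sample rows, the `|` delimiter, and the function name `add_frequency_column` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample rows (name, email); not taken from the article's dataset.
rows = [
    ("John Smith", "john@example.com"),
    ("Ana Lopez", "ana@example.com"),
    ("John Smith", "john@example.com"),
    ("John Smith", "js@example.com"),
]

def add_frequency_column(rows):
    """Return (row, count) pairs, where count is how often the row's key occurs."""
    # The joined key plays the role of the helper column that combines each
    # row's values. The "|" delimiter is an assumption; pick one that cannot
    # appear inside your data.
    keys = ["|".join(row) for row in rows]
    counts = Counter(keys)  # comparable to counting occurrences with COUNTIF
    return [(row, counts[key]) for row, key in zip(rows, keys)]

flagged = add_frequency_column(rows)

# Rows whose key was counted more than once are duplicates.
duplicates = [row for row, count in flagged if count > 1]
```

Here only the two identical "John Smith" rows are flagged. The third "John Smith" row has a different email, so its combined key is unique.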
Use the Dupe Indicator Add-In
The Dupe Indicator add-in runs through your data and flags duplicates with a visual indicator icon.
It’s an automated way to identify duplicates without formulas.
Use Remove Duplicates Button
Excel’s built-in Remove Duplicates button reports how many duplicate values it removed and how many unique values remain.
To use:
- Select data range
- Data tab > Remove Duplicates
- View duplicate and unique count
This reveals the scale of the problem, but the duplicates are deleted as soon as you confirm the dialog, so run it on a copy of the data if you only want the counts.
Copy to New Location
When applying Advanced Filter, you can copy unique records to a new location.
The number of rows copied indicates the unique value count.
Remove Duplicates in PivotTables
Add all fields to the Rows area of a PivotTable.
PivotTables list each unique item only once for Row fields, so the result is a duplicate-free view that leaves the source data unchanged.
How to Delete Duplicate Values in Excel
Once found, here are ways to permanently remove duplicate entries:
Remove Duplicates Button
The Remove Duplicates button on the Data tab deletes duplicates it finds with one click.
Be sure to check the right columns to base removal on. This modifies data in-place.
Copy with Advanced Filter
When applying an Advanced Filter, choose “Copy to another location” to copy unique records to a new area.
Leave criteria blank and enable “Unique records only”.
Filter and Copy Visible
Manually filter to show only duplicate values, then copy the visible cells to a separate location.
Repeat by filtering unique values and copying those as well.
Use Power Query
Power Query has a dedicated “Remove Duplicates” button when editing a query. Specify one or more columns to base duplicates on.
The query steps remove duplicates during data refresh.
Copy to New Sheet
Similar to Advanced Filter, copy the entire dataset to a new sheet. Apply Remove Duplicates on the new sheet to eliminate duplicates.
This preserves the original dataset intact on the old sheet.
Delete Through VBA
VBA has a Range.RemoveDuplicates method to programmatically strip duplicates from a defined range.
Set the Columns and Header parameters to match your data. This permanently removes duplicates.
Remove Duplicates Based on One or More Columns
When finding and deleting duplicates, you must specify which columns to assess uniqueness on.
Comparing duplicates on just the Name column is different than comparing across Name, Email, and Phone together.
Here are ways to remove duplicates based on one or more columns:
- Tick only the relevant column checkboxes in the Remove Duplicates dialog
- Pass the column numbers to the Columns argument of the RemoveDuplicates VBA method
- Set columns in Power Query
- Build the helper column formula from one column or from several columns combined
- Filter and copy from each column individually
Think carefully about which columns define a duplicate for your data requirements.
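To make the effect of the column choice concrete, here is a small Python sketch of row deduplication that keeps the first occurrence of each key. The function name `remove_duplicates`, the `key_columns` parameter, and the contact data are hypothetical. Unlike Excel, this sketch compares values exactly, including letter case.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Only the chosen columns decide whether a row counts as a duplicate.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows

# Hypothetical contact list: (name, email, phone).
contacts = [
    ("John Smith", "john@example.com", "555-0100"),
    ("John Smith", "js@example.com", "555-0199"),
    ("Ana Lopez", "ana@example.com", "555-0142"),
    ("John Smith", "john@example.com", "555-0100"),
]

by_name = remove_duplicates(contacts, key_columns=[0])        # Name only
by_all = remove_duplicates(contacts, key_columns=[0, 1, 2])   # Name, Email, Phone
```

Comparing on Name alone leaves two rows and discards a "John Smith" who has a different email and phone. Comparing on all three columns leaves three rows and removes only the exact repeat.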
Remove Duplicate Rows vs Columns
The Remove Duplicates feature in Excel is designed for removing duplicate rows. All methods mentioned work across rows.
Removing duplicate columns is harder because Remove Duplicates compares rows, not columns.
To delete duplicate columns, transpose the data first so duplicate columns become duplicate rows. Remove Duplicates can then be applied normally.
Then transpose back to restore column orientation.
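The transpose, deduplicate, transpose-back sequence can be sketched in Python as follows. The table contents and the function name `remove_duplicate_columns` are assumptions for illustration. Note that the header cell is part of the comparison, so two columns match only if their headers match too.

```python
def remove_duplicate_columns(table):
    """Drop repeated columns from a list-of-rows table, keeping the first copy."""
    columns = list(zip(*table))  # transpose: each column becomes a row (tuple)
    seen = set()
    unique_columns = []
    for column in columns:
        if column not in seen:
            seen.add(column)
            unique_columns.append(column)
    # Transpose back to restore the original row orientation.
    return [list(row) for row in zip(*unique_columns)]

# Hypothetical table in which the "Name" column appears twice.
table = [
    ["Name", "Score", "Name"],
    ["Ana", 9, "Ana"],
    ["Ben", 7, "Ben"],
]
result = remove_duplicate_columns(table)
```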
Find Duplicates Across Multiple Worksheets
If data is split across several worksheets in a workbook, you’ll need to consolidate to find duplicates.
One approach:
- Copy data from each sheet into a master sheet
- Add helper column with sheet name
- Remove duplicates on master sheet
Duplicate values will now be evident even if they exist on separate sheets.
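The consolidation approach can be sketched in Python like this. The sheet names, the product rows, and the function name `find_cross_sheet_duplicates` are hypothetical. The sheet name is stored next to each row but kept out of the comparison, which is the point to watch in Excel as well: if the sheet-name helper column is included when removing duplicates, rows from different sheets never match.

```python
from collections import defaultdict

def find_cross_sheet_duplicates(sheets):
    """Map each row that occurs more than once to the sheets it came from."""
    sources = defaultdict(list)
    for sheet_name, rows in sheets.items():
        for row in rows:
            # The row itself is the key; the sheet name is only recorded.
            sources[tuple(row)].append(sheet_name)
    return {row: names for row, names in sources.items() if len(names) > 1}

# Hypothetical workbook with two sheets of (product code, product name) rows.
workbook = {
    "January": [("A-100", "Blue Shirt"), ("A-101", "Red Hat")],
    "February": [("A-102", "Green Scarf"), ("A-100", "Blue Shirt")],
}
cross_sheet = find_cross_sheet_duplicates(workbook)
```

In this example `cross_sheet` contains a single entry for the "Blue Shirt" row, listing both sheets where it appears.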
Find Duplicates in Protected Sheets
Duplicates can’t be directly identified or removed on protected worksheets, but a couple of workarounds exist:
- Copy data to an unprotected sheet, find/remove duplicates there
- Use VBA macro to bypass protection and interact with data
Work on a copy of protected data instead of modifying directly.
Avoid Duplicate Entries When Entering Data
Prevention is the best medicine. Many duplicates originate from manual data entry.
Ways to avoid entering duplicates:
- Use Data Validation dropdown lists
- Leverage spreadsheet templates
- Enforce entry requirements with forms/userforms
- Standardize inputs
- Implement data cleansing after entry
- Build in checks for existing values
Following best practices when entering data minimizes duplication.
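As an illustration of standardizing inputs and checking for existing values at entry time, here is a minimal Python sketch. The normalization rule (collapse whitespace, ignore case) and the names `normalize` and `add_entry` are assumptions; real data may need different rules.

```python
def normalize(text):
    """Collapse repeated whitespace and ignore letter case."""
    return " ".join(text.split()).casefold()

def add_entry(existing, new_value):
    """Append new_value unless a normalized match is already present."""
    known = {normalize(value) for value in existing}
    if normalize(new_value) in known:
        return False  # rejected: would create a duplicate
    existing.append(new_value)
    return True

names = ["John Smith"]
rejected = add_entry(names, "  john   smith ")  # same name after normalization
accepted = add_entry(names, "Ana Lopez")
```

The first call is rejected because the value matches an existing entry once spacing and case are standardized. The second call adds a new name.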
Handle Duplicates Carefully
Some key points when managing duplicates:
- Work on a copy, not original data
- Check columns used to identify duplicates
- Understand impact on data integrity
- Eliminate accidental values, not intentional ones
- Double-check results before permanently deleting
A measured, methodical approach prevents accidental data loss when purging duplicates.
Automate Duplicate Removal
For frequent duplicate cleansing, automate the process with:
- Recorded macros to rerun removal
- Power Query queries refreshed automatically
- VBA procedures triggered by workbook events
- Excel tables auto-applying filters
Automation ensures deduplication occurs like clockwork.
Duplicates can sneak into any dataset, causing all sorts of issues. Mastering techniques to find and remove duplicate values in Excel is an indispensable skill for keeping data clean.
Follow the methods outlined here to hunt down and eliminate duplicate entries for smoother analysis and reporting. Work systematically to avoid unintended data loss, and implement steps to stop duplicates at the source.
With a sound duplicates strategy, you can banish messy data for good.
Using the Advanced Filter Option
The Advanced Filter option in Excel helps you filter duplicate values and copy the unique values to a different location. Look at the following steps to find out how the Advanced Filter option works.
- First, click on a cell or range in the dataset from which you want to remove duplicates. If you click on a single cell, Excel automatically determines the range when you click on Advanced Filter.
- Locate the Advanced Filter option: go to the Data tab → Sort & Filter section → Advanced, and click on it.
- You will see a dialog box. It consists of a list of options for advanced filtering.
- Select the “Copy to another location” option to copy the unique values to a different location.
- Check the range of your records in the “List Range” field and make sure it is the range that you have specified.
- In the “Copy to:” field, enter the range where the resultant unique values must be copied.
- Check the “Unique records only” box. This step is the most crucial.
- Click on OK.
- The unique values will be copied to the location you entered (cell G1 in this example).
These were the built-in functionalities in Excel that help us remove duplicates. Now, let’s move forward and learn how we can create our own function to do the same.
Problems Removing Duplicates from Outlined or Subtotaled Data
A common problem when removing duplicates is that you cannot remove duplicate values from outlined or subtotaled data. To remove duplicates, you must first remove both the outline and the subtotals.
How to remove duplicates in Excel?
Select a cell inside the data you want to deduplicate, go to the Data tab, and click the Remove Duplicates command. Excel then selects the entire dataset and opens the Remove Duplicates window, where you tell Excel whether the first row contains column headers.
How do I remove duplicates from Google Sheets?
The easiest way to remove duplicates in Google Sheets is to use Google’s built-in clean-up tools. Select the data range in your Google spreadsheet. In the Menu, go to Data > Data clean-up > Remove Duplicates. Tick the Data has header row checkbox if you have a header row. Tick both checkboxes under Columns to analyze. Click Remove duplicates.
How do I check for duplicate values in Excel?
Select the cells you want to check for duplicates. Note: Excel can’t highlight duplicates in the Values area of a PivotTable report. Select Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values. In the box next to values with, pick the formatting you want to apply to the duplicate values, and then select OK.
How to find and highlight duplicate data in Excel?
Use conditional formatting to find and highlight duplicate data. That way you can review the duplicates and decide if you want to remove them.