Data Analysis: Identifying Duplicate Rows in Excel


In Excel, there is funciton to remove duplicates; however, didn't find direct method to identifying duplicates based on specific columns.

Below is step created to find out duplicate ID columns for further analysis:

Column A contains all Customer IDs. We need to identify how many rows are generated for a customer id.

Step 1: Sort based on Column A
Step 2: Add 3 Columns with below calculations

=COUNT(MATCH(A10,A11,0)) as B10
=COUNT(MATCH(A10,A9,0))  as C10
=IF(B10=1,1,IF(C10=1,1,0)) or B10+C10  as D10

Now You can use D10 to filter on 1 to list down all duplicate records for further analysis for reason of what values are different or any other purpose.


Popular posts from this blog

Cognos Excel Text Wrapping Issue - Row Merge and Column Merge

Cognos: How to replace an existing report without breaking links?

How to change package for a report in IBM Cognos Analytics (ICA)?