Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Excel List of Duplicates and Cell Addresses within Workbook

Status
Not open for further replies.

PugnaxX

Technical User
Jan 11, 2006
37
US
Hi All,

I have several workbooks that are 100 worksheets or so and I would like to be able to have a macro that searches for duplicate data across worksheets and generates a list of the offending data values, along with their cell addresses. I have the following code from thread707-1379708 that WinblowsME came up with and I slightly modified. The problem with the code is that it seems to only search a single column on a single worksheet. I need it to search all columns on all worksheets to match "data1" which might be on sheet 1 column B with "data1" which might be on sheet 87 column D. Thanks for any input you may be able to muster!

-PugnaxX


Here's the code:

Sub Print_Duplicates()
Dim duplicates() As String, item As Variant, last_row As Long

'The following inserts a new column to the left of the existing A column for the list of duplicates to reside &
' selects the cell "A1" ("A1" is the "ActiveCell" now)
Columns("A:A").Select
Selection.Insert Shift:=xlToRight
Range("A1").Select
Range("A1").Value = ("DUPES")
Range("A2").Select

'I think this figures out what the last row with data is
last_row = ActiveSheet.Cells.Find(What:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row

'This calls the function that finds the duplicates
Call Get_Duplicates(duplicates, 1, 2, last_row)

'This is my main modification, it sets the activecell's value to the duplicate item then moves down a row and sets
' that one equal to the next duplicate, ultimately creating a list of all duplicate values in your active worksheet
For Each item In duplicates
ActiveCell.Value = item
ActiveCell.Offset(1, 0).Select
Next
End Sub

Private Sub Get_Duplicates(ByRef duplicates() As String, first_row As Long, first_col As Long, last_row As Long)

'This is all code that I found on tek-tips.com, I'm not sure that I can decipher it line by line, but
'from what I can tell it is basically setting up a "dictionary" of all values in a column and then checking
'it for dupes
Dim i As Long, k As Long, curr_cell As String
Dim obj_all As Object, obj_duplicates As Object

Set obj_all = CreateObject("Scripting.Dictionary")
Set obj_duplicates = CreateObject("Scripting.Dictionary")

k = 0

For i = first_row To last_row
curr_cell = Trim(Cells(i, first_col))

If obj_all.exists(curr_cell) Then
If Not obj_duplicates.exists(curr_cell) Then
ReDim Preserve duplicates(k)
duplicates(k) = curr_cell
k = k + 1
obj_duplicates.Add curr_cell, ""
End If
Else
obj_all.Add curr_cell, ""
End If
Next i

obj_duplicates.RemoveAll
obj_all.RemoveAll

Set obj_duplicates = Nothing
Set obj_all = Nothing
End Sub

 
What is the logic behind having 100+ worksheets in a single workbook? That is, what criteria are you using to break up the data? Day/Week/Month? Employee ID/Name? Are the column headers the same (and in the same order) on each sheet within the workbook?

[tt]_____
[blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 

Create a new sheet named Summary:
[tt]
A B C D E
Sheet Range Value ValueList
[/tt]
Code:
Sub Main()
    Dim r As Range, rng As Range
    With Sheets("Summary")
        If .[E2].CurrentRegion.Rows.Count > 2 Then
            Set rng = .Range(.[E2], .[E2].End(xlDown))
        Else
            Set rng = .[E2]
        End If
        For Each r In rng
            FindWord r.Value
        Next
    End With
End Sub

Sub FindWord(sVal As String)
    Dim ws As Worksheet, sPrevAdd As String, rFound As Range, lRow As Long
    lRow = Sheets("Summary").[A1].CurrentRegion.Rows.Count + 1
    For Each ws In Worksheets
        Select Case ws.Name
            Case "Summary"
            Case Else
                Set rFound = ws.Cells.Find(sVal, SearchOrder:=xlByColumns)
                sPrevAdd = ""
                If Not rFound Is Nothing Then
                    Do
                        With Sheets("summary")
                            .Cells(lRow, "A").Value = rFound.Parent.Name
                            .Cells(lRow, "B").Value = rFound.Address
                            .Cells(lRow, "C").Value = rFound.Value
                        End With
                        sPrevAdd = rFound.Address
                        lRow = lRow + 1
                        Set rFound = ws.Cells.FindNext(rFound)
                    Loop Until rFound.Address <= sPrevAdd
                End If
        End Select
    Next
End Sub




Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
Hey Skip,

I've tried out the code and can't seem to get it to work for me. I created the worksheet "Summary" and ran the code (Sub Main). It gave me a list of Sheets and Ranges, but no values for the duplicates. The sheets and ranges that it gives seem random and refer to cells that are blank. I looked over the code trying to figure out if there was anything I could see and found that "summary" wasn't capatilized in the With Sheets statement, so I corrected that, but this feeble attempt yielded nothing.

Any thoughts? Thanks again!

-PugnaxX
 



Where is your SOURCE range?

Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
I'm sorry, I'm not sure if I follow...
 


Are you expecting that the procedure will start with the first cell on the first sheet, search for duplicates, then the second cell, search for duplicates... until all the cells on the first sheet have been used, then preceed to the second sheet.....

OR, do you have a LIST of values that you want to be checked? My process does the latter.

Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
The former is the desired behavior, "procedure will start with the first cell on the first sheet, search for duplicates, then the second cell, search for duplicates... until all the cells on the first sheet have been used, then preceed to the second sheet" and when it checks for duplicates for a cell, it needs to check the current sheet and all other sheets. Hope that's a little more clear...

Let me know if there any other information that would be helpful.

Thanks again!

-PugnaxX

 



Then run this first. It will build the ValueList of every unique cell value in the workbook...
Code:
Sub BuildList()
    Dim ws As Worksheet, r As Range, lRow As Long
    lRow = 2
    For Each ws In Worksheets
        Select Case ws.Name
            Case "Summary"
            Case Else
                For Each r In ws.UsedRange
                    If Sheets("summary").Columns(5).Find(r.Value) Is Nothing Then
                        Sheets("summary").Cells(lRow, "E").Value = r.Value
                        lRow = lRow + 1
                    End If
                Next
        End Select
    Next
End Sub

Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
Hey Skip,

I've been working with it, just testing it out and with that last bit of code you posted it's almost working! My test was to list the values 1,2,3,4 & 5 down column A on worksheet 1 and then 1,2,3,4 down the column A on worksheet 2. After I run BuildList, I have all unique values listed in column E of worksheet Summary. And then after I run Main, it lists the worksheet and range and value in columns A,B,C of Summary and shows the duplicates Sheet1 A1 1, Sheet A1 1, etc..., BUT it also lists 5 (with its sheet and range) which is not a duplicate! I'm attempting to get a list of just dupes.

Any ideas?

Thanks again for all of your help, you are great! You've really helped me learn a lot over the years, Skip, and I really appreciate all the help that you've afforded me.


-PugnaxX

 



Code:
            Case "Summary", "Sheet5", "any other sheets you dont want included"
            Case Else


Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 


BTW, BuildList is designed to put EVERYTING on the summary sheet.

Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
Right, Case "Summary" appears in FindWord and BuildList to exclude the Summary worksheet which is the only worksheet that I don't want included in the list of duplicates. I understand that BuildList creates basically a source range of all values for the Main sub to pick duplicates from and
I plan to add something like

Code:
-----------------------------------
Columns("E:E").Select

Selection.Delete Shift:=xlToLeft
------------------------------------
to the end of Main to do prevent the user from seeing the list of all values.


The problem I'm having is that after running BuildList, I run Main and then a list in columns A,B,C of "Duplicates Only" should appear. As it stands now it's listing everything (In my test case the 1,2,3,4 which each have a duplicate are listed, but the 5 which has NO duplicate is ALSO listed) for these "duplicates only" columns. Basically I'm having an issue within either the Main or FindWord function that is not allowing for the picking and displaying of only duplicates.

-PugnaxX
 


Do not move the list. Hide the SHEET or hide the COLUMN.

"Basically I'm having an issue within either the Main or FindWord function that is not allowing for the picking and displaying of only duplicates."

It's showing EVERY occurence. SO WHAT! Just pivot on the Value and Count of Value and ignore all the counts of 1. Don't make it harder than it needs to be.


Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
Thanks Skip, I'll do some research and figure out how to pivot and filter counts of 1 out and see what I can come up with.

It's been revealed to me that it's not all the data on all the worksheets I need to be worrying about so, my objective has now changed a bit.

I need to check all columns with a header that contains the word "pin" across all worksheets for duplicates (instead of checking everything against everything). So I am thinking I will use what you taught me with building a source column for Main to run on by searching each sheet's first row (header row) for "pin" and then copying the entire column into the E row of summary and then running Main and pivot.

Thanks again, I really appreciate all your time and effort!

PugnaxX
 




Here's a technique that I use.

1. Use the Find method to Set a range object looking for a HEADING VALUE in row 1, for instance

2. Use the same method look for a ROW VALUE in column A, for instance.

The ITNERSECTION is the DATA VALUE...
Code:
With Intersect(rFoundHeadVal.entirecolumn, rFoundRowVal.entirerow) 
  'do stuff with this cell reference
End with


Skip,
[sub]
[glasses]Did you hear what happened when the OO programmer lost his library?...
He's now living in OBJECT poverty![tongue][/sub]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top