Remove XLS duplicates but leave a fixed amount instead of just one 1

gizmo1973 · Jan 4, 2013

Okay, we all know in XLS you can easily remove duplicates by sorting and using the IF statement and in XLSX they've even added a little tool for us to do this but herein is my dilemma…

Both these operations assume I only want to be left with a clean list of data i.e. only one record from the many duplicate records whereas what I actually want to do is to be left with 10 of each set of duplicates

Data consists of Just two columns, column A is no duplicated and column B can be duplicated up to a 274 times so I need the first 10 of each set duplicates based on column B so I can load the results using column A into a control table.

Any ideas folks?

MS office professional plus 2010 on Windows 7

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

SkipVought · Jan 4, 2013

hi,

Data consists of Just two columns, column A is no duplicated and column B can be duplicated up to a 274 times

Can you post a small example of this condition? Might it be something like this...

Code:

Name    Age
Skip     22
Fred     22
Mary     22

where each NAME is unique but AGE has a value that is duplicated 3 times?

Skip,
[sub]
[glasses]

Just traded in my old subtlety...
for a NUANCE! [tongue]

[/sub]

gizmo1973 · Jan 7, 2013

Hi Skip,
Thanks for your reply, long time no speak!

What you say is correct and the data as below, I've left all four original columns in but am only working with cols A & D (as D is a concat of B&"_"&C)
I'd want the first 10 of the UNIT_ID of the first 19 based on the CONCAT then when the CONCAT shifts from A_4 to A_5 I'd want the next 10 of 14 A_5 etc and so on

UNIT_ID UNIT_CODE_1 UNIT_CODE_2 CONCAT
ABC123 A 4 A_4
ABC124 A 4 A_4
ABC125 A 4 A_4
ABC126 A 4 A_4
ABC127 A 4 A_4
ABC128 A 4 A_4
ABC129 A 4 A_4
ABC130 A 4 A_4
ABC131 A 4 A_4
ABC132 A 4 A_4
ABC133 A 4 A_4
ABC134 A 4 A_4
ABC135 A 4 A_4
ABC136 A 4 A_4
ABC137 A 4 A_4
ABC138 A 4 A_4
ABC139 A 4 A_4
ABC140 A 4 A_4
ABC141 A 4 A_4
ABC142 A 5 A_5
ABC143 A 5 A_5
ABC144 A 5 A_5
ABC145 A 5 A_5
ABC146 A 5 A_5
ABC147 A 5 A_5
ABC148 A 5 A_5
ABC149 A 5 A_5
ABC150 A 5 A_5
ABC151 A 5 A_5
ABC152 A 5 A_5
ABC153 A 5 A_5
ABC154 A 5 A_5
ABC155 A 5 A_5

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

Andrzejek · Jan 7, 2013

Would a VBA solution be OK?

Have fun.

---- Andy

SkipVought · Jan 7, 2013

So what result are you expecting? You already have a list unique values!

Please post an expected result based on your specific posted example.

Skip,
[sub]
[glasses]

Just traded in my old subtlety...
for a NUANCE! [tongue]

[/sub]

gizmo1973 · Jan 7, 2013

I would expect to be able to be left with 10 of each rows based on the duplicate CONCAT column

As you can see from the above there are multiple occurrences of the CONCAT column but the UNIT_ID column is not the same so they are not unique values for the row of data

I would expect to see the below from the above (it is in excel so looks different on here)

UNIT_ID UNIT_CODE_1 UNIT_CODE_2 CONCAT
ABC123 A 4 A_4
ABC124 A 4 A_4
ABC125 A 4 A_4
ABC126 A 4 A_4
ABC127 A 4 A_4
ABC128 A 4 A_4
ABC129 A 4 A_4
ABC130 A 4 A_4
ABC131 A 4 A_4
ABC132 A 4 A_4
ABC142 A 5 A_5
ABC143 A 5 A_5
ABC144 A 5 A_5
ABC145 A 5 A_5
ABC146 A 5 A_5
ABC147 A 5 A_5
ABC148 A 5 A_5
ABC149 A 5 A_5
ABC150 A 5 A_5
ABC151 A 5 A_5

As you can see now there are only 10 rows of data per occurrence of UNIT_ID and CONCAT column whereas in the original post there were more than 10 rows (this is a sample as requested, in the actual data set there are up to a 3,000 rows of data where the CONCAT column is duplicate but the UNIT_ID is unique but I need 10 of the occurrences to be a combination of both)

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

Andrzejek · Jan 7, 2013

You have not answer my question, but if the answer is Yes, try this:

Code:

Dim r As Integer
Dim x As Integer
Dim str As String

For r = 2 To 40 [green]'40 rows of data[/green]
    If str <> Range("D" & r).Value Then
        str = Range("D" & r).Value
        x = 1
    Else
        If x > 10 Then
            Range("A" & r & ":D" & r).Value = ""
        End If
        x = x + 1
    End If
Next r

You just need to get rid of empty rows

Assuming:
Column A is UNIT_ID
Column B is UNIT_CODE_1
Column C is UNIT_CODE_2
Column D is CONCAT

Have fun.

---- Andy

SkipVought · Jan 7, 2013

Why 10? Is that related to some logic within the data os something external to the data?

It seems that you would COPY the data and PASTE it 9 times.

Skip,
[sub]
[glasses]

Just traded in my old subtlety...
for a NUANCE! [tongue]

[/sub]

gizmo1973 · Jan 7, 2013

HI Andy,

Sorry I didn't reply however the reason I didn't was that a VBA solution wasn’t okay for me as I have no experience of anything more than very basic macros and I didn’t want you going off to do something that would work for someone experienced however thanks for your code but as I knew I would I have hit a snag with and don’t know where to go next because of my level of understanding of VBA…

I’ll just struggle on and potentially just use a random sort or just one of each dataset instead of ten, thanks to everyone for their help…

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

gizmo1973 · Jan 7, 2013

Skip,

It's something external to the data as the UNIT_ID is going to be linked back to another table to provide analysis
The UNIT_ID will be handled externally to provide algorithmic data and it is that data that ultimately is being tested it's just we need 10 of each record to test the algorithm

If we were to copy and paste nine times we’d have nine results for the same UNIT_ID whereas we want differing UNIT_IDs per test

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

SkipVought · Jan 7, 2013

I totally missed that you want to FIRST 10 in each CONCAT
[tt]
E2: =IF(D2=D1,E1+1,1)
[/tt]
where E1 is EMPTY
Then use the AutoFilter to show values in column E that are LESS THAN or EQUAL TO 10.

Code:

UNIT_ID UNIT_CODE_1 UNIT_CODE_2 CONCAT	
ABC123   A                    4 A_4     1
ABC124   A                    4 A_4     2
ABC125   A                    4 A_4     3
ABC126   A                    4 A_4     4
ABC127   A                    4 A_4     5
ABC128   A                    4 A_4     6
ABC129   A                    4 A_4     7
ABC130   A                    4 A_4     8
ABC131   A                    4 A_4     9
ABC132   A                    4 A_4    10
ABC142   A                    5 A_5     1

Skip,
[sub]
[glasses]

Just traded in my old subtlety...
for a NUANCE! [tongue]

[/sub]

gizmo1973 · Jan 7, 2013

It's all in the translation!
Cheers Skip, worked a treat

Please remember that with all my answers there most likely is a better way to achieve your results but I prefer the quick, simple and down right dirty approach!!

Regards, Phil.

http://www.mmse.co.uk

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Remove XLS duplicates but leave a fixed amount instead of just one 1

gizmo1973

MIS

SkipVought

Programmer

gizmo1973

MIS

Andrzejek

Programmer

SkipVought

Programmer

gizmo1973

MIS

Andrzejek

Programmer

SkipVought

Programmer

gizmo1973

MIS

gizmo1973

MIS

SkipVought

Programmer

gizmo1973

MIS

Similar threads

Part and Inventory Search

Sponsor