Deletion of duplicate lines with matching part of line

ok1397 · Oct 9, 2007

Hello everyone, need some help with a text file. I need to check the first 40 characters of each line and if they match delete the previous line. The problem is that the lines with matching first 40 characters are not one after another. Is this possible? Any help will be greatly appreciated...

johnwm · Oct 9, 2007

Your question is a bit vague. I suggest you read faq222-2244 to see how to clarify your question and how the forum is supposed to work. You should especially read paragraphs 8, 9, 10, 14, 15 and 16.

What have you got so far?

Which bit are you stuck on?

___________________________________________________________
If you want the best response to a question, please check out FAQ222-2244 first.
'If we're supposed to work in Hex, why have we only got A fingers?'

BobRodes · Oct 10, 2007

You might consider reading the text file into a recordset (use ADO). A recordset will give you a movable pointer to a given line of text.

ok1397 · Oct 11, 2007

BobRodes, thanks for responding. I'll give it a try and let you know how it works out. Thank you.

Swi · Oct 11, 2007

It sounds like you want to get rid of all duplicates. I say this because, if the dupes are not adjacent, what reason would you have to delete the previous line? Unless I am missing something. If you are looking for a way to check for dupes, look into the Dictionary object and its .Exists method. However, as johnwm states, post what you have so we can get a better idea of your goal.
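
For anyone who has not used it, here is a minimal, untested sketch of the .Exists check. The procedure name, the 20-character key length, and the sample strings are assumptions for illustration only. It uses late binding through CreateObject, so no reference to Microsoft Scripting Runtime is needed.

```vb
' Sketch only: report which sample lines share a key with an earlier line.
' DemoExists, KEY_LEN and the sample strings are assumptions for illustration.
Private Sub DemoExists()
    Const KEY_LEN As Long = 20
    Dim seen As Object
    Dim sample As Variant
    Dim prefix As String
    Dim i As Long

    Set seen = CreateObject("Scripting.Dictionary")
    sample = Array("ABC123 TOTAL TO ORDER: 0", _
                   "some unrelated line", _
                   "ABC123 TOTAL TO ORDER: 50")

    For i = LBound(sample) To UBound(sample)
        prefix = Left$(sample(i), KEY_LEN)
        If seen.Exists(prefix) Then
            ' The key was registered by an earlier line, so this one is a duplicate.
            Debug.Print "Duplicate key on item " & i & ": " & prefix
        Else
            seen.Add prefix, i
        End If
    Next i
End Sub
```

Note that a plain .Exists check like this always identifies the later line as the duplicate, so by itself it keeps the first occurrence.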

Swi

ok1397 · Oct 11, 2007

Hi Swi, I do want to delete all duplicates in a text file when the first 20 characters match (see the example below). Will the dictionary object and the .Exists method work?

This is what it looks like. In this example we are comparing the first 20 characters: ABC123 TOTAL TO ORDER: and DEF456 TOTAL TO ORDER:

ABC123 TOTAL TO ORDER: 0 ;delete the first line
dfkjdslkfldkflkjlkjklkjlkklj
ABC123 TOTAL TO ORDER: 0
sdkfjksdflkdsjflkdjflkdjfkldj
DEF456 TOTAL TO ORDER: 0 ;delete the first line
sdkfjldkfjldkjkjkjasd
DEF456 TOTAL TO ORDER: 50

JerryKlmns · Oct 11, 2007

Here is a crazy idea

Create a disconnected recordset with ADO that has two fields. The first field should hold the 20-character key and the second field the rest of the data you want to keep.

Open the file and read the first line into the recordset. Then, until the end of the file, read a line and filter the recordset on that line's 20-character key. If you get a record, update the value of its second field; otherwise add a new record.

When done with the file, move to the first record and open a new file to write out the results held in the recordset.

Simple?
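
A rough, untested sketch of how those steps might look follows. The procedure name, the field names and sizes, and the two path parameters are all assumptions, and the second field here holds the whole line rather than only the part after the key, purely to keep the output step simple. ADO is late bound, so the enum values are declared as local constants.

```vb
' Sketch only: one record per 20-character key, with the newest text overwriting the old.
' DedupeWithRecordset, LineKey, LineText and the field sizes are assumptions.
Private Sub DedupeWithRecordset(ByVal inPath As String, ByVal outPath As String)
    Const KEY_LEN As Long = 20
    Const adVarChar As Long = 200       ' ADO DataTypeEnum
    Const adUseClient As Long = 3       ' ADO CursorLocationEnum
    Const adLockOptimistic As Long = 3  ' ADO LockTypeEnum
    Const adFilterNone As Long = 0      ' ADO FilterGroupEnum

    Dim rs As Object
    Dim fileNo As Integer
    Dim textLine As String
    Dim prefix As String

    ' Build the disconnected recordset with two fields.
    Set rs = CreateObject("ADODB.Recordset")
    rs.CursorLocation = adUseClient
    rs.LockType = adLockOptimistic
    rs.Fields.Append "LineKey", adVarChar, KEY_LEN
    rs.Fields.Append "LineText", adVarChar, 2000   ' assumed maximum line length
    rs.Open

    fileNo = FreeFile
    Open inPath For Input As #fileNo
    Do While Not EOF(fileNo)
        Line Input #fileNo, textLine
        prefix = Left$(textLine, KEY_LEN)

        ' Filter on the key; single quotes are doubled for the filter string.
        rs.Filter = "LineKey = '" & Replace(prefix, "'", "''") & "'"
        If rs.EOF Then
            ' No record for this key yet, so add one.
            rs.Filter = adFilterNone
            rs.AddNew
            rs.Fields("LineKey").Value = prefix
        End If
        rs.Fields("LineText").Value = textLine   ' the newest text wins
        rs.Update
        rs.Filter = adFilterNone
    Loop
    Close #fileNo

    ' Write the surviving lines in the order their keys first appeared.
    fileNo = FreeFile
    Open outPath For Output As #fileNo
    If Not (rs.BOF And rs.EOF) Then rs.MoveFirst
    Do Until rs.EOF
        Print #fileNo, rs.Fields("LineText").Value
        rs.MoveNext
    Loop
    Close #fileNo
    rs.Close
End Sub
```

Two things to watch with this approach: the kept text is written at the position where its key first appeared, not where the last duplicate was, and all blank lines share the same empty key, so they collapse into a single record.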

Swi · Oct 11, 2007

It could be used. JerryKlmns also has an interesting idea.

Swi

ok1397 · Oct 11, 2007

I tried using the dictionary object. It works great, but it is deleting the second line, not the first line as I need it to. Here's the code I have so far:

Private Sub Command1_Click()
    ' Dictionary requires a project reference to Microsoft Scripting Runtime
    Dim Dict As Dictionary
    Dim InputData As String
    Dim Counter As Long
    Dim UniqueCounter As Long

    Open App.Path & "\MWGLBRPT.TXT" For Input As #1
    Open App.Path & "\GLBRPT.TXT" For Output As #2

    Set Dict = New Dictionary
    Counter = 0
    UniqueCounter = 0

    ' Read the first line and register its 20-character key
    Line Input #1, InputData
    Dict.Add Left$(InputData, 20), UniqueCounter

    ' Loop through the rest of the file
    Do
        Line Input #1, InputData
        Counter = Counter + 1
        ' Write the line and register its key only if the key has not been seen yet
        If Dict.Exists(Left$(InputData, 20)) = False Then
            UniqueCounter = UniqueCounter + 1
            Print #2, InputData
            Dict.Add Left$(InputData, 20), UniqueCounter
        End If
    Loop Until EOF(1)

    Close
    Set Dict = Nothing
End Sub
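
The code above keeps the first line for each key because a key is only ever written the first time it is seen. One possible way to keep the last line instead is to read the file twice: the first pass records the number of the last line carrying each key, and the second pass writes a line only if it is that last one. The following is an untested sketch under that assumption. The procedure name is made up, the file names are the ones from the code above, and treating blank lines as always kept is my own choice, not something stated in the thread.

```vb
' Sketch only: keep the LAST line for each 20-character key, preserving line order.
' KeepLastDuplicate is an assumed name; Dictionary is late bound, so no reference is needed.
Private Sub KeepLastDuplicate()
    Const KEY_LEN As Long = 20
    Dim lastSeen As Object
    Dim inFile As Integer
    Dim outFile As Integer
    Dim textLine As String
    Dim lineNo As Long

    Set lastSeen = CreateObject("Scripting.Dictionary")

    ' Pass 1: remember the number of the last line that carries each key.
    inFile = FreeFile
    Open App.Path & "\MWGLBRPT.TXT" For Input As #inFile
    Do While Not EOF(inFile)
        Line Input #inFile, textLine
        lineNo = lineNo + 1
        lastSeen.Item(Left$(textLine, KEY_LEN)) = lineNo   ' adds the key or overwrites it
    Loop
    Close #inFile

    ' Pass 2: write a line only if it is the last one with its key.
    inFile = FreeFile
    Open App.Path & "\MWGLBRPT.TXT" For Input As #inFile
    outFile = FreeFile
    Open App.Path & "\GLBRPT.TXT" For Output As #outFile
    lineNo = 0
    Do While Not EOF(inFile)
        Line Input #inFile, textLine
        lineNo = lineNo + 1
        ' Blank lines are always kept (an assumption); otherwise keep the last occurrence only.
        If Len(textLine) = 0 Or lastSeen.Item(Left$(textLine, KEY_LEN)) = lineNo Then
            Print #outFile, textLine
        End If
    Loop
    Close #outFile
    Close #inFile
End Sub
```

Using Do While Not EOF rather than Do ... Loop Until EOF also avoids an error on an empty input file, and FreeFile avoids clashing with file numbers that are already open elsewhere.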

Swi · Oct 11, 2007

I am a little unclear on what you want to accomplish. Can you post an example of what the input would look like and what the output should look like? Thanks.

Swi

ok1397 · Oct 11, 2007

Swi, the text file looks like this:

12LDF { ON HAND= 1,947 } { QTY ALOC= 864 } { QTY AVAIL= 1,083 } { WH1 AVG= 2,573 }
12LDF { ON HAND= 992 } { QTY ALOC= 0 } { QTY AVAIL= 992 } { WH2 AVG= 402 }
12LDF { ON HAND= 1,599 } { QTY ALOC= 32 } { QTY AVAIL= 1,567 } { WH3 AVG= 839 }
12LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 1 SUGGESTED TO ORDER= 1699
12LDF TOTAL TO ORDER: 1699 'NEED TO DELETE THIS LINE
12LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 0 SUGGESTED TO ORDER= 11209
12LDF TOTAL TO ORDER: 12908

13LDF { ON HAND= 3,648 } { QTY ALOC= 640 } { QTY AVAIL= 3,008 } { WH1 AVG= 597 }
13LDF { ON HAND= 1,296 } { QTY ALOC= 0 } { QTY AVAIL= 1,296 } { WH2 AVG= 294 }
13LDF { ON HAND= 2,018 } { QTY ALOC= 40 } { QTY AVAIL= 1,978 } { WH3 AVG= 305 }
13LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 7 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0 'NEED TO DELETE THIS LINE
13LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 5 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0

With the code I currently have, I get the following results:

12LDF { ON HAND= 1,947 } { QTY ALOC= 864 } { QTY AVAIL= 1,083 } { WH1 AVG= 2,573 }
12LDF { ON HAND= 992 } { QTY ALOC= 0 } { QTY AVAIL= 992 } { WH2 AVG= 402 }
12LDF { ON HAND= 1,599 } { QTY ALOC= 32 } { QTY AVAIL= 1,567 } { WH3 AVG= 839 }
12LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 1 SUGGESTED TO ORDER= 1699
12LDF TOTAL TO ORDER: 1699 'NEED TO DELETE THIS LINE
12LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 0 SUGGESTED TO ORDER= 11209

13LDF { ON HAND= 3,648 } { QTY ALOC= 640 } { QTY AVAIL= 3,008 } { WH1 AVG= 597 }
13LDF { ON HAND= 1,296 } { QTY ALOC= 0 } { QTY AVAIL= 1,296 } { WH2 AVG= 294 }
13LDF { ON HAND= 2,018 } { QTY ALOC= 40 } { QTY AVAIL= 1,978 } { WH3 AVG= 305 }
13LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 7 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0 'NEED TO DELETE THIS LINE
13LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 5 SUGGESTED TO ORDER= 0

I just need to keep the 2nd duplicate and delete the 1st one. Thank you.

BobRodes · Oct 11, 2007

So, you need to delete all lines with "TOTAL TO ORDER" in them, and also the first version of "13LDF" in this example, presumably because it's older data than the second one?

If so, you can leverage a few characteristics of your data. First, it would seem that each individual record has 6 lines of data. Pull everything into a recordset, putting the first six characters in a separate field. Add another field that numbers all the records in the order they came. Apply a filter that keeps only those records whose key has a count of greater than 6. Sort the result by the key descending, secondarily by the number field descending. Iterate through the recordset, deleting all records for a given key after the count of 6 has been reached. Then delete the "total to order" line.

HTH

Bob

Swi · Oct 11, 2007

Now that I understand your goal I would recommend that you try what BobRodes has described above.

Swi

ok1397 · Oct 11, 2007

Swi, I agree. I will give it a shot tonight! Thank you, BobRodes, and everyone, for your help!

BobRodes · Oct 11, 2007

Post back if you get stuck.
