
Deletion of duplicate lines with matching part of line

Status
Not open for further replies.

ok1397

Technical User
Feb 7, 2005
70
0
0
US
Hello everyone, I need some help with a text file. I need to check the first 40 characters of each line and, if they match those of an earlier line, delete the earlier line. The problem is that the lines with matching first 40 characters are not one after another. Is this possible? Any help will be greatly appreciated.
 
Your question is a bit vague. I suggest you read faq222-2244 to see how to clarify your question, and how the forum is supposed to work. You should especially read paragraphs 8, 9, 10, 14, 15 and 16.

What have you got so far?

Which bit are you stuck on?



___________________________________________________________
If you want the best response to a question, please check out FAQ222-2244 first.
'If we're supposed to work in Hex, why have we only got A fingers?'
Drive a Steam Roller
Steam Engine Prints
 
You might consider reading the text file into a recordset (using ADO). A recordset will give you a movable pointer to a given line of text.
 
BobRodes, thanks for responding. I'll give it a try and let you know how it works out. Thank you.
 
It sounds like you want to get rid of all duplicates. I say this because, if the dupes are not in order, what reason would you have to delete the previous line? Unless I am missing something. If you are looking for a comparison tool for checking dupes, look into the Dictionary object and its .Exists method. However, as johnwm states, post what you have so we can get a better idea of your goal.

Swi
 
Hi Swi, I do want to delete all duplicates in a text file when the first 20 characters match (see the example below). Will the Dictionary object and the .Exists method work?

This is what it looks like. In this example we are comparing the first 20 characters: ABC123 TOTAL TO ORDER:
and DEF456 TOTAL TO ORDER:

ABC123 TOTAL TO ORDER: 0 ;delete the first line
dfkjdslkfldkflkjlkjklkjlkklj
ABC123 TOTAL TO ORDER: 0
sdkfjksdflkdsjflkdjflkdjfkldj
DEF456 TOTAL TO ORDER: 0 ;delete the first line
sdkfjldkfjldkjkjkjasd
DEF456 TOTAL TO ORDER: 50
 
Here is a crazy idea.

Create a disconnected recordset with ADO, having two fields. The first field should hold the 20-character value and the second field the rest of the line.

Open the file and read the first line into the recordset. Then, until the end of the file, read a line and filter the recordset on that line's 20-character value. If you get a record, update the second field's value; otherwise add a new record.

When done with the file, move to the first record and open a new file to output results held in the recordset.

Simple?
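
A rough sketch of the idea above, for anyone who wants to try it. It assumes a project reference to the Microsoft ActiveX Data Objects library; the procedure name, the field names LineKey and LineRest, and the 1024-character limit on the rest of the line are all made up for illustration. Two things to be aware of: each key keeps the position of its first occurrence but ends up with the text of its last one, and blank lines all share the empty key, so they collapse into a single record.

```vb
' Assumes a reference to "Microsoft ActiveX Data Objects 2.x Library".
' DedupeWithRecordset, LineKey and LineRest are hypothetical names.
Private Sub DedupeWithRecordset(ByVal InPath As String, ByVal OutPath As String)
    Const KEY_LEN As Long = 20
    Dim rs As ADODB.Recordset
    Dim InputData As String
    Dim KeyText As String
    Dim FileNo As Integer

    ' Build an in-memory recordset with no connection behind it.
    Set rs = New ADODB.Recordset
    rs.CursorLocation = adUseClient
    rs.Fields.Append "LineKey", adVarChar, KEY_LEN, adFldIsNullable
    rs.Fields.Append "LineRest", adVarChar, 1024, adFldIsNullable
    rs.Open

    FileNo = FreeFile
    Open InPath For Input As #FileNo
    Do While Not EOF(FileNo)
        Line Input #FileNo, InputData
        KeyText = Left$(InputData, KEY_LEN)

        ' Look for an existing record with this key
        ' (single quotes are doubled for the filter string).
        rs.Filter = "LineKey = '" & Replace(KeyText, "'", "''") & "'"
        If rs.EOF Then
            rs.AddNew
            rs.Fields("LineKey").Value = KeyText
        End If
        ' New or existing, the record now carries the latest text.
        rs.Fields("LineRest").Value = Mid$(InputData, KEY_LEN + 1)
        rs.Update
        rs.Filter = adFilterNone
    Loop
    Close #FileNo

    ' Write the records out in the order their keys were first seen.
    FileNo = FreeFile
    Open OutPath For Output As #FileNo
    If Not (rs.BOF And rs.EOF) Then rs.MoveFirst
    Do While Not rs.EOF
        Print #FileNo, rs.Fields("LineKey").Value & rs.Fields("LineRest").Value
        rs.MoveNext
    Loop
    Close #FileNo

    rs.Close
    Set rs = Nothing
End Sub
```

Re-filtering the recordset for every line is simple but may be slow on a large file, so treat this as a starting point rather than a finished routine.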
 
It could be used. JerryKlmns also has an interesting idea.

Swi
 
I tried using the Dictionary object. It works great, but it is deleting the second line, not the first line as I need it to. Here's the code I have so far:

Private Sub Command1_Click()
    Dim Dict As Dictionary      ' needs a reference to Microsoft Scripting Runtime
    Dim InputData As String
    Dim Counter As Long
    Dim UniqueCounter As Long

    Open App.Path & "\MWGLBRPT.TXT" For Input As #1
    Open App.Path & "\GLBRPT.TXT" For Output As #2

    Set Dict = New Dictionary
    Counter = 0
    UniqueCounter = 0

    ' Loops through the file
    Do While Not EOF(1)
        Line Input #1, InputData
        Counter = Counter + 1
        ' Writes a line only the first time its 20-character prefix is seen,
        ' so it is the later duplicates that get dropped
        If Not Dict.Exists(Left$(InputData, 20)) Then
            UniqueCounter = UniqueCounter + 1
            Print #2, InputData
            Dict.Add Left$(InputData, 20), UniqueCounter
        End If
    Loop

    Close
    Set Dict = Nothing
End Sub
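
One possible way to keep the last duplicate instead of the first is to read the file twice: the first pass records the line number where each prefix last appears, and the second pass writes a line only if it sits at that recorded position. The sketch below assumes the same Dictionary reference (Microsoft Scripting Runtime); the procedure name KeepLastDuplicate and its parameters are made up for illustration, and treating blank lines as always kept is an assumption about the desired output.

```vb
' Assumes a reference to "Microsoft Scripting Runtime".
' KeepLastDuplicate and its parameters are hypothetical names.
Private Sub KeepLastDuplicate(ByVal InPath As String, ByVal OutPath As String, _
                              ByVal KeyLen As Long)
    Dim LastSeen As Dictionary   ' prefix -> line number of its last occurrence
    Dim InputData As String
    Dim LineNo As Long
    Dim InFile As Integer
    Dim OutFile As Integer

    Set LastSeen = New Dictionary

    ' Pass 1: remember the line number where each prefix last appears.
    InFile = FreeFile
    Open InPath For Input As #InFile
    Do While Not EOF(InFile)
        Line Input #InFile, InputData
        LineNo = LineNo + 1
        ' Assigning through Item adds the key or overwrites the old value.
        LastSeen(Left$(InputData, KeyLen)) = LineNo
    Loop
    Close #InFile

    ' Pass 2: write a line only if it is the last occurrence of its prefix.
    LineNo = 0
    InFile = FreeFile
    Open InPath For Input As #InFile
    OutFile = FreeFile
    Open OutPath For Output As #OutFile
    Do While Not EOF(InFile)
        Line Input #InFile, InputData
        LineNo = LineNo + 1
        If Len(InputData) = 0 Then
            ' Assumption: blank separator lines are always kept.
            Print #OutFile, InputData
        ElseIf LastSeen(Left$(InputData, KeyLen)) = LineNo Then
            Print #OutFile, InputData
        End If
    Loop
    Close #OutFile
    Close #InFile

    Set LastSeen = Nothing
End Sub
```

It could be called from the button handler as KeepLastDuplicate App.Path & "\MWGLBRPT.TXT", App.Path & "\GLBRPT.TXT", 20. Reading the file twice costs a little time but keeps the surviving lines in their original order.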
 
I am a little unclear on what you want to accomplish. Can you post an example of what the input would look like and what the output should look like? Thanks.

Swi
 
Swi, the text file looks like this:

12LDF { ON HAND= 1,947 } { QTY ALOC= 864 } { QTY AVAIL= 1,083 } { WH1 AVG= 2,573 }
12LDF { ON HAND= 992 } { QTY ALOC= 0 } { QTY AVAIL= 992 } { WH2 AVG= 402 }
12LDF { ON HAND= 1,599 } { QTY ALOC= 32 } { QTY AVAIL= 1,567 } { WH3 AVG= 839 }
12LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 1 SUGGESTED TO ORDER= 1699
12LDF TOTAL TO ORDER: 1699 'NEED TO DELETE THIS LINE
12LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 0 SUGGESTED TO ORDER= 11209
12LDF TOTAL TO ORDER: 12908

13LDF { ON HAND= 3,648 } { QTY ALOC= 640 } { QTY AVAIL= 3,008 } { WH1 AVG= 597 }
13LDF { ON HAND= 1,296 } { QTY ALOC= 0 } { QTY AVAIL= 1,296 } { WH2 AVG= 294 }
13LDF { ON HAND= 2,018 } { QTY ALOC= 40 } { QTY AVAIL= 1,978 } { WH3 AVG= 305 }
13LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 7 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0 'NEED TO DELETE THIS LINE
13LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 5 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0


With the code I currently have, I get the following results:

12LDF { ON HAND= 1,947 } { QTY ALOC= 864 } { QTY AVAIL= 1,083 } { WH1 AVG= 2,573 }
12LDF { ON HAND= 992 } { QTY ALOC= 0 } { QTY AVAIL= 992 } { WH2 AVG= 402 }
12LDF { ON HAND= 1,599 } { QTY ALOC= 32 } { QTY AVAIL= 1,567 } { WH3 AVG= 839 }
12LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 1 SUGGESTED TO ORDER= 1699
12LDF TOTAL TO ORDER: 1699 'NEED TO DELETE THIS LINE
12LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 0 SUGGESTED TO ORDER= 11209

13LDF { ON HAND= 3,648 } { QTY ALOC= 640 } { QTY AVAIL= 3,008 } { WH1 AVG= 597 }
13LDF { ON HAND= 1,296 } { QTY ALOC= 0 } { QTY AVAIL= 1,296 } { WH2 AVG= 294 }
13LDF { ON HAND= 2,018 } { QTY ALOC= 40 } { QTY AVAIL= 1,978 } { WH3 AVG= 305 }
13LDF ----> SHOP1 MNTHS SUPPLY AVAILABLE= 7 SUGGESTED TO ORDER= 0
13LDF TOTAL TO ORDER: 0 'NEED TO DELETE THIS LINE
13LDF ----> SHOP2 MNTHS SUPPLY AVAILABLE= 5 SUGGESTED TO ORDER= 0

I just need to keep the 2nd duplicate and delete the 1st one. Thank you.
 
So, you need to delete all lines with "TOTAL TO ORDER" in them, and also the first version of "13LDF" in this example, presumably because it's older data than the second one?

If so, you can leverage a few characteristics of your data. First, it would seem that each individual record has 6 lines of data. Pull everything into a recordset, putting the first six characters in a separate field. Add another field that numbers all the records in the order they came. Apply a filter that keeps only those records whose key has a count of greater than 6. Sort the result by the key descending, secondarily by the number field descending. Iterate through the recordset, deleting all records for a given key after the count of 6 has been reached. Then delete the "total to order" line.

HTH

Bob
 
Now that I understand your goal I would recommend that you try what BobRodes has described above.

Swi
 
Swi, I agree. I will give it a shot tonight! Thank you, BobRodes and everyone, for your help!
 