List duplicate line in a text file

Status
Not open for further replies.

geofflsc

Technical User
Dec 19, 2012
9
CA
Hi all,

I have a text file that lists some names. I am looking for a script that checks the file and shows only the duplicate lines. I have searched several sites and couldn't find anything related to it. Any help would be appreciated.

Here is an example of the text file, with duplicates in red.

Code:
\20016_6010_00_XCO_PAN-INSERT
[COLOR=#EF2929]\123-1118[/color]
\123-6001
\200191010_00_10100_XCO_COVER-MAIN-INSERT
\20019201_01_2010_XCO_EJECTOR-IMPRESSION_SO_V
\20019202_00_2020_XCO_EJECTOR-MAIN-INSERT
\20019203_00_2030_XCO_EJECTOR-INSERT_SO_V
\20019203_00_2030_XCO_EJECTOR-INSERT_SO_V
\20019210_00_2100_XCO_EJECTOR-SHOT-BISCUIT
\20019223_00_2230_XCO_EJPINBACKPLTCTR_SO_V
\20019270_00_2700_XCO_EJECTION-PLATE
\20019270_00_2700_XCO_EJECTION-PLATE_SO_V
[COLOR=#EF2929]\123-1118[/color]
\175-4503-BETA
\175-6503-BETA
[COLOR=#EF2929]\175-1001
\175-1001[/color]
\175-1118
\175-3001
\175-4503
\175-5001
\175-6503
\182-4001
\182-3001
\182-6001
\182-4001-BETA
 
Hmmm... How familiar are you with programming in general?
Read the entire text file and set it aside (strContent). Close the text file and open it again to reset the stream pointer. Start from the beginning of the text file and read each line. See if it exists in strContent more than once. If so, store it as a new item in a dictionary if it doesn't already exist. After you've read each line of the file, write the dictionary items to a file.

-Geates

 
Better yet (IMO)...

Read the entire text file into a variable (strContent). Split strContent into arrLines. Loop through arrLines to see whether each line occurs again in the remaining lines. If so, store it as a new item in a dictionary if it doesn't already exist. After you've iterated arrLines, write the dictionary items to a file.

-Geates

 
I agree with Geates that a dictionary object would work nicely.

Another approach: the dictionary object stores key / item pairs, so the key could be each unique path and the item could be a count of how many times that path was found. Since you can easily check whether a key was already added, the pseudocode below would work:

Code:
For each "path" line in the text file
   If the key for that path does not exist
      Add that Key, and set its Item value to 1
   Else (if the key already exists)
      Increment the item value for that key
   End If
Next

Translated quickly, the code would resemble:
Code:
Set dict = CreateObject("Scripting.Dictionary")

For Each sPath In arr
   If dict.Exists(sPath) Then
      dict.Item(sPath) = dict.Item(sPath) + 1
   Else
      dict.Add sPath, 1
   End If
Next
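
For anyone who wants to try the counting idea end to end, here is a minimal sketch that adds a second pass to list only the duplicated lines. The sample array and the variable sKey are assumptions for illustration; in a real script arr would come from the text file (for example by splitting the result of ReadAll on vbNewLine).

```vbscript
Option Explicit

Dim arr, dict, sPath, sKey

' Sample lines taken from the example list above (assumed data for the demo)
arr = Array("\123-1118", "\123-6001", "\175-1001", "\175-1001", "\123-1118")

Set dict = CreateObject("Scripting.Dictionary")

' Pass 1: count how many times each line occurs
For Each sPath In arr
   If dict.Exists(sPath) Then
      dict.Item(sPath) = dict.Item(sPath) + 1
   Else
      dict.Add sPath, 1
   End If
Next

' Pass 2: report only the lines that were seen more than once
For Each sKey In dict.Keys
   If dict.Item(sKey) > 1 Then
      WScript.Echo sKey & ": " & dict.Item(sKey)
   End If
Next
```

Run with cscript, this should report \123-1118 and \175-1001 with a count of 2 each. The file is only walked once, so this avoids comparing every line against every later line.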
 
Thanks Geates and guitarzan, I have a better idea now.
 
Read the entire text file into a variable (strContent). Split strContent into arrLines. Loop through arrLines to see whether each line occurs again in the remaining lines. If so, store it as a new key/item entry in a dictionary if it doesn't already exist. After you've iterated arrLines, write the dictionary entry pairs to a file.

Code:
set objFSO = CreateObject("Scripting.FileSystemObject")
set objDuplicates = CreateObject("Scripting.Dictionary")

' Read the whole input file, then release the handle before objStream is reused
set objStream = objFSO.OpenTextFile("C:\input.txt", 1)
strContent = objStream.ReadAll
objStream.close
arrLines = split(strContent, vbNewLine)

for i = 0 to ubound(arrLines)
	strLine = arrLines(i)
	intOccurrences = 1
	' Skip blank lines and lines already recorded as duplicates
	if (strLine <> "") and not (objDuplicates.Exists(strLine)) then
		' Count how many more times this line appears further down
		for j = (i + 1) to ubound(arrLines)
			if (strLine = arrLines(j)) then intOccurrences = intOccurrences + 1
		next
		if (intOccurrences > 1) then objDuplicates.Add strLine, intOccurrences
	end if
next

' Write each duplicated line with its count (2 = ForWriting, true = create, 0 = ASCII)
set objStream = objFSO.OpenTextFile("C:\duplicates.txt", 2, true, 0)
arrKeys = objDuplicates.Keys
for x = 0 to ubound(arrKeys)
	objStream.WriteLine arrKeys(x) & ": " & objDuplicates.Item(arrKeys(x))
next

objStream.close
msgbox "done"

-Geates

 
Code:
\20016_6010_00_XCO_PAN-INSERT
[COLOR=#EF2929]\123-1118[/color]
\123-6001
\200191010_00_10100_XCO_COVER-MAIN-INSERT
\20019201_01_2010_XCO_EJECTOR-IMPRESSION_SO_V
\20019202_00_2020_XCO_EJECTOR-MAIN-INSERT
[COLOR=#73D216]\20019203_00_2030_XCO_EJECTOR-INSERT_SO_V[/color]
[COLOR=#73D216]\20019203_00_2030_XCO_EJECTOR-INSERT_SO_V[/color]
\20019210_00_2100_XCO_EJECTOR-SHOT-BISCUIT
\20019223_00_2230_XCO_EJPINBACKPLTCTR_SO_V
\20019270_00_2700_XCO_EJECTION-PLATE
\20019270_00_2700_XCO_EJECTION-PLATE_SO_V
[COLOR=#EF2929]\123-1118[/color]
\175-4503-BETA
\175-6503-BETA
[COLOR=#EF2929]\175-1001[/color]
[COLOR=#EF2929]\175-1001[/color]
\175-1118
\175-3001
\175-4503
\175-5001
\175-6503
\182-4001
\182-3001
\182-6001
\182-4001-BETA

Would the values in green also be considered a match, or are you looking for only numbers with no text following?
 
jges,

Yes, I missed those 2 lines.

Geates's script is working great for what I need. Thanks.
 
Hi Geates,

Now I have another, more complex problem. I have a list of files and need to show only the partial duplicates (in green), where the REV or ECR differs (in red). Is it possible to do this? Thanks.

Code:
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-0504_REV004_ECR3683.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-0505_REV004_ECR3684.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-0506_REV001_ECR3685.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-1100_REV004_ECR3923.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-2100_REV004_ECR3924.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-3000_REV001_ECR3725.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-3100_REV006_ECR3925.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-4000_REV001_ECR3712.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-4100_REV004_ECR3927.RTM
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]0000.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3227.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3325.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]002_[COLOR=#4E9A06]ECR[/color]3431.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]002_[COLOR=#4E9A06]ECR[/color]3498.RTM[/color]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1002_REV001_ECR3270.RTM
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]0000.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3226.RTM[/color]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV002_ECR3429.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2033_REV002_ECR3430.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_3001_REV002_ECR3428.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_3070_REV002_ECR3427.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_4002_REV001_ECR0000.RTM

Code:
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]0000.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3227.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3325.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]002_[COLOR=#4E9A06]ECR[/color]3431.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV[/color][COLOR=#EF2929]002_[COLOR=#4E9A06]ECR[/color]3498.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]0000.RTM[/color]
[COLOR=#4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV[/color][COLOR=#EF2929]001_[COLOR=#4E9A06]ECR[/color]3226.RTM[/color]
 
Yes, it is possible. In fact, it's just a few lines of alternate code. The best approach would be to use Regular Expressions. Unfortunately, I am not well experienced with RegEx. Perhaps someone will chime in with a possible solution. In the meantime, google "vbs regex tutorial" to get a basic understanding.

-Geates
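
As a rough illustration of the RegEx route, the sketch below uses the VBScript RegExp object to pull a line apart. The pattern is an assumption inferred from the sample file names (name, then _REV digits, then _ECR digits, then .RTM) and would need adjusting if the real data varies.

```vbscript
Option Explicit

Dim re, matches, sLine

sLine = "K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3227.RTM"

Set re = New RegExp
re.IgnoreCase = True
' Assumed pattern. Group 1: everything before _REV,
' group 2: revision digits, group 3: ECR digits.
re.Pattern = "^(.*)_REV(\d+)_ECR(\d+)\.RTM$"

Set matches = re.Execute(sLine)
If matches.Count > 0 Then
   WScript.Echo "Key: " & matches(0).SubMatches(0)
   WScript.Echo "REV: " & matches(0).SubMatches(1)
   WScript.Echo "ECR: " & matches(0).SubMatches(2)
Else
   WScript.Echo "Line does not follow the assumed naming pattern"
End If
```

The first submatch could then serve as the dictionary key when grouping lines that differ only in REV or ECR.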

 
While regex is certainly an option, there may be an easier solution here. From the example data given, it appears that you could split your string into 2 parts: 1) everything before REV and 2) everything after REV. Then you could base your comparison on the first part of the string and, if there is a match, add the original string to your collection.

Of course this depends on how well you know the data and how consistent it is.
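
A minimal sketch of that split, assuming every line of interest contains the marker "_REV" exactly as in the sample data. The variable names are illustrative only.

```vbscript
Option Explicit

Dim sLine, parts, sKey

sLine = "K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV002_ECR3431.RTM"

' Split once on the marker: parts(0) is everything before "_REV",
' parts(1) is everything after it
parts = Split(sLine, "_REV", 2)

If UBound(parts) = 1 Then
   sKey = parts(0)
   WScript.Echo "Compare on: " & sKey
   WScript.Echo "Remainder:  " & parts(1)
Else
   ' No "_REV" in the line, so there is nothing to group on
   WScript.Echo "No _REV marker found in: " & sLine
End If
```

Split compares case-sensitively by default, so a line containing "_rev" in lower case would fall into the Else branch.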
 
Hi all,

I need help with my script; I am kind of stuck here. My script is simple and mostly does the job. Original.txt lists all the files. Error.txt is supposed to list all the repeated files, but my script only shows the repeats without the first file of each group (highlighted in [highlight #4E9A06]green[/highlight]). Any help would be greatly appreciated.

Code:
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-2100_REV004_ECR3924.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-3000_REV001_ECR3725.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-3100_REV006_ECR3925.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-4000_REV001_ECR3712.RTM
K:\Nemak\BC430A-EX\All Shop Orders\RTM\BC430A-EX-4100_REV004_ECR3927.RTM
[COLOR=#EF2929]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR0000.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3227.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3325.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV002_ECR3431.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV002_ECR3498.RTM[/color]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1002_REV001_ECR3270.RTM
[COLOR=#EF2929]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR0000.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR3226.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV002_ECR3429.RTM[/color]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2033_REV002_ECR3430.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_3001_REV002_ECR3428.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_3070_REV002_ECR3427.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_4002_REV001_ECR0000.RTM

Code:
[highlight #4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR0000.RTM <<< Missing[/highlight]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3227.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3325.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV002_ECR3431.RTM
K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV002_ECR3498.RTM
[highlight #4E9A06]K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR0000.RTM <<< Missing[/highlight]
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR3226.RTM 
K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV002_ECR3429.RTM

Code:
datafile="c:\Original.txt"
outputfile="c:\Error.txt"

set fso=createobject("scripting.filesystemobject")
if not fso.fileexists(datafile) then
  set fso=nothing
  wscript.quit 99
end if

set ots=fso.opentextfile(datafile,1)
set outs=fso.createtextfile(outputfile)
set odic1=createobject("scripting.dictionary")
do while not ots.atendofstream
	s=ots.readline
	skey=mid(s,1,instr(s,"_REV"))
	if not odic1.exists(skey) then
		odic1.add skey, s
	else
		found=odic1.item(skey)
		outs.writeline found
	end if
loop
ots.close
outs.close
set ots=nothing
set outs=nothing
set fso=nothing
 
The key is added to the dictionary if it doesn't exist; ELSE the current line (in this case, the duplicate) is printed to the error file.

Code:
do while not ots.atendofstream
	s=ots.readline
	skey=mid(s,1,instr(s,"_REV"))
	if not odic1.exists(skey) then
		odic1.add skey, s
	[b]else
		found=odic1.item(skey)
		outs.writeline found[/b]
	end if
loop

Implement the same duplication discovery method as I did above.

-Geates

 
When this list is subjected to your script, "app" will never be written to the error file. A line is either added to the dictionary (and never heard from again) OR it is written to the error file. The error file will contain all the other words.

Code:
app
appalachian
apparatus
apparent
appendix
apple

-Geates
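
One possible way around this, sketched below, is to remember the first line of each group and write it out the first time a repeat turns up. This is a single-pass alternative to the nested-loop discovery method shown earlier, not a copy of it. The second dictionary (dicShown) and the in-memory sample array are assumptions for the demo; a real script would read Original.txt and write to Error.txt instead of echoing.

```vbscript
Option Explicit

Dim lines, dicFirst, dicShown, s, sKey, pos

' A few lines from the Original.txt sample above
lines = Array( _
   "K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR0000.RTM", _
   "K:\Nemak\HT2\All Shop Orders\RTM\HT2_1001_REV001_ECR3227.RTM", _
   "K:\Nemak\HT2\All Shop Orders\RTM\HT2_1002_REV001_ECR3270.RTM", _
   "K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR0000.RTM", _
   "K:\Nemak\HT2\All Shop Orders\RTM\HT2_2001_REV001_ECR3226.RTM")

Set dicFirst = CreateObject("Scripting.Dictionary")  ' key -> first line seen
Set dicShown = CreateObject("Scripting.Dictionary")  ' keys whose first line was already output

For Each s In lines
   pos = InStr(s, "_REV")
   If pos > 0 Then
      sKey = Left(s, pos - 1)
      If Not dicFirst.Exists(sKey) Then
         ' First sighting: remember the line, but do not report it yet
         dicFirst.Add sKey, s
      Else
         ' Repeat sighting: report the remembered first line once...
         If Not dicShown.Exists(sKey) Then
            WScript.Echo dicFirst.Item(sKey)
            dicShown.Add sKey, True
         End If
         ' ...then report the current line
         WScript.Echo s
      End If
   End If
Next
```

With this sample, both HT2_1001 lines and both HT2_2001 lines are reported, including the ECR0000 entries that were missing before, while the lone HT2_1002 line is left out.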

 
Thanks Geates. I finally got it working perfectly after following your suggestions.
 