Deduplicate a text file

Status
Not open for further replies.

360degreehosting

IS-IT--Management
Oct 17, 2006
16
US
Hello,

I have some text files I need to deduplicate.

I have started with the following code but I'm getting an error and don't know where to go from here.

File 1 text file contents:

steve@123.com
steve@234.com
steve@123.com

I want to check whether each string already exists in the File 2 text file, and write the string from File 1 only if it is not already there.

If the script runs correctly File 2 should contain:

steve@123.com
steve@234.com

If someone has a better suggestion of how to do this I am interested in learning a better way.

Thank you for your help....

Code:
varFile1 = "C:\file1.txt" 'see above for File 1 contents
varFile2 = "C:\file2.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set fle1 = objFSO.OpenTextFile(varFile1,1)

Set fle2 = objFSO.OpenTextFile(varFile2,2)

Do While Not fle1.AtEndofStream 'Change this line, Change this one too

	strLine = fle1.ReadLine

	If InStr(StrLine, "@") > 0 Then
		Response.Write StrLine & "<br>"		
		If InStr(fle2.Readall, StrLine) = 0 then
			fle2.WriteLine StrLine
		End If
	End If

Loop

fle1.close
Set fle1 = nothing
fle2.close
Set fle2 = nothing
Set objFSO = nothing
 
I see quite a few problems in the design of the script. For example, fle2 cannot be read and written at the same time: it is opened for writing (mode 2), so the call to fle2.ReadAll cannot work. If you instead do a read-write-close cycle on file 2 for each new line from fle1, it becomes an O(n^2) operation and involves many disk operations, which would slow things down.
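
To illustrate the mode conflict, here is a minimal sketch. It assumes the script runs under Windows Script Host (cscript), so that WScript.Echo is available, and it uses a hypothetical scratch file C:\demo.txt that is not part of the original script. Reading from a stream opened for writing is expected to raise a runtime error, which the sketch traps and reports.

```vbscript
Const ForWriting = 2

Dim fso, ts, text
Set fso = CreateObject("Scripting.FileSystemObject")

' Hypothetical scratch file (assumption); created if missing because
' the third argument is True. Opening ForWriting truncates the file.
Set ts = fso.OpenTextFile("C:\demo.txt", ForWriting, True)

On Error Resume Next
text = ts.ReadAll            ' attempt to read a write-only stream
If Err.Number <> 0 Then
    ' Report whatever error the runtime raised
    WScript.Echo "Read failed: " & Err.Number & " " & Err.Description
    Err.Clear
End If
On Error GoTo 0

ts.Close
Set ts = Nothing
Set fso = Nothing
```

In an ASP page, WScript.Echo would need to be swapped for Response.Write, as in the original post.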

I would propose an alternative for your perusal, using a dictionary object.
[tt]
varFile1 = "C:\file1.txt" 'see above for File 1 contents
varFile2 = "C:\file2.txt"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set fle1 = objFSO.OpenTextFile(varFile1,1[blue],true[/blue])

[green]dim odic,i,data,key
set odic=createobject("scripting.dictionary")
i=0
do while not fle1.atendofstream
data=fle1.readline
if instr(data,"@")<>0 then
if not odic.exists(trim(data)) then
odic.add trim(data),i
end if
end if
loop
set odic=nothing[/green]
fle1.close
Set fle1 = nothing

Set fle2 = objFSO.OpenTextFile(varFile2,2[blue],true[/blue])
[green]for each key in odic.keys
fle2.writeline key
next[/green]
fle2.close
Set fle2 = nothing
Set objFSO = nothing
[/tt]
 
Thanks! On re-reading what I posted, I see that I put the set odic=nothing in the wrong place. You should make that correction if you use the idea.
[tt]
'etc
[red]'[/red]set odic=nothing
fle1.close
'etc etc
next
[green]set odic=nothing 'the right place[/green]
fle2.close
'etc etc
[/tt]
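
With that correction applied, the whole idea might be sketched as follows. This is a consolidated version under a few assumptions rather than a definitive script: the file paths are the ones from the original post, the named constants ForReading and ForWriting are added here for readability, and the unused counter is replaced by a placeholder value of True, since only the dictionary keys matter.

```vbscript
Option Explicit

Const ForReading = 1
Const ForWriting = 2

Dim varFile1, varFile2, objFSO, fle1, fle2, odic, data, key

varFile1 = "C:\file1.txt"   ' input, may contain duplicate lines
varFile2 = "C:\file2.txt"   ' output, one copy of each address

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set odic = CreateObject("Scripting.Dictionary")

' Pass 1: read file 1 once and remember each distinct line as a key.
Set fle1 = objFSO.OpenTextFile(varFile1, ForReading)
Do While Not fle1.AtEndOfStream
    data = Trim(fle1.ReadLine)
    If InStr(data, "@") > 0 Then
        ' The stored value is only a placeholder (assumption).
        If Not odic.Exists(data) Then odic.Add data, True
    End If
Loop
fle1.Close
Set fle1 = Nothing

' Pass 2: write the distinct lines to file 2 (created if missing,
' overwritten if it already exists).
Set fle2 = objFSO.OpenTextFile(varFile2, ForWriting, True)
For Each key In odic.Keys
    fle2.WriteLine key
Next
fle2.Close
Set fle2 = Nothing

' Release the dictionary only after the keys have been written.
Set odic = Nothing
Set objFSO = Nothing
```

The comparison here is case-sensitive, so Steve@123.com and steve@123.com would both be kept. If those should count as duplicates, one option is to set odic.CompareMode = vbTextCompare before the first Add.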
 
Yes, thank you. I did move it once I got an error.

I am going to be using this solution. I appreciate you taking the time to suggest it.

Warmest Regards,
Steve
 