VBScript Duplicate Elimination 1

Status
Not open for further replies.

Yaik

Programmer
Oct 1, 2010
36
US
So I need to eliminate duplicates in my file. Here is the content of one of my files:


Inventory Date,Product Name,Version,Install Date,Status,Host Name
10/8/2010,Crystal Reports for Visual Studio,12.51.0.240,20100428,Installed,DAVID-PC
10/8/2010,Microsoft Office Access MUI (English) 2007,12.0.6425.1000,20100429,Installed,DAVID-PC
10/8/2010,Update for Microsoft Office Access 2007 Help (KB963663),,,Installed,DAVID-PC
10/8/2010,Security Update for Microsoft Office Access 2007 (KB979440),,,Installed,DAVID-PC
10/8/2010,Microsoft Office Access Setup Metadata MUI (English) 2007,12.0.6425.1000,20100423,Installed,DAVID-PC
10/8/2010,Adobe Acrobat 9 Pro Extended 64-bit Add-On,9.0.0,20100521,Installed,DAVID-PC
10/8/2010,"Adobe Acrobat 9 Pro Extended - English, Français, Deutsch",9.0.0,20100521,Installed,DAVID-PC
10/8/2010,"Adobe Acrobat 9 Pro Extended - English, Français, Deutsch",9.0.0,5/21/2010,Installed,DAVID-PC
10/8/2010,Fireworks Pack v1.0 for Pocket Tanks Deluxe,1,,Installed,DAVID-PC
10/8/2010,Adobe Reader 9.3.2,9.3.2,20100425,Installed,DAVID-PC
10/8/2010,AnyReader,3,20101002,Installed,DAVID-PC

This is one of my csv files. In this case there is a duplicate: Adobe Acrobat 9 Pro. The two lines are not exactly the same, though, so I can't do an exact comparison.

What I had in mind was to split each line at the "," and then compare the lines by product name. I was able to do this, but I don't know how to write the whole line back out instead of just the field I was comparing.

Here is what I have so far.
Code:
Set objFSO = CreateObject("Scripting.FileSystemObject")
strFile="I:\0-STUFF\Scripts\Newer\WORKING\New\PCs\DAVID-PC Software Info.csv"
Set objFile = objFSO.OpenTextFile(strFile)
Set dicSort = CreateObject("Scripting.Dictionary")

Do While Not objFile.AtEndOfStream
    On Error Resume Next
    strData = objFile.ReadLine
    tempstrData = strData
    MyArray = Split(strData, ",", -1, 1)
    strData = MyArray(1)
    dicSort.Add strData, dicSort.Count
Loop

objFile.Close
Set objFile = objFSO.CreateTextFile(strFile)
For Each Item In dicSort
  objFile.WriteLine Item
Next
objFile.Close

WScript.Echo "Done"

Is there any way to modify this script so that it outputs the whole line again instead of just the product name? If not, can anyone point me in the right direction?
 
[0] >how to write the whole line again instead of the word I was comparing it to
This can be done like this.
[0.1][tt]
[red]'[/red]dicSort.Add strData, dicSort.Count
dicSort.Add strData, tempstrData[/tt]
[0.2][tt]
[red]'[/red]objFile.WriteLine Item
objFile.WriteLine dicSort(Item)[/tt]

[1] Now, the problems with the script as I see them.

[1.1] Since you write the output to the same file name, you must overwrite the input for the script to do anything meaningful. Pass the overwrite flag explicitly; otherwise the script may error out.
[tt]
[red]'[/red]Set objFile = objFSO.CreateTextFile(strFile)
Set objFile = objFSO.CreateTextFile(strFile[red],true[/red])
[/tt]
[1.2] You should not use On Error Resume Next; instead, control the .Add method. This is not only a theoretical consideration. It gives you the necessary control over which line is kept (the first or the last occurrence of the same software), which is lacking at present.
[tt]
Do While Not objFile.AtEndOfStream
[red]'[/red]On Error Resume Next
strData = objFile.ReadLine
tempstrData = strData
MyArray = Split(strData, ",", -1, 1)
strData = MyArray(1)
[blue]if dicSort.exists(strData) then
'do nothing here, and the first occurrence prevails
'or do this, and the last occurrence prevails
dicSort(strData)=tempstrData
else
dicSort.Add strData, tempstrData
end if[/blue]
Loop
[/tt]
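
To make the first-versus-last choice in [1.2] concrete, here is a minimal sketch that runs both policies side by side on in-memory data. The two sample lines are made up for illustration (modeled on the AnyReader row), and the names dicFirst and dicLast are hypothetical; nothing here touches a file.

```vbscript
' Illustration only: the two sample lines are made up for this sketch.
Dim dicFirst, dicLast, arrLines, arrFields, strLine, strKey
Set dicFirst = CreateObject("Scripting.Dictionary")
Set dicLast = CreateObject("Scripting.Dictionary")

arrLines = Array( _
    "10/8/2010,AnyReader,3,20101002,Installed,DAVID-PC", _
    "10/8/2010,AnyReader,3,10/2/2010,Installed,DAVID-PC")

For Each strLine In arrLines
    arrFields = Split(strLine, ",")
    strKey = arrFields(1)

    ' First occurrence prevails: add only when the key is new.
    If Not dicFirst.Exists(strKey) Then dicFirst.Add strKey, strLine

    ' Last occurrence prevails: assigning to a key adds it or overwrites it.
    dicLast(strKey) = strLine
Next

WScript.Echo "First kept: " & dicFirst("AnyReader")
WScript.Echo "Last kept:  " & dicLast("AnyReader")
```

Run with cscript, this should echo the 20101002 line for the first policy and the 10/2/2010 line for the second. Neither policy needs On Error Resume Next, because .Add is never called on a key that already exists.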
[1.3] Since the data contains characters outside the 0x00-0x7F range (e.g. "Français"), you have to be careful. FSO can have a hard time handling them correctly when writing the output, and at times when reading the input as well. You should first "prepare" the input csv in Unicode (UTF-16). Then you read the input and write the output in Unicode too, so both sides agree on the encoding.
[tt]
'etc etc ...
Set objFile = objFSO.OpenTextFile(strFile[blue],1,false,-1[/blue]) 'input file is in unicode
'etc etc ...
Set objFile = objFSO.CreateTextFile(strFile[blue],true,-1[/blue])
'etc etc...
[/tt]
 
Thank you very much for all your help. This ended up working
Code:
Set objFSO = CreateObject("Scripting.FileSystemObject")
strFile="I:\0-STUFF\Scripts\Newer\WORKING\New\PCs\DAVID-PC Software Info.csv"
Set objFile = objFSO.OpenTextFile(strFile)
Set dicSort = CreateObject("Scripting.Dictionary")

Do While Not objFile.AtEndOfStream
    On Error Resume Next
    strData = objFile.ReadLine
    tempstrData = strData
    MyArray = Split(strData, ",", -1, 1)
    strData = MyArray(1)
    if dicSort.exists(strData) then
        dicSort(strData)=tempstrData
    else
        dicSort.Add strData, tempstrData
    end if
    dicSort.Add strData, tempstrData
Loop

objFile.Close
Set objFile = objFSO.CreateTextFile(strFile,true)
For Each Item In dicSort
  objFile.WriteLine dicSort(Item)
Next
objFile.Close

Wscript.Echo "Done"

I couldn't do the last two things you told me to, because the output came out as a single line and in Japanese (I know, it's weird).

I also couldn't get the code to work, so I added the On Error Resume Next and that seemed to make it work. Without it I got this error: "This key is already associated with an element of this collection." The error refers to the line "dicSort.Add strData, tempstrData".

Thank you for all your help. It would've taken me forever to figure this out, since I only recently started coding in VBScript.
 
[1.2.1] This is the Do loop that I meant (no repetition of the .Add line).
[tt]
Do While Not objFile.AtEndOfStream
strData = objFile.ReadLine
tempstrData = strData
MyArray = Split(strData, ",", -1, 1)
strData = MyArray(1)
if dicSort.exists(strData) then
dicSort(strData)=tempstrData
else
dicSort.Add strData, tempstrData
end if
Loop
[/tt]
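
One caveat with Split(strData, ",") on the sample data: a quoted product name that itself contains commas, like the Adobe Acrobat lines, is cut at the first inner comma, so the dictionary key is only the part before it. That happens to be enough to catch the duplicate here, but the key is not the full product name. Below is a minimal sketch of a quote-aware alternative. GetCsvField is a hypothetical helper, not something from the original script, and it assumes fields are quoted with plain double quotes.

```vbscript
' Hypothetical helper: returns the zero-based field of a CSV line,
' treating commas inside double quotes as part of the field.
' The quote characters themselves are dropped from the result.
Function GetCsvField(strLine, intIndex)
    Dim i, ch, blnInQuotes, intField, strField
    blnInQuotes = False
    intField = 0
    strField = ""
    For i = 1 To Len(strLine)
        ch = Mid(strLine, i, 1)
        If ch = """" Then
            blnInQuotes = Not blnInQuotes      ' toggle on every quote character
        ElseIf ch = "," And Not blnInQuotes Then
            If intField = intIndex Then Exit For
            intField = intField + 1            ' move on to the next field
            strField = ""
        Else
            strField = strField & ch
        End If
    Next
    If intField = intIndex Then
        GetCsvField = strField
    Else
        GetCsvField = ""                       ' fewer fields than requested
    End If
End Function
```

In the loop, strData = GetCsvField(tempstrData, 1) would then stand in for the Split and MyArray(1) lines. A blank line yields an empty string instead of a "Subscript out of range" error, which also means all blank lines end up sharing the same empty key.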
 
Oh alright, my bad. It works that way too, but I still don't get the whole point of having these

Set objFile = objFSO.OpenTextFile(strFile,1,false,-1)
Set objFile = objFSO.CreateTextFile(strFile,true,-1)

When I put those in like that, it gives me the error "Subscript out of range: '[number:1]'", which refers to the line "strData = MyArray(1)".
 
[2] There must be a "physical" encoding change before those lines are used: the csv itself has to be encoded in UTF-16 (with or without BOM, LE or BE). Without that preparation, there is no point in adopting the approach.

[2.1] >Set objFile = objFSO.OpenTextFile(strFile,1,false,-1)
[ul][li]strFile : the path to the file[/li]
[li]1 : for reading operation[/li]
[li]false : if the file is not found, it is not created for you and the call errors out; there is no point in generating an empty data file[/li]
[li]-1 : the text file is opened as Unicode[/li][/ul]
That's the meaning of the arguments.

[2.2] >Set objFile = objFSO.CreateTextFile(strFile,true,-1)
[ul][li]strFile : the path to the file (Note: it is the same here as the input file.)[/li]
[li]true : an existing file of the same name is overwritten. If it is false and the file already exists, the file is not created and the call errors out.[/li]
[li]-1 : same as above (text file encoding.)[/li][/ul]
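
The "physical" encoding change in [2] can be done by hand in an editor, or with a one-off conversion script. The sketch below is one way to do it under stated assumptions: the existing csv is ANSI, and the two file names are hypothetical placeholders. The "Subscript out of range" error and the single line of "Japanese" reported earlier are consistent with an ANSI file being read with -1: pairs of bytes are taken as single UTF-16 characters, so no commas or line breaks are recognized and Split returns only one element.

```vbscript
' A sketch only: both file names are hypothetical placeholders.
Const ForReading = 1
Const TristateFalse = 0     ' open as ASCII/ANSI

Dim objFSO, objIn, objOut, strText
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Read the existing file with the encoding it actually has (assumed ANSI).
Set objIn = objFSO.OpenTextFile("input-ansi.csv", ForReading, False, TristateFalse)
strText = ""
If Not objIn.AtEndOfStream Then strText = objIn.ReadAll   ' ReadAll fails on an empty file
objIn.Close

' Write the same text back out as Unicode (UTF-16).
' The third argument of CreateTextFile is the Unicode flag.
Set objOut = objFSO.CreateTextFile("input-utf16.csv", True, True)
objOut.Write strText
objOut.Close
```

Once the converted file exists, the OpenTextFile(strFile,1,false,-1) and CreateTextFile(strFile,true,-1) calls have a UTF-16 file to work with on both ends.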

[3] You should download the WSH 5.6 documentation (chm) or search the MSDN online documentation. (It is an easy language, so you can get by for a couple of weeks, but not much longer without documentation.)
 