Tek-Tips is the largest IT community on the Internet today!

Eliminate Multiple Entries in an Array

Status
Not open for further replies.
Dec 17, 2006
I have an array as follows :

a(0) = "f"
a(1) = "a"
a(2) = "b"
a(3) = "c"
a(4) = "a"
a(5) = "b"
a(6) = "d"
etc. The array holds approximately 70,000 elements.

Many values are repeated (a and b in the sample above).

How can I create an array that contains each value only once:
f, a, b, c and d?

Thanks
 
Look into the dictionary object.

Swi
 
To answer your original question about creating (another) array with just the elements that you are after:
Code:
Private Sub RemoveDuplicates()
    Dim a(0 To 6) As String
    Dim b() As String           'Receives the unique values
    Dim lngA As Long            'Index into a
    Dim lngB As Long            'Index into b
    Dim blnIsFound As Boolean

    a(0) = "f"
    a(1) = "a"
    a(2) = "b"
    a(3) = "c"
    a(4) = "a"
    a(5) = "b"
    a(6) = "d"

    'Seed b with the first element, which cannot be a duplicate yet
    ReDim b(0)
    b(0) = a(0)

    For lngA = LBound(a) + 1 To UBound(a)
        blnIsFound = False
        'Linear search through the values kept so far
        For lngB = LBound(b) To UBound(b)
            If a(lngA) = b(lngB) Then
                'Element found
                blnIsFound = True
                Exit For
            End If
        Next lngB
        If Not blnIsFound Then
            'New value: grow b by one and append it
            ReDim Preserve b(UBound(b) + 1)
            b(UBound(b)) = a(lngA)
        End If
    Next lngA

    For lngB = LBound(b) To UBound(b)
        Debug.Print lngB & " " & b(lngB)
    Next lngB
End Sub

Debug window:
0 f
1 a
2 b
3 c
4 d

HTH

---- Andy
 
I would use collections for this purpose.
___
[tt]
Private Sub Form_Load()
    Dim A(6) As String, L As Long, C As Collection
    A(0) = "f"
    A(1) = "a"
    A(2) = "b"
    A(3) = "c"
    A(4) = "a"
    A(5) = "b"
    A(6) = "d"
    Set C = New Collection
    'Adding an existing key raises an error, so duplicates are skipped.
    'Note: Collection keys are compared case-insensitively.
    On Error Resume Next
    For L = 0 To UBound(A)
        C.Add A(L), A(L)
    Next
    On Error GoTo 0
    For L = 1 To C.Count
        Debug.Print C(L)
    Next
    Unload Me
End Sub[/tt]
 
70,000 items? Like Swi, I'd go for a Dictionary rather than a Collection.
 
>Where can I find this dictionary object?

You need a project reference to the Microsoft Scripting Runtime. In the VB6 IDE, open the Project menu, choose References, and tick Microsoft Scripting Runtime.
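
Once the reference is set, a Dictionary-based version of the de-duplication could look something like the sketch below. It is only an illustration under assumptions: UniqueValues, src, seen and result are names made up here, and the input is assumed to be a non-empty String array like the one in the original question.

```vb
' Sketch only. Requires a project reference to Microsoft Scripting Runtime.
' UniqueValues is a hypothetical helper name, not code from this thread.
Private Function UniqueValues(src() As String) As String()
    Dim seen As Scripting.Dictionary
    Dim result() As String
    Dim i As Long, n As Long

    Set seen = New Scripting.Dictionary
    ' Keys are case-sensitive by default; uncomment to ignore case:
    ' seen.CompareMode = TextCompare

    ' Size the result for the worst case (every element unique)
    ReDim result(LBound(src) To UBound(src))
    n = LBound(src)

    For i = LBound(src) To UBound(src)
        If Not seen.Exists(src(i)) Then
            seen.Add src(i), True       ' remember the value
            result(n) = src(i)          ' keep first-seen order
            n = n + 1
        End If
    Next i

    ' Trim the unused tail (the first element is always kept, so n > LBound)
    ReDim Preserve result(LBound(src) To n - 1)
    UniqueValues = result
End Function

' Usage, with a() being the String array from the question:
'   Dim b() As String
'   b = UniqueValues(a)
```

Exists is a keyed lookup, so each element is checked without rescanning the values kept so far, which is what the nested-loop version has to do.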
 
Some rough timings (using the Timer function) based upon my code in thread222-1311085 and still using the FlexGrids.

When there are 3,759 unique 3-character keys in a total of 4,000:

Conventional looping 11.29 seconds
Dictionary 1.02 seconds
Collection 0.968 seconds

When there are 8,480 unique 3-character keys in a total of 10,000:

Conventional looping 58.62 seconds
Dictionary 1.91 seconds
Collection 1.96 seconds

When there are 26,965 unique 3-character keys in a total of 70,000:

Conventional looping: gave up after waiting more than 5 minutes
Dictionary 6.2 seconds
Collection 6.34 seconds

In some runs Collection is faster, but the differences are fractions of a second and well within the limited resolution of the Timer function. In this test Collection and Dictionary seem effectively equal.

The Collection method requires no special Project Reference.

Hoping this is of some interest, Hugh
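
On the Timer resolution point: one commonly used alternative for timing short runs in VB6 is the Windows high-resolution performance counter. The sketch below is not the code used for the timings above. StartTiming and ElapsedSeconds are hypothetical helper names, and the declarations are Private so that they can sit in a form module.

```vb
' Sketch only: StartTiming and ElapsedSeconds are hypothetical helper names.
Option Explicit

' Currency is a 64-bit scaled integer, a common VB6 stand-in for LARGE_INTEGER
Private Declare Function QueryPerformanceCounter Lib "kernel32" _
    (lpPerformanceCount As Currency) As Long
Private Declare Function QueryPerformanceFrequency Lib "kernel32" _
    (lpFrequency As Currency) As Long

Private m_start As Currency

Private Sub StartTiming()
    QueryPerformanceCounter m_start
End Sub

Private Function ElapsedSeconds() As Double
    Dim finish As Currency, freq As Currency
    QueryPerformanceCounter finish
    QueryPerformanceFrequency freq      ' counts per second
    ' The Currency scaling factor cancels out in the ratio
    ElapsedSeconds = (finish - m_start) / freq
End Function

' Usage:
'   StartTiming
'   ' ... code under test ...
'   Debug.Print ElapsedSeconds
```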
 
>In this test Collection and Dictionary seem effectively equal.

This is because, as the sample size gets larger, the limiting factor becomes the speed of the FlexGrid. Dictionaries are a lot faster and more efficient than Collections.
 
I'm now using dynamic string arrays in place of the FlexGrids for the timing test:

27,003 unique 3-character keys in 70,000

Conventional looping 76.48 seconds
Dictionary 0.375 seconds
Collection 0.531 seconds

Multiple runs confirm that using Dictionary is consistently faster than using Collection. Both are extremely quick when compared to conventional looping.

regards Hugh
 
>Multiple runs confirm that using Dictionary is consistently faster than using Collection.

I thought the same about dictionaries in general, but after some further experiments I found that this does not always hold.

If there are many duplicates in the source array, Dictionary is much faster than Collection. But as the number of duplicates decreases (meaning more unique items), Collection outperforms the Dictionary object.

3-character keys in an array of 70,000 items mean a lot of duplicates (at most 26[sup]3[/sup] = 17,576 unique items if only letters are used). In this case Dictionary is faster than Collection. But if you increase the key length to 4 or 5 characters, then there are very few duplicates and Collection performs faster than Dictionary.

This can be seen from the following test.
___
[tt]
Option Explicit
Dim A(100000) As String
Private Sub Form_Load()
    AutoRedraw = True
    Randomize
    Dim N As Long
    'Run both methods for key lengths 1 to 5
    For N = 1 To 5
        InitArray N
        ColMethod
        DicMethod
    Next
End Sub
'Fill A with random keys of NumChars uppercase letters
Sub InitArray(NumChars As Long)
    Dim L As Long, N As Long
    Erase A
    For L = 0 To UBound(A)
        For N = 1 To NumChars
            A(L) = A(L) & Chr$(vbKeyA + Int(Rnd * 26))
        Next
    Next
    Print "Key Length: " & NumChars
End Sub
Sub ColMethod()
    Dim S As Single, L As Long, C As Collection
    S = Timer
    Set C = New Collection
    'Duplicate keys raise an error, which is ignored
    On Error Resume Next
    For L = 0 To UBound(A)
        C.Add A(L), A(L)
    Next
    On Error GoTo 0
    Print "Collection", C.Count; "items", Timer - S; "seconds"
End Sub
Sub DicMethod()
    Dim S As Single, L As Long
    S = Timer
    Dim D As Dictionary
    Set D = New Dictionary
    'Duplicate keys raise an error, which is ignored
    On Error Resume Next
    For L = 0 To UBound(A)
        D.Add A(L), A(L)
    Next
    On Error GoTo 0
    Print "Dictionary", D.Count; "items", Timer - S; "seconds"
End Sub[/tt]
___

The results (apparently) indicate that for a large number of items, Collection performs faster than the Dictionary object. However, the Dictionary object is very fast at detecting and rejecting duplicate items that are already present.

To confirm this I ran another test: adding new, unique items and then adding duplicate items, to both a Collection and a Dictionary. See the code below.
___
[tt]
Option Explicit
Const NumItems As Long = 150000
Private Sub Form_Load()
    AutoRedraw = True
    Dim N As Long, S As Single
    Dim C As Collection, D As Dictionary
    Print "Adding unique items..."
    S = Timer
    Set C = New Collection
    For N = 1 To NumItems
        C.Add N, CStr(N)
    Next
    Print "Collection:"; Timer - S
    S = Timer
    Set D = New Dictionary
    For N = 1 To NumItems
        D.Add CStr(N), N
    Next
    Print "Dictionary:"; Timer - S
    'duplicate test
    'NOTE: the two loops below are swapped relative to their labels
    '(D.Add is timed as "Collection", C.Add as "Dictionary").
    'The poster corrects this later in the thread.
    Print "Adding duplicate items..."
    On Error Resume Next
    S = Timer
    For N = 1 To NumItems
        D.Add CStr(N), N
    Next
    Print "Collection:"; Timer - S
    S = Timer
    For N = 1 To NumItems
        C.Add N, CStr(N)
    Next
    Print "Dictionary:"; Timer - S
    On Error GoTo 0
End Sub[/tt]
___

The results on my PC confirmed that for fewer items (less than 100,000), Dictionary was faster in all respects.

However, as the number of items increased (above 100,000), Collection proved to be faster at adding new, unique items. Even then, it could not compete with the Dictionary object at detecting and rejecting duplicate items. Dictionary's performance in this specific area was exceptional.
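
On the earlier point about key length and duplicates: the number of unique keys to expect can be estimated up front. Assuming keys are drawn uniformly at random from the 26 uppercase letters, as InitArray does, the expected number of distinct keys among n draws from K = 26^length possible keys is K * (1 - (1 - 1/K)^n). A rough sketch, where ExpectedUnique is a hypothetical helper name:

```vb
' Sketch only: ExpectedUnique is a hypothetical helper, not code from this thread.
' Assumes keys are drawn uniformly at random from 26 letters.
Private Function ExpectedUnique(ByVal keyLength As Long, ByVal n As Long) As Double
    Dim k As Double
    k = 26# ^ keyLength                         ' number of possible keys
    ' Each possible key is missed by all n draws with probability (1 - 1/k)^n
    ExpectedUnique = k * (1# - (1# - 1# / k) ^ n)
End Function

' Usage, for 3-character keys in the 100,001-element test array:
'   Debug.Print ExpectedUnique(3, 100001)
```

For short keys the estimate sits close to the 26^length ceiling (nearly every draw is a duplicate), while for longer keys it approaches n (nearly every draw is new).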
 
Good stuff, Hypetia. Have another star. :)

Great thread overall, too.
 
Hypetia,

Yes. My last test code, untouched (it accommodates varying key lengths), gives:

199978 unique 6 character keys in 200,000
Dictionary 7.74 seconds
Collection 2.24 seconds

199332 unique 5 character keys in 200,000
Dictionary 7.50 seconds
Collection 2.11 seconds

179708 unique 4 character keys in 200,000
Dictionary 7.39 seconds
Collection 3.64 seconds

27952 unique 3 character keys in 200,000
Dictionary 0.88 seconds
Collection 0.97 seconds

961 unique 2 character keys in 200,000
Dictionary 0.30 seconds
Collection 0.69 seconds

All in the IDE as before.

My test code is similar to yours, except that 1) I used 'If dict.Exists(a) = False' in place of your error trap in the Dictionary test, and 2) a second array containing the unique values was built on the fly using ReDim Preserve.

regards Hugh


 
Hypetia, you are not quite comparing like for like. Take the CStr conversion out of the Dictionary .Add lines (Dictionary does its own casting of the Key, so by using CStr you are actually causing two conversions to take place, whilst the Collection is only doing one), and examine the results again.
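
One thing worth checking when changing the key type: as far as I know, a Scripting Dictionary accepts keys of any type except arrays, and it treats a numeric key and its string form as different keys. So D.Add N, N and D.Add CStr(N), N do not build the same dictionary. A minimal sketch to try in the IDE (KeyTypeDemo is a hypothetical name; it needs the Microsoft Scripting Runtime reference):

```vb
' Sketch only: KeyTypeDemo is a hypothetical name, not code from this thread.
Private Sub KeyTypeDemo()
    Dim D As Scripting.Dictionary
    Set D = New Scripting.Dictionary

    D.Add 1, "numeric key"      ' key held as a number
    D.Add "1", "string key"     ' a different key, so no duplicate-key error

    Debug.Print D.Count                     ' both entries should be present
    Debug.Print D.Exists(1), D.Exists("1")  ' each key is found under its own type
End Sub
```

A Collection, by contrast, only takes String keys, which is why the Collection lines keep their CStr.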
 
>Dictionary does its own casting of the Key
I don't know how the Dictionary treats its data internally. I simply tried to use similar code for both Dictionary and Collection, and in most practical situations the key is a String rather than a Long.

But now...

I am completely baffled, as I have found a big mistake in my second test code. In the 'duplicate test' part, the Collection and Dictionary timings are interchanged (C.Add and D.Add are in the wrong places). I fixed the error, also removed CStr from the Dictionary part, and ran the following code.
___
[tt]
Option Explicit
Const NumItems As Long = 50000 'also run with 100000 and 500000
Private Sub Form_Load()
    Dim N As Long, S As Single
    Dim C As Collection, D As Dictionary
    Debug.Print "Adding unique items..."
    S = Timer
    Set C = New Collection
    For N = 1 To NumItems
        C.Add N, CStr(N)
    Next
    Debug.Print "Collection:"; Timer - S
    S = Timer
    Set D = New Dictionary
    For N = 1 To NumItems
        D.Add N, N
    Next
    Debug.Print "Dictionary:"; Timer - S

    'duplicate test
    Debug.Print "Adding duplicate items..."
    On Error Resume Next
    S = Timer
    For N = 1 To NumItems
        C.Add N, CStr(N)
    Next
    Debug.Print "Collection:"; Timer - S
    S = Timer
    For N = 1 To NumItems
        D.Add N, N
    Next
    Debug.Print "Dictionary:"; Timer - S
    On Error GoTo 0
End Sub[/tt]
___

The results were quite surprising:

NumItems = 50,000
[tt]Adding unique items...
Collection: 0.46875
Dictionary: 0.125
Adding duplicate items...
Collection: 0.203125
Dictionary: 0.15625
[/tt]
NumItems = 100,000
[tt]Adding unique items...
Collection: 0.984375
Dictionary: 0.515625
Adding duplicate items...
Collection: 0.421875
Dictionary: 0.75
[/tt]
NumItems = 500,000
[tt]Adding unique items...
Collection: 5.734375
Dictionary: 13.21875
Adding duplicate items...
Collection: 2.421875
Dictionary: 18.04688
[/tt]
With 50,000 items, dictionary was faster than collection.
With 100,000 items, dictionary was better at adding new items, whereas collection was better at duplicates (contrary to what I saw earlier with wrong code).
With 500,000 items, collection outperformed dictionary in both areas.
 