Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Arrays, Collections, insertion, and speed 1

Status
Not open for further replies.

frothytoad

Technical User
Aug 1, 2003
32
0
0
US
Hello. I could use some advice from people who have faced this problem already.

For an application I am building I need an autocomplete combo box. Naively, I went and coded one, and it worked fine, but it used a linear search. When I realized that I might have many tens or hundreds of thousands of entries (or more), I went back and put in a binary search. No problem. But...

In order to add and remove new items efficiently, I use a collection to store the items in the list. But, as you may have already guessed, this has some drawbacks, which leads me to my two questions:

1) Is there a way to insert a new element into the middle of an array? I.e. if I have an array with 5000 entries, can I put one in between 2033 and 2034 without redimming, looping from 5000 to 2034, shifting all values, and then adding the new value?

2) Is there a fast way to add large numbers of things to a collection? It takes about 8-10 seconds to add ~6000 items, which is too long.

I had thought about just referencing into the recordset that provides the data and not storing the values anywhere else, but I cannot be sure that my recordset is sorted, and it does not seem that ADO provides a sort method (or did I just miss it???). Also, the needs for displaying info in the combo box make an unsort data source unacceptable.

Lots of questions, I realize, but even a few answers would be appreciated.

Thanks!

-- Jeff
 
Maybe instead of just putting a string in your array you could use a UDT to create a "node type" that holds both the string value and the array index of the parent node.

 
ADO recordsets do have a Sort property that you specify as
Code:
rs.Sort = "fld1 ASC, fld2 DESC, etc."
rs.Refresh   [COLOR=green]' Causes the Sort to be performed[/color]


Strongm has several posts in this forum extolling the virtues of a Dictionary Object rather than a collection for both flexibility and speed reasons.
 

Thanks for the responses. A question for each:

Sheco - The only problem I see is that it seems that I would no longer have an index to permit a binary search. Can you think of another way to quickly zoom in on an entry of interest with such a UDT?

Golom - It seems that I would have a similar problem with a dictionary object as with a UDT. While it does give the capability to find quickly whether a word existed, it does not seem that there is a way to ask, "which dictionary items begin with the string strMyString", or to focus in on words that before and after the entry of interest. My question for you, though, is this: If I take an input recordset and clone it, is it safe to think that sorting the clone will not affect the original?

Thanks again...

-- Jeff
 
>Is there a way to insert a new element into the middle of an array?

As far as inserting a new item in the middle of an array is concerned, it is slow using the conventional For-Next approach.

Not only the looping slows down the code but reassigning strings to following items causes existing strings to be destroyed and new strings to be created in new memory locations. This destruction/construction of strings slows down the loop drastically when you are dealing with large arrays.

An alternate to this approach is to treat your string array as an array of pointers to strings, and only insert/manipulate these pointers instead of touching actual strings. This method does not destroy or create any string which is the biggest cause of performance hit.

Furthermore, instead of moving string pointers using For-Next loop, you can move them in memory using CopyMemory function as they reside in contiguous memory locations.

This further improves the performance because whole For-Next loop assingment is done in a single call to CopyMemory function call which is extremely fast.

See the following test code which demonstrate the idea.
___
[tt]
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)
Dim A(100) As String
Dim ArraySize As Long 'current size of the array
Private Sub Form_Load()
InitArray
ShowArray
InsertItem "New item at position 4", 4
ShowArray
InsertItem "Another item at position 9", 9
ShowArray
End
End Sub

Sub InitArray()
'initialize array with sample data
Dim N As Long
For N = 1 To 12
A(N) = MonthName(N)
Next
'set the initial number of items in the array
ArraySize = 12
End Sub

Sub ShowArray()
'show the contents of the array in a MsgBox
Dim N As Long, S As String
For N = 1 To ArraySize
S = S & N & vbTab & A(N) & vbNewLine
Next
MsgBox S
End Sub

Sub InsertItem(Item As String, Index As Long)
'increase the size of array
ArraySize = ArraySize + 1
'assign the new item to the end of the array
A(ArraySize) = Item
'save the address of the string pointed by the last item
Dim L As Long
L = StrPtr(A(ArraySize))
'shift down the contents of array starting from location [index]
CopyMemory ByVal VarPtr(A(Index + 1)), _
ByVal VarPtr(A(Index)), (ArraySize - Index) * 4
'make the item at location [index] point to the
'new string whose address was saved earlier in L
CopyMemory ByVal VarPtr(A(Index)), L, 4
End Sub[/tt]
___

I tested the above code with an array of 5 million items and it took only 20 ms to insert an item at the top of the list.
 
Hypetia,

I'm not familiar with the CopyMemory call, but I was wondering if your InsertItem subroutine could be simplified slightly to this:

Code:
Sub InsertItem(Item As String, Index As Long)
    'increase the size of array
    ArraySize = ArraySize + 1
    'shift down the contents of array starting from location [index]
    CopyMemory ByVal VarPtr(A(Index + 1)), _
    ByVal VarPtr(A(Index)), (ArraySize - Index) * 4

    'assign the new item into the array
    A(Index) = Item

End Sub
 
This code looks simple and ok but it is not that simple.

The reason for this that when we move pointers starting from [index] to [index + 1], the item at location [index + 1] starts pointing to the same string of location [index].

Since this string is now pointed by two items (viz. index and index+1) direct assignment to A(index) causes that string to be destroyed, since it was originally associated with A(index). But since A(index + 1) was also pointing to the same string, it now starts pointing to an invalid data, and even may cause your application to crash when you access or write to A(index) subsequently.

Consider the first call to the InsertItem function when new item is inserted at position 4.

Originally the array looked like this.

January, February, March, April, May...

When the CopyMemory function is executed in InsertItem function, the array starts looking like this.

January, February, March, April, April, May...

The April in blue color being the place for inserting new item.

If we assign the new item directly to this place (A(index) = Item), the string 'April' will be destroyed in the memory (because of new assignment) and array will look like this.

January, February, March, New item at position 4, ???, May...

Note that the item at position 5 is now pointing to invalid data. The reason for being that both item 4 and 5 were pointing to the same string 'April', which is now destroyed by the assignment to item 4.

You can check that this behaviour causes the program to access invalid memory locations, which displays wrong data and may also cause VB to crash due to access violation.

To avoid this problem, we assign the new item to the end of the array so that it allocated in the memory location reserved by the array. After that we copy its address (stored in L) to location A(index) using CopyMemory function. This only updates the pointer of A(index) which now points to the correct data reserved by the array.

Hope it makes some sense.
 

Excellent solution, Hypetia! I never knew about CopyMemory.

Let me ask a follow-on. Since my control will not know the number of array elements until each is added (except when I load them from a recordset, when at least I know how many will be added as a group), is there any reason this will not work with a redimensionable array?

And while we are at it, is there a similar fast, backdoor approach to ReDim Preserve so that it does not need to grab whole new contiguous memory blocks for creating new arrays from the old ones? (I am guessing that ReDim reserves a new block of memory for the new array size, copies over the pointers, and then frees up the old space. Is this correct?) The reason I ask is that if I have a million things, and then want to add another, it seems a shame to have to move a million pointers instead of just adding one more.

-- Jeff


 
Items in an array always occupy contiguous memory locations. So you have to bear the limitations of ReDim Preserve. ReDimmming a large array again and again will certainly slow down the program. There is no way to insert (or even append) a new item in an existing array whose address does not coinside with the existing items in the array.

To avoid this, I suggest you to use fixed length arrays with enough spare capacity. Or estimate the size of your array in advance, if it is possible.

For example, if you are just loading a recordset in the array, then allocate the array in such a way that it has room for all items in the recordset.

If you use ReDim, then it is advisable not to ReDim your array for each new item inserted. Instead, increase the size of your array using ReDim Preserve in larger chunks, say 1000 items at one time. This will avoid further calls to ReDim Preserve for next 999 items and will still fulfil your needs to handle dynamic arrays in this manner.

Using the same pointer manipulation, you can also write a routine to remove items from the array. In this method, you will be shifting the pointers up in the array instead of pushing them down which we did above.

Based on similar methods, you can write standalone routines for sorting arrays which are extremely fast.
 
You can find more stuff on inserting into arrays using copymem at: and
You might also be interested in "Check if a Value exists in an array without looping" at: This basically converts the entire array to a string and then uses Instr to look for your search item. I have to confess I've not tried it, and given that string handling can be notoriously slow, I'm not sure it would necessarily improve your speed, but it might be worth trying. It certainly looks like a cute approach.

Hope this helps,
Tony
 
You can also search a combo or listbbox like this:

Code:
Option Explicit

Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
                (ByVal hwnd As Long, ByVal wMsg As Long, _
                 ByVal wParam As Long, lParam As Any) As Long

Private Const LB_FINDSTRING = &H18F

Private Sub Command1_Click()
Dim strText$

    'strText = Trim(Text1.Text)
    strText = InputBox("Prompt", "Title", "serach text")
    List1.ListIndex = SendMessage(List1.hwnd, LB_FINDSTRING, -1, ByVal strText)

End Sub

Private Sub Command2_Click()
    FindListItem "Item1", List1
End Sub


Private Sub Form_Load()

    List1.AddItem "curriculum vitae"
    List1.AddItem "firstname:     aaaa"
    List1.AddItem "lastname:     bbbb gg"
    List1.AddItem "code:345635"
    List1.AddItem "curriculum vitae"
    List1.AddItem "firstname:     gggg"
    List1.AddItem "lastname:     vvvv bb"
    List1.AddItem "code :123456"
    List1.AddItem "curriculum vitae"
    List1.AddItem "firstname:     fgf"
    List1.AddItem "lastname:     eee fd"
    List1.AddItem "code :44444"

End Sub

Private Sub Text1_Change()
Dim strText$

    strText = Trim(Text1.Text)
    List1.ListIndex = FindItem(List1, strText)

End Sub



Public Sub FindListItem(sText As String, lst As ListBox)
Dim i%

    For i = 0 To lst.ListCount - 1
        If LCase(Trim(lst.List(i))) = LCase(Trim(sText)) Then
            lst.ListIndex = i
            Exit For
        End If
    Next

End Sub

Public Function FindItem(lst As ListBox, strText As String) As Integer
Dim iIndex As Integer
    
    iIndex = SendMessage(lst.hwnd, LB_FINDSTRING, -1, ByVal strText)
    FindItem = iIndex

End Function

-David
2006 Microsoft Valueable Professional (MVP)
2006 Dell Certified System Professional (CSP)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top