Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Fastest way to check 50,000 records against a DB? 1

Status
Not open for further replies.

TheLe

Programmer
Aug 2, 2006
12
US
Hello, I have vb.net optimization question.

I am reading some 50,000+ records from a file, and each record contains a unique PersonID (numeric) along with other data.

Before I process the record, I need to check that personID against the database to make sure it exists in our system. If it does, then I proceed. If it does not, I go to the next record in the file.

Currently I am doing a SELECT statement for each record I have in the file. If I get a record, then I know it exists.

Is there a faster and/or a more efficient way to check and see if that PersonID exists in the database?

Assume a table called FOO with a single field called PersonID.

Thank you,

`Le
 
I ran some quick tests. It looks like feeding a pre-sorted list to a Dictionary and using the .ContainsValue method will return in what appears to be the same amount of time as an array with a binary search. My initial tests were only of 50,000 records though, but both returned a boolean value in 0 milliseconds. I'm bumping the list up to 500,000 to see if there is any performance difference there.

-Rick

VB.Net Forum forum796 forum855 ASP.NET Forum
[monkey]I believe in killer coding ninja monkeys.[monkey]
 
It appears that the Dictionary object sorts on .add. So each time you add an item to it, it re-indexes. The result of which means that the dictionary object is using a binary search method, so it should be just as efficient as a hand written binary search. And if you give it a pre-sorted list, it's data loading performance is on par (within a second or two) with loading straight to an array.

But... since the Dictionary object has to re-index each time you add an item, sending is an unsorted list is a massive performance hit (about 20 seconds per 500,000 records loaded)

Either way, when presented with the task of finding 1 item out of 1/2 a million sorted items, both the Binary search and Dictionary object will both do so in less than 1 millisecond. (on my Core 2 Duo 2.6GHz, w/ 1GB of memory, and tons of apps running) Given the simplicity of the Dictionary object, I would say go ahead and just use it instead of writing your own binary code. Thanks for the pointer Swi!

Just incase anyone is curious, here is the code I was using. It's not real pretty, but I wasn't planning to sink a whole lot of time making it fancy for ya ;)

Code:
#Region "Dict vs bin"
  Private Sub Button15_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button15.Click
    Debug.WriteLine("Dictionary, Sorted")
    Search("dict", 7747401, True)
    Search("dict", CInt(Rnd() * 10000000), True)
    Debug.WriteLine("")
    Debug.WriteLine("Dictionary, Unsorted")
    Search("dict", 7747401, False)
    Search("dict", CInt(Rnd() * 10000000), False)
    Debug.WriteLine("")
  End Sub

  Private Sub Button16_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button16.Click
    Debug.WriteLine("Binary, Sorted")
    Search("bin", 7747401, True)
    Search("bin", CInt(Rnd() * 10000000), True)
    Debug.WriteLine("")
  End Sub

  Private Sub Search(ByVal mode As String, ByVal value As Integer, ByVal sorted As Boolean)
    Dim dict As New Dictionary(Of Integer, Integer)
    If Not sorted And mode = "bin" Then Throw New Exception("Binary search MUST be sorted.")

    Dim FirstTic As Integer = Environment.TickCount

    If Not IO.File.Exists("C:\DictVsBinTest.xml") Then GenerateRandomData()
    Dim ds As New DataSet
    ds.ReadXml("C:\DictVsBinTest.xml")
    Dim drs() As DataRow
    If sorted Then
      drs = ds.Tables(0).Select("", "ID Asc")
    Else
      drs = ds.Tables(0).Select
    End If
    Dim IDs(drs.Length - 1) As Integer
    Dim i As Integer
    For Each dr As DataRow In drs
      If mode = "bin" Then
        IDs(i) = CInt(dr("ID"))
      Else
        dict.Add(i, CInt(dr("ID")))
      End If
      i += 1
    Next

    Dim result As Boolean
    Dim EndTic As Integer
    Dim StartTic As Integer = Environment.TickCount

    If mode = "bin" Then
      result = BinarySearch(value, IDs)
    Else
      result = dict.ContainsValue(value)
    End If

    EndTic = Environment.TickCount
    If result Then
      Debug.WriteLine(value & " Found in " & EndTic - StartTic & " ticks")
      Debug.WriteLine("Total Time: " & EndTic - FirstTic & " ticks")
    Else
      Debug.WriteLine(value & " Not found in " & EndTic - StartTic & " ticks")
      Debug.WriteLine("Total Time: " & EndTic - FirstTic & " ticks")
    End If
  End Sub

  Private Function BinarySearch(ByVal Value As Integer, ByVal Array As Integer()) As Boolean
    Dim ReturnValue As Boolean = False
    Dim Top As Integer = Array.Length - 1
    Dim Bottom As Integer = 0

    Dim CompValue As Integer
    Dim pot As Integer
    Do
      pot = ((Top - Bottom) \ 2) + Bottom
      CompValue = Array(pot)
      If Value > CompValue Then
        Bottom = pot + 1
      ElseIf Value < CompValue Then
        Top = pot
      Else
        ReturnValue = True
      End If
    Loop Until ReturnValue = True Or Top = Bottom
    Return ReturnValue
  End Function

  Private Sub GenerateRandomData()
    Dim ds As New DataSet
    Dim dt As New DataTable
    ds.Tables.Add(dt)
    dt.Columns.Add("ID", GetType(Integer))

    For i As Integer = 0 To 500000
      Dim ID As Integer = CInt(Rnd() * Integer.MaxValue)
      Dim dr As DataRow = dt.NewRow
      dr("ID") = ID
      dt.Rows.Add(dr)
    Next
    ds.WriteXml("C:\DictVsBinTest.xml", XmlWriteMode.WriteSchema)
  End Sub
#End Region

-Rick

VB.Net Forum forum796 forum855 ASP.NET Forum
[monkey]I believe in killer coding ninja monkeys.[monkey]
 
Thanks for taking the time to research the solution out ThatRickGuy. Have a star for your work.

Swi
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top