Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

hash tables?

Status
Not open for further replies.

anatazi

Programmer
Jun 24, 2002
33
0
0
US
Hi all,
can someone please explain to me how hash tables work and how they provide fast random access?
Thanks a lot.
 
Assume you have an array of values. You need to find a particular value, and you have the key where the value can be found, but you do not know where it is stored in the array. An example of this is the symbol table a c-language compiler builds as it is compiling your code -- when it comes across a variable name, it needs to know whether it had been previously defined to know whether it can use the variable in the code it generates or whether it should produce an error.

The question is, how do you find the one you need? You could brute-force search the array, comparing each key to the one you are looking for.

But suppose you have 10000 values. This will take a significant amount of time, and will require a significant number of comparisons.

Enter the hash function. It performs a quick calculation based on the name of the key, which then returns a number. Its whole purpose for existence is to reduce the number of comparisions your code must make to find the data it needs.

If you used that same function when deciding where in the array to store the key and value in the first place, then this calculation will take you right to where you need to go to find the data again.

Your hashing algorithm also needs to be able to handle collisions -- those situations where two keys map to the same array index. At that point, you can use probing, a second function to quickly calculate how many array elements to skip before trying to store the new key again (or to find the key on retrieval). Another way is to make each element of the array a linked list -- your code would then traverse the list to determine which key it needs.

Is this what you wanted to know? ______________________________________________________________________
TANSTAAFL!
 
ok,
So basically they provide faster access to data because they do less comparisons. so istead of a long list you have several short lists. Ok I'll just need to practice using them. Thanks for the explanation.
 
It's not just that they reduce the number of comparisons. It's that, in a perfect world, without collisions, a hash function should enable you to find something with NO comparisons.

As a simple example, consider that you want to store numbers that range from 1 to 1,000,000 in a table. Each number can be in the table either zero or one times.

If you kept a regular array of the numbers in the table, you'd have to do a search each time you want to check whether or not a number is in the table. That wouldn't be very fast.

A hash table to solve this problem would be an array of size 1,000,000. Each element would be used as a boolean value. With this implementation, if you wanted to add 7,309 to the hash table, you'd go directly to table[7309] and change it to nonzero. If you want to check whether or not 232,531 is in the table, you'd go directly to table[232531] and see if the value contained there is nonzero.

Real hash tables are based on this principle. You find some way to change an item into an index into an array. With integers, it was easy because they're already indexes. With strings, it's a bit harder. You can't make a table big enough to hold every combination, so you use a hash function to generate a semi-unique index for each string.

Semi-unique. That means sometimes, you'll hash two different strings to the same value. That's a collision. That's where hash tables stop being perfect. You now have to check the item at that index and see if it's really the one you're looking for. If not, it might be somewhere else in the list, and you'll have to check elsewhere (depending on the table's policy for handling collisions) to find it or satisfy yourself it's not in the table. Generally, this lookup is a lot faster than a plain linear search, since you actually have some clue about where to start the search.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top