Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Getting Started with a Perfect Hash

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
I have some very large files I need to do quick lookups on --like 60K records. The files are static -not updated. From what I've read a "perfect hash" is the fastest way to go. I'm new to the hash algorithm although I understand the basic concept but I am ignorant of the mechanics of it. Anyone know a good link to get started with writing hash algorithms? "Hashes for dummies"?
 
PHP can calculate md5 and SHA1 hashes be default. Even for files (md5_file and sha1_file functions). I suggest you just use one of those.

But I fail to see what hashes have to do with lookups or what the structure of those "records" in the file is. If these files are actually tables, a database might be a better option.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
I have just a few lists of first and last names for lookups. The lists have a single field, the name itself, and are not related to each other. SQLlite is a a little too much trouble for this simpler application.

From Wikipedia "Perfect Hash":

Using a perfect hash function is best in situations where there is a large set, S, which is not updated frequently, and many lookups into it.

 
60000 records is no sweat for a database. and sqlite would be a perfect solution for this kind of application.

but if you must use a flat file then i would say a hash is only going to be a valid timesaver if each line is quite long (more than 16 characters on average, 24 if you use sha1).

if each line is unique, then a strpos or similar might be much more efficient.

remember with modern computing, disk drives and memory mapping techniques, file i/o operations are stormingly fast. i would not be remotely surprised if you got great results using file() and in_array().
 
Even with a hash you will get quite a long value upto 24 bytes as jpaddie says you have to translate that into a physical address in a flat file which might be more trouble than its worth.
 
Indeed. And when reading SQLite's web site it says this is exactly what it is made for. I do not think this is overly complex.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top