Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Numeric Comparison for a Fast Dictionary Check 2

Status
Not open for further replies.

sen5241b

IS-IT--Management
Sep 27, 2007
199
US
I want to check text against a custom dictionary. There's a lot of free word lists available for download and I have a particularly long one --109,583 words. To compare every word in a given chunk of text with every word in the dictionary is prohibitively too CPU or disk I/O intensive.

My plan is to store and load the word list as HEX (PHP bin2hex function?), binary or possibly floating point data*, convert the chunk of text to the same and do a faster numeric comparison. I'm pretty sure most spell checkers do binary comparisons against dictionaries.

* - Floating point registers are the fastest.
 
You could use a what is called a full sort merge in the database world.
Get you really long string and sort it into word order, depending on what you doing you might want to store an offset into the string for each word.
sort the dictionary as well and you can then run down the sorted string and the sorted dictionarty looking for matches. Because each list is sorted toy don't have to search within any strings and you dont have to scan the dictionary multiple times.
If you need to highlight a word (not present for example) you use the offset into the original string.
I've never done this but It should be a simple task once you get your head round it !
 
Yes, aspell uses offsets. See:
My concern though here is speed. Sorting words to be checked and running thru the dictionary once definitely saves time.

But I'm thinking of storing the whole dictionary in a binary format and then converting words to be checked to binary and then doing a completely binary comparison. Isn't a plain old textual comparison slow? Aren't numeric or binary comparisons faster? The Aspell dictionary is compiled with a MAKE command. Why do they do this?

(BTW, I can't use aspell, I need a custom dictionary).
 
i thought with aspell you could add your own dictionaries.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top