Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Fuzzy Matching

Status
Not open for further replies.

numina

Programmer
Jan 28, 2002
12
US
My colleague posted a question about fuzzy logic (quoted message below). I am not familiar with the subject, so I would like to refer to the experts on Tek-Tips.

“I am looking for a method of determining fuzzy matches, and hoping you have a suggestion for me. I am not referring to fuzzy matches in an internet search application. This is for a database application. We have a requirement to evaluate pairs of columns and determine if they are "close" to a match. I am thinking of an algorithm that will return a score on how similar the columns are. We would set a threshold on the score and determine if it was close enough to call it a match.

This is going to run against a table with perhaps 30,000 rows. The columns are VARCHAR, but they contain all numeric digits so a numeric algorithm would work.

Do you have any resources, experience, or leads on this?”

I would greatly appreciate your help

Numina
 
Use the SOUNDEX function "Helping others to help themselves..."
=================================
Thomas V. Flaherty Jr.
Birch Hill Technology Group, Inc.
 
Did you find this function? I have used this for fuzzy matching in the past with success.

Just curious... "Helping others to help themselves..."
=================================
Thomas V. Flaherty Jr.
Birch Hill Technology Group, Inc.
 
My colleague is still looking for other options. This was his response:

"Thanks! These are good options. I think these functions are part of the "Oracle Text" database "cartridge" of add-on functions. To use them the "Oracle Text cartridge" software has to be installed and activated. It takes more overhead, because they start special servers that build special indexes. I am not eager to do that."
 
No, SOUNDEX is a standard SQL Character Function, not part of another module.

Not sure if he knows what the SOUNDEX method is, but it converts a string into its phonetic representation. It allows you to compare words that are spelled differently but sound alike in English.

People use it a lot in Genealogy to find relatives that have slightly changed their name or spelled it diferently.

SOUNDEX syntax can be found on this page:



"Helping others to help themselves..."
=================================
Thomas V. Flaherty Jr.
Birch Hill Technology Group, Inc.
 

The columns that are being tested for matches are all integers. So SOUNDEX is dismissed for that reason (I tried it on characters and worked like a charm). Any thoughts on how this can be done for numbers?

Thanks.

Numina.
 
I think you two are talking about different things. Soundex is, indeed, a standard part of Oracle. Unfortunately it won't work for what Numina wants to do. Soundex only distinguishes similar alpha strings, and Numina is working with numeric digits.

Oracle Text, however, does contain a fuzzy match capability that may do the job. I only know about it based on the documentation. As far as I know, it's not included in the basic Oracle 8i package, but it is supposed to be a standard part of 9i. If you've upgraded or are in the the process of upgrading, it may be your solution.

I have also thought about how to write a home-grown function to perform this compare. You could gradually replace more and more characters in one string with '%' and compare it to the other until you get a match. The number of substitutions before a match was found would be a measure of how similar the strings are.

It sounds like an intriguing project to write such a function, but of course you shouldn't take this approach if Oracle makes the same functionality available through software.

 
Sorry about that - I missed the fact that it was integers and not character strings. "Helping others to help themselves..."
=================================
Thomas V. Flaherty Jr.
Birch Hill Technology Group, Inc.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top