I'm working on cleaning up some of the data our company has in its databases, and I need to quantify how similar two text fields are.
e.g. Database1 has a table with employee information, one of the columns has name stored as "Susan Smith"
Database2 has a table with employee information, one of the columns has a name stored as "Suzy Smith"
It is the same person, and when every other method of linking the database1 record to a database2 record fails, I need a way to find the most likely matching record in database2. A human will then go through and verify the matches, but I need a starting point.
Essentially, I'm looking for a function that takes inputs such as "George Lucas" and "George Lucaz" and returns something like a 91.66% match, or, given "Lucas, George", something like 92.31%. Is there any such method to get this info?
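To make the kind of score I mean concrete, here is a rough sketch in Python. It assumes the percentage is one minus the Levenshtein edit distance divided by the length of the longer string, which is only my guess at a sensible definition. The function names `levenshtein` and `similarity` are made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Percentage match: 100 * (1 - distance / length of the longer string)."""
    # Assumption: case and surrounding whitespace should not count as differences.
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(f'{similarity("George Lucas", "George Lucaz"):.2f}%')
```

With this definition "George Lucas" and "George Lucaz" differ by one substitution out of 12 characters, so the score is about 91.67%. A plain edit distance scores a reordered name such as "Lucas, George" lower than the 92.31% I mentioned, so the name parts would probably need to be split and sorted first, or a different measure used. Python's standard library also has `difflib.SequenceMatcher`, whose `ratio()` method gives a similar kind of score.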
--
"I'm not talking to myself, I'm just the only one who's listening." - JCS