This is actually only part of what I'm doing. First, I generate titles by first-name association (having removed unisex names); then I use the same name list a bit like a spellchecker. There are a lot of duplicate customers in this data and I'm currently consolidating them. Duplicates have been identified by a matching company plus a partial string match on the contact name, so a reference list of valid names lets me choose which of two similar but not identical names should take precedence (the data also contains a lot of spelling mistakes).
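A minimal sketch of the "spellcheck" idea, assuming a hand-built dictionary of known first names mapped to titles (the names, titles and helper functions below are hypothetical, not taken from the actual data). When two near-duplicate records disagree, the name whose first name appears in the reference list wins; `difflib.get_close_matches` from the standard library tolerates small misspellings when looking up a title.

```python
import difflib
from typing import Optional

# Hypothetical reference list: correctly spelled first names mapped to a title.
# Unisex names are assumed to have been removed already.
KNOWN_FIRST_NAMES = {
    "michael": "Mr",
    "stephen": "Mr",
    "catherine": "Ms",
    "katherine": "Ms",
}


def first_name(full_name: str) -> str:
    """Return the lower-cased first token of a contact name ('' if empty)."""
    parts = full_name.strip().split()
    return parts[0].lower() if parts else ""


def choose_preferred(name_a: str, name_b: str) -> str:
    """Pick which of two near-duplicate contact names to keep.

    A name whose first name is in the reference list takes precedence;
    if both or neither are known, keep name_a.
    """
    a_known = first_name(name_a) in KNOWN_FIRST_NAMES
    b_known = first_name(name_b) in KNOWN_FIRST_NAMES
    if b_known and not a_known:
        return name_b
    return name_a


def suggest_title(full_name: str) -> Optional[str]:
    """Look up a title from the first name, allowing for minor misspellings."""
    candidates = difflib.get_close_matches(
        first_name(full_name), list(KNOWN_FIRST_NAMES), n=1, cutoff=0.8
    )
    return KNOWN_FIRST_NAMES[candidates[0]] if candidates else None
```

The `cutoff=0.8` threshold is an arbitrary starting point; it is worth checking a sample of matches by hand before trusting it on the whole data set.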
I've never done a de-dupe this large before, so any general-purpose hints or tips would be gratefully received.
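One general pattern that may help, sketched here under the assumption that each record can be reduced to a `(company, contact_name)` tuple (a hypothetical layout): only compare records that share a normalised company name ("blocking"), score contact names with `difflib.SequenceMatcher.ratio()`, then merge matching pairs into clusters with a small union-find, so that A~B and B~C end up in one group even if A and C don't match directly.

```python
import difflib
from collections import defaultdict
from itertools import combinations


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def find_duplicate_pairs(records, threshold=0.85):
    """Return (i, j) index pairs of records that look like the same customer.

    records: list of (company, contact_name) tuples (hypothetical layout).
    Only records with the same normalised company are compared, which
    avoids an all-against-all comparison over the whole data set.
    """
    by_company = defaultdict(list)
    for i, (company, _) in enumerate(records):
        by_company[normalise(company)].append(i)

    pairs = []
    for indices in by_company.values():
        for i, j in combinations(indices, 2):
            a = normalise(records[i][1])
            b = normalise(records[j][1])
            if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((i, j))
    return pairs


def cluster(n, pairs):
    """Group record indices 0..n-1 into clusters using union-find over the pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    # Only clusters with more than one record are duplicates.
    return [g for g in groups.values() if len(g) > 1]
```

Keeping the original records untouched and writing the cluster assignments to a separate table makes it easier to review and undo merges if the threshold turns out to be too loose.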