Name Parsing 2

Status
Not open for further replies.

EBOUGHEY

Programmer
Aug 20, 2002
US
I am trying to come up with a program that will parse fields containing multiple names into separate fields. The main issue is that there is no set format. Will I have to design some type of lookup table, or do you think CASE statements that evaluate the name field and perform the matching action will work? For instance, it could remove companies first (using a lookup table we already have), then look for several known formats, count how many words each name has, and process it accordingly. Below is a sample of the data:

"GRINDE, JAMES C; GRINDE, LINDA"
"GRINDE JAMES C & LINDA"
"Willis Corroon & Assoc."
"J. P. & R. B. Jordan III"
"Cecil & Frances Viverette Jr."
"Katherine M. & Jo Karen Lowman"
"Mr & Mrs Keith W. McLaurin"
"Don & Sara Casper"
"Morris & Venus Hendrix Jr."
"Donald & Jodi Lindsey III"
"Mary Ann & Robert McCoy Jr."
"Dr & Mrs Michael S. Patrick"
"Matthew Steven & Autumn Bullock Williams"
"Mr. & Mrs. David & Lu Thompson"
"Mr. & Mrs. Gary R. McNeill"
"W. Michael & Jill Scarbrough"
"John Douglas & Ann Lowe Vodicka"
 
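On the lookup-table-versus-CASE question, one option is to do both: classify each string into a rough format first, then hand it to a parser written for that format. Here is a minimal sketch of the classification step. It is in Python rather than VFP, purely as a compact illustration; the `classify` function, its category labels and the `COMPANY_WORDS` set are hypothetical stand-ins (the real job would use the existing company lookup table).

```python
# Hypothetical company markers; the real job would use the existing lookup table.
COMPANY_WORDS = {"ASSOC", "INC", "LLC", "CORP", "CO", "COMPANY"}


def classify(name: str) -> str:
    """Return a rough format label for one raw name string."""
    text = name.strip().strip('"')
    # Normalize each word: drop trailing punctuation and upper-case it.
    words = [w.strip(".,;").upper() for w in text.split()]
    if any(w in COMPANY_WORDS for w in words):
        return "company"
    if ";" in text:
        return "multiple_last_first"  # e.g. "GRINDE, JAMES C; GRINDE, LINDA"
    if "," in text:
        return "last_first"
    if "&" in text:
        return "couple"
    return "single"
```

Each label would then select a format-specific parser, much like a branch in a DO CASE block. Strings such as "GRINDE JAMES C & LINDA" still land in the couple bucket, so that parser would need its own check for last-name-first order.
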
You are probably going to have to code a UDF to do what you want.

Here is one that does a much simpler version of what you need, but it should give you an idea of how to start.

PROCEDURE fixname
** Clean up free-form names before they are sent to letters
LPARAMETERS cname
LOCAL ncomma, retval
ncomma = AT(',', cname)
** A comma means the name is in "last, first" format;
** if so, swap the two parts around
IF ncomma = 0
    retval = ALLTRIM(cname)
ELSE
    retval = ALLTRIM(SUBSTR(cname, ncomma+1)) + ' ' + ALLTRIM(SUBSTR(cname, 1, ncomma-1))
ENDIF
** If Dr., Professor or Prof. appears in the string, move it to the beginning
IF 'Dr.' $ retval
    retval = 'Dr. ' + STRTRAN(retval, 'Dr.', '')
ENDIF
IF 'Professor' $ retval
    retval = 'Professor ' + STRTRAN(retval, 'Professor', '')
ENDIF
IF 'Prof.' $ retval
    retval = 'Prof. ' + STRTRAN(retval, 'Prof.', '')
ENDIF
** Move Jr. and Sr. to the end
IF 'Jr.' $ retval
    retval = STRTRAN(retval, 'Jr.', '') + ' Jr.'
ENDIF
IF 'Sr.' $ retval
    retval = STRTRAN(retval, 'Sr.', '') + ' Sr.'
ENDIF
** Get rid of any remaining commas, then collapse the doubled
** spaces left behind by the moves above
retval = STRTRAN(retval, ',', '')
DO WHILE '  ' $ retval
    retval = STRTRAN(retval, '  ', ' ')
ENDDO
RETURN ALLTRIM(retval)
ENDPROC


Lots of luck, this kind of stuff is a real pain.
 
I'm starting to think now about gender coding too.

I can't put 'Jr' or 'III' in the woman's name. I will need a lookup table with thousands of names that distinguish whether they are male or female (Chris?). I can't change how the name comes out either. If it is Theresa and Don, I have to keep it in that order on the parse.

"Mary Ann & Robert McCoy Jr." for example

Any suggestions on that front?
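
One possible approach to the suffix problem, sketched under some simplifying assumptions: split the couple, keep the people in their original order, and attach the suffix only when exactly one of them looks male in the lookup table; anything else gets flagged for manual review. This is Python rather than VFP, just for illustration. The `GENDER` table, the `SUFFIXES` set and the `split_couple` function are made-up names, and the input is assumed to have the plain "First & First Last [Suffix]" shape with no titles.

```python
# Tiny stand-in for the real gender lookup table (hypothetical entries).
GENDER = {"MARY": "F", "ROBERT": "M", "DON": "M", "SARA": "F", "CHRIS": "?"}
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def split_couple(name):
    """Split 'First & First Last [Suffix]' into two records, order preserved.

    Returns (people, needs_review). Assumes exactly one '&' and no titles.
    """
    left, right = [part.split() for part in name.split("&", 1)]
    suffix = ""
    if right and right[-1].strip(".").upper() in SUFFIXES:
        suffix = right.pop()
    last = right.pop()  # shared last name is the final remaining word
    people = []
    for given in (left, right):
        gender = GENDER.get(given[0].upper(), "?")  # "?" = unknown/ambiguous
        people.append({"first": " ".join(given), "last": last,
                       "gender": gender, "suffix": ""})
    males = [p for p in people if p["gender"] == "M"]
    if suffix and len(males) == 1:
        males[0]["suffix"] = suffix
    # A suffix that cannot be pinned on exactly one male goes to a human.
    needs_review = bool(suffix) and len(males) != 1
    return people, needs_review
```

For "Mary Ann & Robert McCoy Jr." this keeps Mary Ann first and gives the "Jr." to Robert only; a pair like "Chris & Pat Smith Jr." comes back flagged for review instead of guessed.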


 
Where is this data coming from?

Why was it not entered in separate fields to begin with?

Can this be fixed?

Gender coding is a total losing battle. With the number of foreign names you run into these days, my experience is that a good deal of the time people can't tell the difference, much less a computer program.

For example, would you like to guess what gender Le Pi is, or whether Le is the first or last name?

 
I am working on getting a name database with genders from a baby website (about 525,000 names).

These are non-profit organizations that have volunteers entering the data so we have no control over their input. It has been a battle to say the least.

I know that I won't get 100%, but if we could reduce the amount of names that have to be fixed manually it would save us hours on each file.
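
As a rough sketch of how such a list could be turned into a lookup table that admits when it does not know: any name that shows up under both genders is marked ambiguous, so it can be routed to manual review instead of guessed. This is Python for illustration only; `build_gender_table` and the (name, gender) row layout are assumptions, not the actual format of the baby-name data.

```python
from collections import defaultdict


def build_gender_table(rows):
    """Build {NAME: 'M' | 'F' | '?'} from (name, gender) pairs.

    A name listed under both genders is stored as '?' (ambiguous).
    """
    seen = defaultdict(set)
    for name, gender in rows:
        seen[name.strip().upper()].add(gender.upper())
    return {
        name: (next(iter(genders)) if len(genders) == 1 else "?")
        for name, genders in seen.items()
    }
```

With rows such as ("Dana", "F"), ("Dana", "M") and ("Linda", "F"), Dana comes out as "?" and Linda as "F".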

Elena
 
How are they entering the data? Can you provide them with some sort of application that would at least make it easy for them to format the names?

If not I wish you a lot of luck.
 
I haven't looked at this code for a long time, but this is what I did way back then:

** tnName (name of the source field) and P (name of the target table)
** are set earlier in the program and are not shown here
SCAN
    lcRec = RECNO()   && assumed; the original post did not show where lcRec is set
    cChange = (&tnName)
    ** First parse the Jr. and Sr. out of there;
    ** Jr. and Sr. are the only suffixes it will parse
    sfxname = NULL
    DO CASE
    CASE AT("Jr. ", cChange) > 0
        cChange = ALLTRIM(STRTRAN(cChange, 'Jr.', ' '))
        cChange = ALLTRIM(STRTRAN(cChange, ',', ' '))
        sfxname = 'Jr.'
    CASE AT("Sr. ", cChange) > 0
        cChange = ALLTRIM(STRTRAN(cChange, 'Sr.', ' '))
        cChange = ALLTRIM(STRTRAN(cChange, ',', ' '))
        sfxname = 'Sr.'
    OTHERWISE
        ** No suffix found: reset to the raw field value
        cChange = (&tnName)
    ENDCASE
    ** Now parse the name; a period is taken as the sign of a middle initial
    IF ATC(".", cChange) > 0
        fname = LEFT(cChange, ATC(".", cChange) - 2)
        IF ATC(",", fname) > 0
            fname = STRTRAN(PROPER(fname), ',', ' ')
        ENDIF
        mname = SUBSTR(UPPER(cChange), ATC(" ", cChange) + 1, ;
            ATC(" ", cChange, 2) - (ATC(" ", cChange) + 1))
        lname = ALLTRIM(SUBSTR(cChange, RAT(" ", ALLTRIM(cChange), 1), 40))
        IF ATC(",", lname) > 0
            lname = STRTRAN(lname, ',', ' ')
        ENDIF
    ELSE
        fname = LEFT(cChange, ATC(" ", cChange) - 1)
        IF ATC(",", fname) > 0
            fname = STRTRAN(PROPER(fname), ',', ' ')
        ENDIF
        lname = ALLTRIM(SUBSTR(cChange, RAT(" ", ALLTRIM(cChange), 1), 40))
        IF ATC(",", lname) > 0
            lname = ALLTRIM(STRTRAN(lname, ',', ' '))
            lname = STRTRAN(lname, ' ', '-')
        ENDIF
        mname = NULL
    ENDIF
    UPDATE &P SET FIRST = fname, MIDDLE = mname, LAST = lname, Suffix = sfxname WHERE RECNO() = lcRec
    GOTO lcRec
ENDSCAN

You can combine this with the other reply you got and adapt it as you need.
 
I know that a few years back, when "data warehousing" was the craze, a lot of companies came out with toolsets to parse non-uniform data such as names and addresses. You might try a Google search on "data warehousing name parsing software".

Mike Pastore

Hats off to (Roy) Harper
 
You wouldn't believe just how inept those name parsing packages are when it comes to multiple names....

Or maybe you would.

I listed the project on rentacoder, so perhaps I'll have my own software to sell, and in a few years, who knows?

Elena
 
I did this in a VB medical application last year.
I had to accept free form input into a post-operative
notes program which came from voice-recognition.

I'll have to review the code and convert the concepts to
VFP - which I've wanted to do for some time.

But, don't hold your breath...

I'll try to post the relevant points I used in the
aforementioned function.

I've also used the regular expression object that's
included with the O.S. It has a much more comprehensive
set of features for performing parsing functions.

If you are familiar with the Unix world, you'll know
the GREP command; which the regexp object borrows from.
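
To give a flavor of the regular expression approach, here is a minimal sketch for one of the simpler shapes in the sample data, "First & First Last Suffix". It uses Python's `re` module rather than the Windows regular expression object, so the pattern syntax (named groups, verbose mode) is Python's and may need adjusting for another engine; `COUPLE` and `parse_couple` are names made up for this example.

```python
import re

# Matches e.g. "Morris & Venus Hendrix Jr." or "Katherine M. & Jo Karen Lowman".
COUPLE = re.compile(
    r"""^\s*
        (?P<first1>[A-Za-z.\s]+?)\s*&\s*     # given name(s) of the first person
        (?P<first2>[A-Za-z.\s]+?)\s+         # given name(s) of the second person
        (?P<last>[A-Za-z'-]+)                # shared last name (one word)
        (?:\s+(?P<suffix>Jr\.?|Sr\.?|III|II|IV))?
        \s*$""",
    re.VERBOSE,
)


def parse_couple(name):
    """Return a dict of name parts, or None if the string does not fit."""
    match = COUPLE.match(name)
    return match.groupdict() if match else None
```

Strings that do not fit the shape, such as "Willis Corroon & Assoc.", simply return None and can fall through to another pattern or to manual handling. Titles like "Mr. & Mrs." are not treated specially here and would need their own pattern.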

See:
Darrell



'We all must do the hard bits so when we get bit we know where to bite' :)
 
I won't hold my breath, but if you have any input at all it will help tremendously.

Elena
 
I've got an old program that was written by Walt Kennamer (he was part of the original FoxPro development team), that does a pretty good job. If you'd like a copy of the .prg, a test .prg and test data zipped up [~10K], post an email address and I'll send it to you.

At least it's a great place to start - I've used it in a number of situations.

Rick


 
I'd love it if you could do that....

eboughey@cfl.rr.com

 
I'd like the program too please.
r.brioul@hccnet.nl

Rob
 
I know there seem to be a lot of people interested in name parsing. Out on one of those programmer sites, I got over 450 hits and more than 90 bids on doing this project.

 
I would like a copy of rgbean's program and data if possible. TIA.
 
Buffie,
I'd be happy to send them to you (I don't have anywhere to "post" them right now), so just provide an email address and I'll send them - it's only ~12k zipped up. Feel free to obscure your email address so the page-scraping spammers can't easily read it, e.g. rickNOSPAM AT mydomain DOT com.

Rick
 
I tend to agree with the comment that it is a losing battle. As an argument:

Carol Alt
Carol O'Connor

Which is male and which is female? You know because they are famous, but if you didn't know who they were, you wouldn't know that one is female and one is male.

Dana Delaney
Dana Carvey
Dana Plato
Dana Andrews

All are famous actors, some alive, some dead, but some are male, some are female. There is nothing that gives you a clue unless you know who they are.

And, what am I?

Dana
 
rgbean,

Can I get a copy too? Please send it to mlopez AT cosett DOT com DOT bo.
 