Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Separate name from different columns

Status
Not open for further replies.

TheLazyPig

Programmer
Sep 26, 2019
95
PH
Hi!

I have a set of names. Is it possible to separate it from different columns?

Untitled_gw6qre.png



Thank you!
 
This is the general form, not the most efficient, but probably the clearest as to intent:
Code:
SELECT MYTABLE
GO TO
DO WHILE .NOT. EOF()
	REPLACE LASTNAME 	WITH GETWORDNUM(NAME,1)
	REPLACE FIRSTNAME 	WITH GETWORDNUM(NAME,2)
	REPLACE MIDDLENAME 	WITH GETWORDNUM(NAME,3)
	REPLACE GENNAME 	WITH GETWORDNUM(NAME,4)
	SKIP
ENDDO

Simpler, less readable perhaps...
Code:
SELECT MYTABLE
REPLACE ALL LASTNAME WITH GETWORDNUM(NAME,1), FIRSTNAME WITH GETWORDNUM(NAME,2), MIDDLENAME WITH GETWORDNUM(NAME,3), GENNAME WITH GETWORDNUM(NAME,4)

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

There is no place like G28 X0 Y0 Z0
 
The challenge here is understanding where one part of the name ends and the next one begins. You're probably going to need a few tables to help you work that out and even with that, you'll need to do some of these by eyeball.

Somebody wrote a great article about this some years ago that included a set of tables that helped with American names. I can't remember who wrote it or where it appeared, but maybe someone else here will have their memory jogged by this.

Tamar
 
Somebody wrote a great article about this some years ago that included a set of tables that helped with American names. I can't remember who wrote it or where it appeared, but maybe someone else here will have their memory jogged by this.

I vaguely remember an article like that, Tamar. But I thought it was one that you wrote yourself. Or am I thinking of one you wrote about generating realistic test data? I seem to recall that you had a list of the top so-many girls' and boys' names from the US Census or something similar. Or maybe not.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
I had not spotted the DE and DE LA that maybe could be picked up
using the shortness (2 char) nature?

But the ROEL VERGEL type pair for the forename... that is a poser.

Could a combination of word count and shortness be used?

I worry that there probably are asian names like Oh that would spoil this
approach.



Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

There is no place like G28 X0 Y0 Z0
 
Mike

Are you suggesting the use of look up table(s) for the possible name pairs - based on a census or some such - would enable this to be
processed more easily?

So a table with (say) three entries with ROEL, and separate records for PHILIP, VERGEL, MANUEL and then scan each of the OPs
records looking for those pairs and accepting that as a forename (for example)?

Another table of GENNAME as well, JNR, SNR, II, III, IV etc...?

Would it be possible trigger that based on a word count > 4?

What would you do with our ex-PM though?

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

There is no place like G28 X0 Y0 Z0
 
Without additional tables of valid names, all you can do is separate the parts, but not assign them to the right name part.

If you can assume the full name data you have always starts with the last name, you still have the cases like DE LA PAZ. The only sure thing is GETWORDNUM(NAME,1) is either that last name or part of the last name. That's too little information to continue. Usually, if full names are stored as Last, First with a comma but your data doesn't even have that as a definite split position.

The largest set of name data I googled is here: The downside is that it comes from a Facebook data leak and using that data might not be ethical to do. The section "Composition" points to a newspaper article about this data breach.

Deciding whether or not to use this or other data sets, you have to invent some way to get to the name parts as you like to store them and it's not as easy as just getting the single parts separated by whitespace alone.

That said, if you have less than 100 names you surely have it easier to do this manually without getting into intricate details of determining name parts by comparing them with a database of names that are at least already split by lastname, firstname. But even the Facebook data leak won't have the middle names separated. I don't think. I know Facebook has a real name policy for the longest time, you could still join Facebook with a fantasy name, so even that aspect is still in this data: That it has non-valid names.

What you could start with is sorting out all the names that only have two parts to be lastname,firstname. And I guess that would cover more than 80% of the data and then it only matters what remains to be done manually.

Chriss
 
No, Griff, I wasn't suggesting that at all. The only reason I mentioned the census data was to jog my memory in relation to the article that Tamar mentioned. I can see why that would be misleading. Apoligies.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
No problem, Mike.

I don't see how you could split these examples up without some kind of look up for common double word names though.

If you take Mr Musk's childrens naming scheme into account it gets even worse!


Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

There is no place like G28 X0 Y0 Z0
 
I wrote an article about a framework for creating test data and some of the base data I supplied was from US Census data:
I have a vague sense that the name splitter might have come from Sue Cunningham, but I'm also thinking of Whil when I remember this, and it may have been someone else entirely. But I remember that it could handle more complex names (though now I'm wondering if that was only about formatting them right, not figuring out first, middle, last).

Tamar
 
A cautionary tale:

One of my first clients - this was in the early days of PCs when MailMerge was a novelty - wanted to send a personalised letter to each of his customers. So he wrote a little program in MBASIC to parse out the elements of each customer's name.

If the name consisted of exactly one word, he would treat that word as the last name, and address the letter as "Dear <title> + space + <last name>". For all other cases, he devised a scheme which he thought would identify the customer's first name, so that the letter could be addresses simply as "Dear + space + < first name>". He thought he was so clever.

One of his customer's was an official at the Saudi embassy in London. I don't remember that person's actual name, but it was something like Adel bin Salman Al Saud. Unfortunately, it was only after my client had mailed the letters that he realised that the person in question was addressed as "Dear Al".

The moral: Don't rely on an algorithm when dealing with something as sensitive as people's names.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Hi,

The moral: Don't rely on an algorithm when dealing with something as sensitive as people's names.

This is even more true when you can't determine the Last- and Firstname e.g. French names "Jean Michel" or "Ronald Pierre" or "Michel Jeannette" ...

MarK
 
Can you do that and show the result?

Code:
Select GETWORDCOUNT(namefield), Count(*) as NumRecords FROM thetable Group By 1

Chriss
 
Lazy Pig, it would be useful to know: (i) Is this a one-off requirement, or is to something you need to do regularly? and (ii) Roughly what volume of data is involved?

The reason I ask is that, if it is a one-off and the volume is not huge, I would favour doing it manually. As you can see by the replies in this thread, it is unlikely that an automatic solution would give you a correct result. At the very least, use the code given by Griff above, and then manually review the result and make any necessary adjustments.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Mike gives the reason Im asking you to do the query. It'll show the volume of the data and how many of them are the simple 2 word cases. If all names are like your sample data starting with the last name, you can at least solve that with getwordnum. Perhaps you already did use Griffs code and did the rest by hand anyway. It would be nice, if you get back here and tell about your progress or close the thread.

Mike has one point: No matter what volume the current data has, if it's a regularly occurring requirement and the new names you get are usually new names and not ones you already have split up before, then it would be good to program something able to identify lastnames, titles or "gennames", like roman numerals or JR. And there are tools and services online, you could use. Just one result from a google search:

It's a google spreadsheet add on to split a name column into parts. No idea how well it does so for your purpose, but you could give it a try or search for others.

Chriss
 
To make name-splitting more difficult, some people have four (or five) names. You may need to establish their country of origin to parse them accurately. Good luck.

Steve
 
DE GUZMAN JOSE ISAGANI BANGGAY JR has even 6 parts, so the sample data includes these hard cases.

In fact JOHNNY BASACA seems the only normal 2 part name. But I guess the sample data is handpicked to showcase these hard cases.

I'd really be interested in the statistics my query would show, because I guess my 80% estimate of 2 part names is on the low end - just thinking of the not always correct 80/20 rule. Most people have only two names - first and last. Maybe more than one thinks have a middle name, they just almost never use it.

It's also easy to detect and split off a standalone JR or I, II, III, but an I alone may also just be an initial of a name with a missing dot. And that's just the beginning of many problems you can have. Most commonly that first names can also be last names and vice versa. Well, that would be solved if all the full names start with the last name, but I'd not bet on it.

You could still split the 2 part names easily and cross-checking with a database of first and last names to find the most likely order of them unless you have a JAMES DEAN, DEAN MARTIN, or MARTIN JAMES and don't know whether they are DEAN JAMES, MARTIN DEAN, or JAMES MARTIN instead.

Chriss
 
Hi!

Thank you for all the replies. I really had a hard time on this, it will only be used when the .dbf file was given to the user maybe monthly? I already have a solution and it doesn't need to be separate anymore because the database has a full name column. Before I found out that it has a full name column, I tried to separate it first in Excel by using text-to-column but it didn't work out as I wanted.


:)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top