How to uppercase character within string

Status
Not open for further replies.

pvollma

Programmer
Feb 21, 2005
2
US
I used a pattern to uppercase the first letter of each word, and then lowercase the rest, in a name and address file. This works fine, but, because these are names, there are special cases where I want a specific character within a word to be uppercase, such as the letter following a word that begins with "Mc". For example, if the name was "MCDONALD" in the original format, my first process changed this to "Mcdonald". What regex replacement can I use to search for words starting with "Mc" and then uppercase only the character after the "c"? Less importantly, a few of the names were input with a space between the "Mc" and the rest of the name, e.g., "MC DONALD". Is there a replacement pattern I can use to reliably merge them? Thanks!
 
How about:
[tt]
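$name=ucfirst($name);                # uppercase the first letter
$name=~s/^Mc /Mc/;                   # merge "Mc Donald" into "McDonald"
$name=~s/^Mc(.)/Mc${\(uc $1)}/;      # uppercase the character after "Mc"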
[/tt]
 
This is probably a little easier to modify in case you need to add something other than 'Mc' to the list.

Code:
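my @names = ('McDONALD', 'mac donald');
my @prefixes = ('Mc', 'Mac');

@names = map { "\L$_" } @names;    # lowercase the whole string
@names = map { "\u$_" } @names;    # uppercase the first character

foreach my $prefix (@prefixes) {
    foreach (@names) {
        # Anchor at a word boundary, drop any space after the prefix,
        # and uppercase the letter that follows.
        s/\b($prefix)\s*(\w)/$1\u$2/;
    }
}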
 
Whenever you do this kind of cleaning, you have to make a trade-off between false positives and false negatives. It's probably a good idea to put all the Mcs and Macs into a hash and print them out afterwards to get a list of unique names and counts. Then you can scan this list by eye to see if anything looks weird, and make a call on which is worse based on the counts. I had to do this once on 8 million customers. We got all the Mcs and Macs working perfectly, then discovered that we had created a 'MacHine Tools Ltd.' for a business customer...[smile]
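
The review step described above could look something like the sketch below. It is only an illustration: the array @cleaned and its sample names are hypothetical stand-ins for the cleaned records, and the pattern assumes the only prefixes of interest are Mc and Mac.

```perl
use strict;
use warnings;

# Hypothetical sample data; in practice these would be the cleaned records.
my @cleaned = ('McDonald', 'MacHine Tools Ltd.', 'McDonald', 'MacLeod', 'Smith');

# Count every word that starts with Mc or Mac.
my %seen;
for my $line (@cleaned) {
    $seen{$1}++ while $line =~ /\b(Ma?c\w+)/g;
}

# Most frequent first, ties broken alphabetically, for review by eye.
for my $name (sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen) {
    printf "%-12s %d\n", $name, $seen{$name};
}
```

Rare entries in the resulting list (such as a single 'MacHine') are the ones most worth checking by hand.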
 
Since I was dealing with a line of text, with the Mc anywhere within the line, I used TonyGroves' idea with this modification:

Code:
$lineout =~ s/\bMc(.)/Mc${\(uc $1)}/g;

That worked just fine, and I checked through the text and found no anomalies. Thanks, Tony.
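
For the names keyed with a space after the "Mc", one possible extension of the same substitution is sketched here. It was not posted in the thread; the sample line is hypothetical, and it assumes that "Mc" followed by whitespace and a word character is always a split name, which may not hold for every file.

```perl
use strict;
use warnings;

# Hypothetical sample line; "Mc Donald" stands in for a name keyed with a space.
my $lineout = 'Mc Donald, Mcgregor and Mr. Smith';

# \s* swallows any space after "Mc"; \u uppercases the captured letter.
$lineout =~ s/\bMc\s*(\w)/Mc\u$1/g;

print "$lineout\n";    # prints "McDonald, McGregor and Mr. Smith"
```

Because this merges anything that follows a standalone "Mc", it is worth checking the output by eye before trusting it on the whole file.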
 
Code:
$lineout =~ s/\bMc(.)/Mc${\(uc $1)}/g;
Interesting stuff on the RHS. You're passing $1 to the uc function, taking a reference to the string returned (\), dereferencing it as a scalar (${...}), and interpolating the result right after 'Mc'. Can't say I've ever seen that in a regex replacement before. I was surprised it works without the /e modifier, but the replacement side of s/// is interpolated like a double-quoted string, and the contents of a ${...} dereference block are evaluated as ordinary Perl code, so the uc call runs during interpolation.

I think the following would do the same thing. (A little less complicated, maybe.)
Code:
$lineout =~ s/\b(Mc)(.)/$1.uc($2)/eg;
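
To compare the variants side by side, here is a small sketch run on a hypothetical sample line. The third form, using the \u escape, was not posted in the thread; it is included as another way to get the same result.

```perl
use strict;
use warnings;

my $line = 'Mcdonald and Mcgregor met Mr. Smith';

# 1. Dereference trick: ${\(...)} runs code inside the interpolated replacement.
(my $deref = $line) =~ s/\bMc(.)/Mc${\(uc $1)}/g;

# 2. /e modifier: the replacement is evaluated as a Perl expression.
(my $eval = $line) =~ s/\b(Mc)(.)/$1.uc($2)/eg;

# 3. \u escape: uppercases the next character during interpolation.
(my $escape = $line) =~ s/\bMc(.)/Mc\u$1/g;

# Each line printed is "McDonald and McGregor met Mr. Smith".
print "$_\n" for $deref, $eval, $escape;
```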


 