Convert STRINGS to Proper Case.

Status
Not open for further replies.

motoslide

MIS
Oct 30, 2002
764
US
Is there a utility (similar to tr) that can be used to convert strings within a text file to Proper Case?

Before:
ACME HARDWARE|1234 YOUR STREET|ANYTOWN|BOB SMITH|(555) 555-1212

After:
Acme Hardware|1234 Your Street|Anytown|Bob Smith|(555) 555-1212

I'm sure there will be plenty of issues where the conversion would generate unwanted results (such as "PT CRUISER" becomes "Pt Cruiser" and "RYAN MCHENRY" becomes "Ryan Mchenry"). We can determine later if those issues are too difficult to live with.

"Proof that there is intelligent life in Oregon. Well, Life anyway."
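
The exceptions mentioned in the question ("PT CRUISER", "RYAN MCHENRY") might be handled with a small override table that is consulted before falling back to plain capitalization. This is only a sketch: the table contents and the names EXCEPTIONS and proper_case are made up for illustration, and a real list would have to be built from the actual data.

```ruby
# Hypothetical override table: uppercase source word => desired spelling.
EXCEPTIONS = {
  "PT"      => "PT",
  "MCHENRY" => "McHenry"
}.freeze

# Capitalize each run of ASCII letters, consulting the override table first.
def proper_case(line, exceptions = EXCEPTIONS)
  line.gsub(/[A-Za-z]+/) do |word|
    exceptions.fetch(word.upcase) { word.capitalize }
  end
end

puts proper_case("PT CRUISER|RYAN MCHENRY|(555) 555-1212")
# prints: PT Cruiser|Ryan McHenry|(555) 555-1212
```

Words not in the table go through String#capitalize unchanged, so the table only needs to hold the cases that come out wrong.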
 
Code:
ruby -e 'ARGF.each{|s| print s.split(/([A-Z]+)/).
map{|x| x.capitalize}.join}' textfile
 
If this is headed in or out of Oracle, use the INITCAP function.

Here's a sed script:

Code:
s/^/ /
:loop
h
s/.*[^a-zA-Z0-9][A-Z]\([A-Z]\{1,\}\).*/\1/
/[^A-Z]/{
        s/^[^a-zA-Z0-9]\(.*\)/\1/
        b
}
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
G
s/\(.*\)\n\(.*[^a-zA-Z0-9][A-Z]\)[A-Z]\{1,\}/\2\1/
t loop

Cheers,
ND [smile]
 
If you have GNU sed:
Code:
$ sed 's/\b\(\w*\)\b/\L\u\1/g'
ACME HARDWARE|1234 YOUR STREET|ANYTOWN|BOB SMITH|(555) 555-1212
Acme Hardware|1234 Your Street|Anytown|Bob Smith|(555) 555-1212
 
Thanks much, folks. I'll give the sed methods a try and see what comes out. I won't have access to the test data for a few days.
I don't have ruby, but that looks like a very clean solution.

"Proof that there is intelligent life in Oregon. Well, Life anyway."
 
I seldom use Perl, but this should be clean and easy:

perl -n -e "s/\b(\w+)/\L\u$&/g;print"

Cheers,
ND [smile]
 
perl -pe 's/\b(\w+)/\L\u$&/g'
Seems to work only for 7-bit ASCII files, not for ISO-8859-1 accented text (which we all use in the real world, i.e. outside the USA ;-)).
 
Hi

Yes, I saw that. (My usual test file is a lasagna recipe in Hungarian. :p)

But anyway, none of the other scripts handles such text well either. (At least not in the versions I have.)

Feherke.
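
For accented text, one possible approach is to match letters by Unicode property instead of by ASCII range. The sketch below assumes UTF-8 input and Ruby 2.4 or later, where String#capitalize applies Unicode case mapping; the function name proper_case_unicode and the sample words are illustrative only. ISO-8859-1 data would first need converting to UTF-8, for example with String#encode.

```ruby
# Assumes UTF-8 input and Ruby 2.4+, where String#capitalize handles
# non-ASCII letters instead of only A-Z.
def proper_case_unicode(line)
  # \p{L}+ matches a run of letters in any script, accented ones included.
  line.gsub(/\p{L}+/) { |word| word.capitalize }
end

puts proper_case_unicode("ÉCOLE SUPÉRIEURE|ÁRVÍZTŰRŐ|BOB SMITH")
# prints: École Supérieure|Árvíztűrő|Bob Smith
```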
 
What about contractions in English? Those are trouble too.

There's always C. The ASCII part would be easy and perform well, but you would still have to contend with the specific cases, contractions, and such.

Am I right?

Cheers,
ND [smile]
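
As a rough illustration of the contraction problem, a word and a trailing contraction ending can be treated as one unit so that only the first letter is capitalized. The list of endings below is an assumption chosen for the example, not a complete set, and the function name proper_case_contractions is hypothetical.

```ruby
# Capitalize each word; a trailing contraction or possessive ending
# ('s, 't, 'd, 'm, 'll, 're, 've) stays attached and is lowercased with it.
# The \b ensures the ending is not the start of a longer name part,
# so "D'SOUZA" is still split into "D" and "SOUZA".
def proper_case_contractions(line)
  line.gsub(/[A-Za-z]+(?:'(?:S|T|D|M|LL|RE|VE)\b)?/i) { |word| word.capitalize }
end

puts proper_case_contractions("WE'LL SEE|DON'T PANIC|CARMEN D'SOUZA")
# prints: We'll See|Don't Panic|Carmen D'Souza
```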
 
Both the Perl and sed commands worked as expected. Thanks much. I even tried a hyphenated last name, which converted correctly.

I gave BullDog the star because his sed script confused me but worked great.

"Proof that there is intelligent life in Oregon. Well, Life anyway."
 
To handle possessive company names, I hacked this in:

Code:
s/^/ /
:loop
h
s/.*[^a-zA-Z0-9][A-Z]\([A-Z]\{1,\}\).*/\1/
/[^A-Z]/{
        s/^[^a-zA-Z0-9]\(.*\)/\1/
        s/'S\([| ]\)/'s\1/g
        b
}
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
G
s/\(.*\)\n\(.*[^a-zA-Z0-9][A-Z]\)[A-Z]\{1,\}/\2\1/
t loop

Now

ACME HARDWARE|1234 YOUR STREET|ANYTOWN|BOB SMITH|(555) 555-1212
SAM'S CLUB|1234 YOUR STREET|ANYTOWN|BOB SMITH-BRADLEY JR|(555) 555-1212
FRANK'S|1234 YOUR STREET|ANYTOWN|CARMEN D'SOUZA|(555) 555-1212

becomes

Acme Hardware|1234 Your Street|Anytown|Bob Smith|(555) 555-1212
Sam's Club|1234 Your Street|Anytown|Bob Smith-Bradley Jr|(555) 555-1212
Frank's|1234 Your Street|Anytown|Carmen D'Souza|(555) 555-1212

Cheers,
ND [smile]
 
Code:
ACME HARDWARE|1234 YOUR STREET|ANYTOWN|BOB SMITH|(555) 555-1212
SAM'S CLUB|1234 YOUR STREET|ANYTOWN|BOB SMITH-BRADLEY JR|(555) 555-1212
FRANK'S|1234 YOUR STREET|ANYTOWN|CARMEN D'SOUZA|(555) 555-1212
Code:
ruby -pe 'gsub(/\w+(\x27\w(?!\w))?/){$&.capitalize}'
Output:
Code:
Acme Hardware|1234 Your Street|Anytown|Bob Smith|(555) 555-1212
Sam's Club|1234 Your Street|Anytown|Bob Smith-Bradley Jr|(555) 555-1212
Frank's|1234 Your Street|Anytown|Carmen D'Souza|(555) 555-1212
 