Sentence-Casing 1

Status
Not open for further replies.

Kirsle

Programmer
Jan 21, 2006
1,179
US
I'm trying to come up with a regexp for sentence-casing a string (i.e. the first word of a sentence begins with a capital letter, then the rest of the sentence is lowercase).

So far I've come up with the following regexp:
Code:
s~\b(\w)(.*?)(\.|\?|\!)~\u$1\L$2$3~ig;

This works well on the following two sentences:
Code:
a lowercase sentence. does this have multiple types of punctuation? yes!

AN UPPERCASE SENTENCE! ONE WITH THREE PERIODS HERE... AND A QUESTION MARK? AND THEN ANOTHER SENTENCE.

----------

Modified String 1: A lowercase sentence. Does this have multiple types of punctuation? Yes!

Modified String 2: An uppercase sentence! One with three periods here... And a question mark? And then another sentence.

But when I give it a sentence that doesn't end with punctuation, the regexp fails to match. E.g., "this sentence doesn't end with a symbol" passes through untouched and comes out as "this sentence doesn't end with a symbol" again.

I've tried a few other ideas, like adding a \b to the list of punctuation symbols in my regexp (\.|\?|\!|\b). It works as far as detecting the word boundary goes, but when it's put back out as $1 it becomes a backspace, removing spaces between words. The normal sentences that passed the first regexp come out worse, too.

Any help would be appreciated!
 
Hi, Kirsle

Perhaps this can help:
Code:
# original
$var =~ s/\b(\w)(.*?)(\.|\?|\!)/\u$1\L$2$3/ig;
# with |$ added, so a sentence may also end at the end of the string
$var =~ s/\b(\w)(.*?)(\.|\?|\!|$)/\u$1\L$2$3/ig;

;-)
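
For anyone who wants to try it, here is a minimal sketch that wraps the |$ version in a sub and runs it over the sample strings from the first post. The sub name sentence_case is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper name, used only for this illustration.
sub sentence_case {
    my ($text) = @_;
    # The |$ alternative lets the last sentence end at end-of-string,
    # so input without closing punctuation is still matched.
    # \u uppercases the first captured letter, \L lowercases the rest.
    $text =~ s/\b(\w)(.*?)(\.|\?|\!|$)/\u$1\L$2$3/ig;
    return $text;
}

for my $sample (
    "a lowercase sentence. does this have multiple types of punctuation? yes!",
    "AN UPPERCASE SENTENCE! ONE WITH THREE PERIODS HERE... AND A QUESTION MARK?",
    "this sentence doesn't end with a symbol",
) {
    print sentence_case($sample), "\n";
}
```

Note that everything after the first letter of each sentence is lowercased, so proper nouns and acronyms in the middle of a sentence lose their capitals.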
 
Yeah, I was originally using ucfirst in my code but that didn't always work very well unless there was only one sentence being passed in. A regexp was a better way to go in this case.

Here's the old code I was using:
Code:
sub stringUtil {
	my ($self,$type,$string) = @_;

	if ($type eq 'uppercase') {
		return uc($string);
	}
	elsif ($type eq 'lowercase') {
		return lc($string);
	}
	elsif ($type eq 'sentence') {
		$string = lc($string);
		return ucfirst($string);
	}
	elsif ($type eq 'formal') {
		$string = lc($string);
		my @words = split(/ /, $string);
		my @out = ();
		foreach my $word (@words) {
			push (@out, ucfirst($word));
		}
		return join (" ", @out);
	}
	else {
		return $string;
	}
}

Both "formal" and "sentence" can be recoded using regexps (I already know how to formalize words now, just s~\b(\w+)\b~ucfirst(lc($1))~ge)
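
A rough sketch of what those two branches might look like as regexps. The sub names are hypothetical, and matching on \S+ instead of \b(\w+)\b is my own assumption: it stays closer to the old split-on-space code and avoids turning "doesn't" into "Doesn'T".

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# "formal": capitalize the first letter of every word.
sub formal_case {
    my ($string) = @_;
    # /e evaluates the replacement as Perl code; /g repeats it for every word.
    # \S+ treats any run of non-whitespace as one word (an assumption).
    $string =~ s/(\S+)/ucfirst(lc($1))/ge;
    return $string;
}

# "sentence": capitalize the first letter of every sentence.
sub sentence_case {
    my ($string) = @_;
    # |$ allows the last sentence to end without punctuation.
    $string =~ s/\b(\w)(.*?)(\.|\?|\!|$)/\u$1\L$2$3/ig;
    return $string;
}

print formal_case("hello WORLD doesn't"), "\n";
print sentence_case("first sentence. SECOND ONE"), "\n";
```

Unlike split(/ /, ...) followed by join(" ", ...), the substitution leaves the original whitespace between words untouched.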
 