I'm trying to come up with a regexp for sentence-casing a string (i.e. the first word of a sentence begins with a capital letter, then the rest of the sentence is lowercase).
So far I've come up with the following regexp:
Code:
s~\b(\w)(.*?)(\.|\?|\!)~\u$1\L$2$3~ig;
This works well on the following two sentences:
Code:
a lowercase sentence. does this have multiple types of punctuation? yes!
AN UPPERCASE SENTENCE! ONE WITH THREE PERIODS HERE... AND A QUESTION MARK? AND THEN ANOTHER SENTENCE.
----------
Modified String 1: A lowercase sentence. Does this have multiple types of punctuation? Yes!
Modified String 2: An uppercase sentence! One with three periods here... And a question mark? And then another sentence.
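For reference, here is a small, self-contained Perl sketch that applies the substitution above to the two sample strings. The pattern is spelled out under `/x` so each capture group can carry a comment. The sub name `sentence_case_v1` is hypothetical, and the `/i` flag is dropped because nothing in the pattern depends on letter case (`\w` already matches both cases).

```perl
use strict;
use warnings;

# Hypothetical helper: the original substitution, written with /x for readability.
# (Comments inside the pattern avoid sigils, since the pattern is interpolated.)
sub sentence_case_v1 {
    my ($text) = @_;
    $text =~ s~
        \b (\w)            # capture 1: first word character of a sentence
        (.*?)              # capture 2: rest of the sentence, matched lazily
        ( \. | \? | \! )   # capture 3: the terminating punctuation, which is required
    ~\u$1\L$2$3~gx;
    return $text;
}

print sentence_case_v1('a lowercase sentence. does this have multiple types of punctuation? yes!'), "\n";
print sentence_case_v1('AN UPPERCASE SENTENCE! ONE WITH THREE PERIODS HERE... AND A QUESTION MARK? AND THEN ANOTHER SENTENCE.'), "\n";
```

Because capture 3 is mandatory, any text after the last `.`, `?` or `!` can never be part of a match, which is exactly the failure described next.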
But when I give it a sentence that doesn't end with punctuation, the regexp fails. For example, "this sentence doesn't end with a symbol" isn't matched at all, so it comes back unchanged as "this sentence doesn't end with a symbol".
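One possible fix, offered as a sketch rather than a definitive answer: let the end of the string stand in for the missing terminator by adding `$` as a final alternative. Because `(.*?)` is lazy, real punctuation still wins whenever it appears first, so the sentences that already worked should come out the same; only a last, unterminated sentence falls through to `$`, where capture 3 is an empty string. The sub name `sentence_case` is hypothetical, and `\.|\?|\!` is collapsed into the equivalent character class `[.?!]`.

```perl
use strict;
use warnings;

# Hypothetical helper: same substitution, but end-of-string also ends a sentence.
sub sentence_case {
    my ($text) = @_;
    # [.?!] matches one terminator; $ lets the final sentence end without one.
    # ($) is not interpolated inside a pattern, so "|$)" is a plain end anchor.)
    $text =~ s~\b(\w)(.*?)([.?!]|$)~\u$1\L$2$3~g;
    return $text;
}

print sentence_case("this sentence doesn't end with a symbol"), "\n";
print sentence_case('first sentence. second one has no period'), "\n";
```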
I've tried a few other ideas, like adding \b to the list of punctuation symbols in my regexp (\.|\?|\!|\b). It works as far as detecting the word boundary, but when it's put back out as $1 it becomes a backspace, removing the spaces between words. The sentences that passed with the first regexp come out worse, too.
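A quick way to see what the `\b` alternative does, sketched under the assumption that it sits in the pattern's alternation exactly as written above. In that position `\b` is a zero-width word boundary (it only means backspace inside a bracketed character class such as `[\b]`), so capture 3 is empty and the lazy `(.*?)` stops at the end of the first word it reaches. Every word is then treated as its own "sentence", which title-cases the text instead of sentence-casing it. The sub name `with_word_boundary` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the variant with \b added to the terminator alternation.
sub with_word_boundary {
    my ($text) = @_;
    # \b consumes no characters, so capture 3 is empty and each match ends
    # at the next word boundary, i.e. the end of the current word (or an apostrophe).
    $text =~ s~\b(\w)(.*?)(\.|\?|\!|\b)~\u$1\L$2$3~g;
    return $text;
}

print with_word_boundary("this sentence doesn't end with a symbol"), "\n";
# prints "This Sentence Doesn'T End With A Symbol"
```

The apostrophe in "doesn't" is not a `\w` character, so the "t" after it starts a new match and is capitalized too.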
Any help would be appreciated!