Tek-Tips is the largest IT community on the Internet today!

Delete words that aren't in all caps

Status
Not open for further replies.

dodge20

MIS
Jan 15, 2003
1,048
US
I would like to delete all the words in a file that aren't in all caps. I would like to keep any numbers that are in the file also. How would I go about doing this?

Dodge20
 
I suppose I should mention what I have tried.
Code:
tr -d '[:lower:]' < input_file
But this only removes the lowercase letters; I want to remove words that aren't entirely in caps, so Proper-case words should be removed too.

I also tried
Code:
 sed -e "s/[a-z]\{2,\}//g"

But this had the same problem: it strips runs of lowercase letters rather than whole words.

Dodge20
 
An interesting requirement... out of curiosity, why do you need to do this?

Something like this perhaps:

Code:
$ echo 'But this only removes the lowercase letters I want to remove words that arent entirely in caps. So Proper case words should be removed also.' | perl -pe 's/\b *[[:lower:]]+\b//g'
But I. So Proper.
$

Annihilannic.
 
The reason I need this is that I have a huge file with a bunch of addresses in it along with a bunch of other junk. The 'other junk' is all in lowercase or Proper case and I need to get rid of it. All of the addresses are in all caps.

So, your script isn't quite right. From your input, "I" would be the only word I would want, since it is the only one in all caps.

Example:

This is My Address
123 FAKE STREET
MYTOWN, MYSTATE 55555
More junk here
and here

The output should be:
MYNAME 123 FAKE STREET
MYTOWN, MYSTATE 55555

There can be any number of lines of junk between the addresses.

Dodge20
 
Oh, sorry, I misread the question. Can't you just use grep -v '[[:lower:]]' to do that, i.e. remove any line containing a lowercase character?

Annihilannic.
 
Or, with GNU sed:
Code:
$ sed -e 's/\b\w*[a-z]\w*\b//g' <<EOF
> This is My Address
> 123 FAKE STREET
> MYTOWN, MYSTATE 55555
> More junk here
> and here
> Junk Junk Junk CAPITAL Junk Junk
> EOF
   
123 FAKE STREET
MYTOWN, MYSTATE 55555
  
 
   CAPITAL
$
That'll preserve the all-capital words that appear on lines with lowercase characters.
 
More granular and special purpose (imho).
Code:
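#include <stdio.h>
#include <ctype.h>

/* Copy stdin to stdout, dropping lowercase letters and keeping
   uppercase letters, digits, whitespace and punctuation.
   This works per character, so a Proper-case word such as
   "Address" still leaves its capital "A" behind. */
int main(void) {
    int p;

    while ((p = getc(stdin)) != EOF) {
        if (islower(p))
            continue;                   /* skip lowercase letters */
        /* isspace() already covers '\n' */
        if (isspace(p) || ispunct(p) || isdigit(p) || isupper(p))
            putchar(p);
    }
    return 0;
}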
 