Simple Reg. Ex. question regarding * vs +

Status
Not open for further replies.

DJR999

Programmer
Nov 19, 2003
16
US
Simple Regular Expression Question

OK - I'm new to UNIX and reg. ex, so this is probably very basic, but I can't figure it out. Given a simple text file called namelist with a list of names such as

John Doe
Bob Cat
Billy Bob
Joe Mama

etc,

the command:

grep '[A-Z][a-z]*' namelist

yields all lines, as it should: each one contains an upper case letter followed by 0 or more lower case letters.

yet

grep '[A-Z][a-z]+' namelist

yields nothing. Every line starts with one upper case letter followed by one or more lower case, but this yields nothing.

Also

grep '[A-Z][a-z]\{1,\}' namelist

yields every name on the list. I've found a couple of references that say this is identical to the 2nd grep command above (one or more lower case letters). This one lists the names, as I would expect, but the [a-z]+ syntax does not.

Can someone explain this to me?

Thank You,

Doug
 
Hi

Both [tt]+[/tt] and [tt]{}[/tt] have to be escaped in basic regular expressions. Alternatively, use extended or Perl-like regular expressions.
Code:
grep '[A-Z][a-z]\+' namelist
grep -E '[A-Z][a-z]+' namelist
grep -P '[A-Z][a-z]+' namelist
Tested with GNU [tt]grep[/tt].


Feherke.
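
The difference between the two syntaxes can be checked with a small script. This is only a sketch: the temporary file stands in for namelist, and the extra line "Ab+ Test" is a hypothetical addition that is not in the original list. It is there to show that an unescaped [tt]+[/tt] in a basic regular expression is an ordinary character. The escaped [tt]\+[/tt] form is left out on purpose, because it is an extension that not every grep supports.

```shell
#!/bin/sh
# Sketch: compare how basic (BRE) and extended (ERE) syntax treat "+".
# Use the C locale so [A-Z] and [a-z] mean plain ASCII ranges.
LC_ALL=C; export LC_ALL

# Stand-in for the namelist file; "Ab+ Test" is a hypothetical extra line.
tmp=$(mktemp)
printf '%s\n' 'John Doe' 'Bob Cat' 'Billy Bob' 'Joe Mama' 'Ab+ Test' > "$tmp"

# BRE, unescaped +: the plus is a literal character, so only "Ab+" matches.
bre_plain=$(grep -c '[A-Z][a-z]+' "$tmp")

# BRE, interval expression: the portable spelling of "one or more".
bre_interval=$(grep -c '[A-Z][a-z]\{1,\}' "$tmp")

# ERE (-E): here + is a quantifier meaning "one or more".
ere_plus=$(grep -c -E '[A-Z][a-z]+' "$tmp")

printf '%s %s\n' 'BRE, unescaped +    :' "$bre_plain"
printf '%s %s\n' 'BRE, interval {1,}  :' "$bre_interval"
printf '%s %s\n' 'ERE, + quantifier   :' "$ere_plus"

rm -f "$tmp"
```

Without the "Ab+ Test" line the first count drops to zero, which is the behaviour described in the question.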
 
Hmm.

Well - I guess it must just be a version thing.

Escaping the + does nothing - your first line above still yields no results.

Adding the extended -E option works - when I used your second line I get all of the names listed.

The -P option is not recognized - tells me illegal option.

I am on HP Unix - not sure of the exact implementation. Here is the welcome stuff I get after logging on:

Thanks for the reply.

(c)Copyright 1983-2003 Hewlett-Packard Development Company, L.P.
(c)Copyright 1979, 1980, 1983, 1985-1993 The Regents of the Univ. of California
(c)Copyright 1980, 1984, 1986 Novell, Inc.
(c)Copyright 1986-2000 Sun Microsystems, Inc.
(c)Copyright 1985, 1986, 1988 Massachusetts Institute of Technology
(c)Copyright 1989-1993 The Open Software Foundation, Inc.
(c)Copyright 1990 Motorola, Inc.
(c)Copyright 1990, 1991, 1992 Cornell University
(c)Copyright 1989-1991 The University of Maryland
(c)Copyright 1988 Carnegie Mellon University
(c)Copyright 1991-2003 Mentat Inc.
(c)Copyright 1996 Morning Star Technologies, Inc.
(c)Copyright 1996 Progressive Systems, Inc.


RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure by the U.S. Government is subject to
restrictions as set forth in sub-paragraph (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause in DFARS 252.227-7013.


Hewlett-Packard Company
3000 Hanover Street
Palo Alto, CA 94304 U.S.A.

Rights for non-DOD U.S. Government Departments and Agencies are as set
forth in FAR 52.227-19(c)(1,2).
 
Hi

Yep, that is why I specified that I tested my suggestions with GNU [tt]grep[/tt]. Honestly, I am surprised that -E worked for you.

Some tools on Unix do not handle the [tt]+[/tt] quantifier, so you may see workarounds like this:
Code:
grep '[A-Z][a-z][a-z]*' namelist

Feherke.
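
A quick way to see that the doubled character class behaves like "one or more" is to compare it with the interval form on a small sample. This is a sketch under assumptions: the temporary file and the lines "X" and "42" are hypothetical test data, chosen so that the [tt]*[/tt] version and the workaround give different counts.

```shell
#!/bin/sh
# Sketch: [a-z][a-z]* as a portable replacement for "one or more".
LC_ALL=C; export LC_ALL

# Hypothetical sample: two names, a lone capital letter, and a number.
tmp=$(mktemp)
printf '%s\n' 'John Doe' 'Bob Cat' 'X' '42' > "$tmp"

# Zero or more lower-case letters: the lone "X" also matches.
star=$(grep -c '[A-Z][a-z]*' "$tmp")

# Workaround: one mandatory [a-z], then zero or more.
doubled=$(grep -c '[A-Z][a-z][a-z]*' "$tmp")

# Interval form, for comparison with the workaround.
interval=$(grep -c '[A-Z][a-z]\{1,\}' "$tmp")

printf '%s %s\n' 'zero or more        :' "$star"
printf '%s %s\n' 'doubled class       :' "$doubled"
printf '%s %s\n' 'interval expression :' "$interval"

rm -f "$tmp"
```

The last two counts agree, and both skip the line "X" that the [tt]*[/tt] version accepts.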
 
man egrep

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 