
grep -w and the effects of special characters

Status
Not open for further replies.

cptk

Technical User
Mar 18, 2003
US
Using the -w option doesn't necessarily match only whole words.
For example, if I run /usr/xpg4/bin/grep -w "bird" myfile, lines containing
"birdie" will not be returned, but lines containing any of the following will:

bird#
bird@
bird$
...etc.

This will cure it, at the cost of longer execution time.
grep -E '(^|[[:space:]])bird($|[[:space:]])' myfile
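
For anyone wanting to see the difference side by side, here is a rough sketch. It assumes a grep that accepts -E (in basic regular expressions the parentheses and | are literal characters, so extended mode is needed for the alternation to work). The sample file and its contents are made up for illustration.

```shell
# Hypothetical sample data: one test case per line.
sample=$(mktemp)
printf '%s\n' 'bird' 'birdie' 'bird#' 'bird@' 'a bird sings' > "$sample"

# -w treats '#' and '@' as non-word characters, so those lines match too.
w_matches=$(grep -w 'bird' "$sample")

# Whitespace-delimited match; -E makes ( ) and | act as operators.
ws_matches=$(grep -E '(^|[[:space:]])bird($|[[:space:]])' "$sample")

printf '%s\n' "-w matches:" "$w_matches"
printf '%s\n' "whitespace-delimited matches:" "$ws_matches"
rm -f "$sample"
```

The single quotes keep the shell from touching the $ in the pattern.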


Why doesn't -w work as expected???
 
It does work as expected. It just doesn't work as you expect it to. :)

Code:
-w, --word-regexp
       Select only those lines containing matches that form whole
       words.  The test is that the matching substring must either be
       at the beginning of the line, or preceded by a non-word
       constituent character.  Similarly, it must be either at the end
       of the line or followed by a non-word constituent character.
       Word-constituent characters are letters, digits, and the
       underscore.
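
To make that definition concrete, here is a small sketch that spells the rule out as an explicit pattern and compares it with -w. The sample lines are invented, and the hand-written pattern is only an approximation of what -w does: it assumes [[:alnum:]] covers the same letters and digits in your locale, and it behaves the same on simple inputs like these.

```shell
# Hypothetical sample lines, chosen to exercise each kind of neighbour.
lines=$(printf '%s\n' 'bird' 'birdie' 'bird_1' 'bird9' 'bird#' 'bird$' 'x-bird')

# What -w reports.
with_w=$(printf '%s\n' "$lines" | grep -w 'bird')

# The same rule written by hand: "bird" must sit at a line edge or next to
# a character that is not a letter, digit, or underscore.
spelled_out=$(printf '%s\n' "$lines" |
    grep -E '(^|[^[:alnum:]_])bird($|[^[:alnum:]_])')

printf '%s\n' "$with_w"
```

Lines where "bird" is followed by a letter, digit, or underscore are skipped, while "bird#", "bird$" and "x-bird" are kept, because the neighbouring characters are not word constituents.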

Annihilannic.
 
Annihilannic -

You are indeed right - as I would expect it (lol)...

I'm just amazed that the definition of word-constituent characters includes only the following 3 items:
letters, digits, and the underscore.

Why not provide another word-like option that defines a word as whatever sits between white space (also accounting for the beginning and end of a line), and which would therefore not match when any of the other 31 keyboard chars is attached?

 
I guess because not enough people think it would be a frequently used feature - that's why the regular expression language exists, to give you flexibility.
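
If you need this often, one option is to wrap it in a small shell function. The sketch below is only an illustration: wsgrep is a made-up name, not a standard tool, and it uses awk field splitting instead of a regular expression so that the search word is compared literally.

```shell
# wsgrep WORD [FILE...]: hypothetical helper, not a standard tool.
# Prints each line in which WORD appears as a whole whitespace-delimited field.
wsgrep() {
    word=$1
    shift
    # awk splits each line on runs of blanks. Appending "" forces a string
    # comparison, so regex characters such as $ or . in WORD need no escaping.
    # Caveat: awk -v still interprets backslash escapes in the value.
    awk -v w="$word" '{
        for (i = 1; i <= NF; i++)
            if (($i "") == (w "")) { print; next }
    }' "$@"
}

# Example usage on made-up input.
wsgrep_out=$(printf '%s\n' 'bird' 'bird#' 'a bird sings' 'birdie' | wsgrep bird)
printf '%s\n' "$wsgrep_out"
```

With no file arguments the function reads standard input, as in the usage line; with file names it scans those files instead.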

I don't see why you're amazed... I certainly wouldn't consider those characters to be parts of any word. In fact, numbers and underscores are already stretching the definition!



Annihilannic.
 
D3P3|\|D5 \/\/|-|4T L4|\|G|_|4G3 Y0|_| 4R3 T4|_K||\|G (or trying to talk...)

HTH,

p5wizard
 
p5 -

What the hell is that?
 
No thanks ... I'll stick to just coding ...
 