grep problem using -w option

Status
Not open for further replies.

cptk

Technical User
Mar 18, 2003
305
US
Say I have ....
> echo "cat dog" | grep -w "dog"
this returns the line "cat dog" as expected.

and say I have ...
> echo "catdog" | grep -w "dog"
this doesn't return the line "catdog", as expected.

Now, say I have ...
> echo "cat.dog." | grep -w "dog"
this does return the line "cat.dog.", but I thought it should not.

What's up with the grep recognizing the dot (i.e. - ".")?

I'm on Solaris 9...
 
from man grep

-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, [red]or preceded by a non-word constituent character.[/red] Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
 
What constitutes a "word" is a group of letters surrounded by white space - space, tab, or newline. Therefore, IMO, cat.dog. is a word.

 

-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, [blue]or preceded by a non-word constituent character.[/blue] Similarly, it must be either at the end of the line or followed by a non-word constituent character. [red] Word-constituent characters are letters, digits, and the underscore.[/red]

The dot character is not a word-constituent character.
 
The dot is a punctuation character and thus a word delimiter.
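
To make the rule concrete, here is a small sketch that runs the three cases from the original question, plus an underscore case, through grep -w. The helper name word_match is made up for illustration, and the sketch assumes a grep that supports -w (GNU, BSD and Solaris grep do; -w is not required by POSIX).

```shell
# Hypothetical helper: report whether grep -w "dog" selects the given line.
word_match() {
    if printf '%s\n' "$1" | grep -w "dog" >/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

word_match "cat dog"     # match: "dog" is preceded by a space
word_match "catdog"      # no match: "t" is a word constituent
word_match "cat.dog."    # match: "." is not a word constituent
word_match "cat_dog"     # no match: "_" is a word constituent
```

The last case shows that the underscore behaves like a letter here, while the dot behaves like a space.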
 
OK, so it appears a "." is a word delimiter/word-constituent... I don't see a way to override this within grep.. is there a way? If not, I'll have to use some awk | sed cmd...
 
"." is a word delimiter, NOT a constituent; word-constituent characters are letters, digits, and the underscore.

What do you need to obtain? what's the requirement?

Cheers.
 
It's resolved ...

I switched over to "egrep" and used the following to get a "whole word" match and not have the "dot" interfere with my intended result.

>echo "cat.dog" | egrep "(^| |\t)dog($| |\t)"


This basically says: find "dog" where the character before and the character after it are each one of the following:
1.) "dog" at beginning of line or end of line
2.) blank space immediately before or after "dog"
3.) tab immediately before or after "dog"

NOTE: How can I use "\t" to represent a tab in egrep? So far I can't find the proper escape sequence, so I'm forced to type an actual Tab character in its place, which is not visually appealing (too much white space)!


...thanks "lightbulbhead"
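
One possible way around the tab question, sketched under some assumptions: POSIX regular expressions do not define "\t", but printf does, so the shell can build the tab and splice it into the pattern. The names tab, pattern and ws_word_match are made up for illustration, and the sketch assumes a grep that accepts -E (on Solaris 9 that would mean /usr/xpg4/bin/grep, or egrep with the same pattern).

```shell
# Let printf produce a real tab character; command substitution
# strips only trailing newlines, so the tab survives.
tab=$(printf '\t')

# Same idea as the egrep expression above, with space and tab
# collected in a bracket expression. \$ keeps the end-of-line
# anchor literal inside double quotes.
pattern="(^|[ $tab])dog(\$|[ $tab])"

# Hypothetical helper: report whether the pattern selects the line.
ws_word_match() {
    if printf '%s\n' "$1" | grep -E "$pattern" >/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

ws_word_match "cat.dog"                   # no match: "." precedes "dog"
ws_word_match "cat dog"                   # match: space precedes "dog"
ws_word_match "$(printf 'cat\tdog')"      # match: tab precedes "dog"
```

The pattern stays readable because the tab never has to be typed by hand.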






 
try:

echo "cat.dog" | egrep "(^|[:blank:])dog($|[:blank:])"

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
My final note (at least I think so) ...

vgersh99's [:blank:] should actually be [:space:], and (here's the kicker for me) you need to use the POSIX version of egrep, which on Solaris 9 is /usr/xpg4/bin/egrep.

so the resulting cmd is ...
echo "cat.dog" | /usr/xpg4/bin/egrep "(^|[[:blank:]])dog($|[[:blank:]])"

NOTE: the inner brackets are part of the shorthand's name, so the whole shorthand has to sit inside another pair of brackets, e.g. [[:space:]] rather than [:space:].

Thanks vgersh99 for turning me on to shorthands for 'class of characters' !!!


the other shorthands:
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters (includes the space character)
[:punct:] Punctuation characters
[:space:] Whitespace characters
[:upper:] Uppercase characters
[:xdigit:] Hexadecimal digits
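
As a rough illustration of how two of these classes differ, the sketch below tests a line where "cat" and "dog" are separated by a form feed, which belongs to [:space:] but not to [:blank:]. The names ff, line and class_match are made up for illustration, and the sketch assumes a grep that accepts -E and POSIX character classes (on Solaris 9, the /usr/xpg4/bin version).

```shell
# A form feed is whitespace, but it is neither a space nor a tab.
ff=$(printf '\f')
line="cat${ff}dog"

# Hypothetical helper: $1 is a class name (space, blank, ...),
# $2 is the input line. The class name is placed inside a
# bracket expression, giving e.g. [[:space:]].
class_match() {
    if printf '%s\n' "$2" | grep -E "(^|[[:$1:]])dog(\$|[[:$1:]])" >/dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

class_match space "$line"      # match: form feed is in [:space:]
class_match blank "$line"      # no match: form feed is not in [:blank:]
class_match blank "cat dog"    # match: space is in [:blank:]
```

Which class is the better fit depends on whether separators other than space and tab should count as word boundaries.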



 
oops,

echo "cat.dog" | /usr/xpg4/bin/egrep "(^|[[:space:]])dog($|[[:space:]])"

...now that was my last note!!!
 