How do I use grep to filter "****"?

Status
Not open for further replies.

cpjust

Programmer
Sep 23, 2003
2,132
US
Hi,
I thought it would be as easy as:
Code:
cmd | grep -e "****"
But that just seems to print everything, as if I just ran cmd by itself. Grepping normal words works fine, but for some reason the asterisks mess it up?

Also, is there a way to pipe stderr to grep instead of stdout?
 
Single quotes will also keep the shell from expanding wildcards, so:

grep '****'

will also work.
 
Thanks.

Any ideas about piping stderr instead of stdout?
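
On the stderr question, one common approach is to order the redirections so that only stderr reaches the pipe. This is a sketch rather than something posted in the thread: emit is a hypothetical stand-in for cmd, and the order of the two redirections matters.

```shell
# emit is a hypothetical stand-in for cmd: one line on stdout, one on stderr.
emit() {
    echo "normal output"
    echo "**** error output" >&2
}

# 2>&1 first points stderr at the pipe; >/dev/null then discards stdout only.
stderr_only=$(emit 2>&1 >/dev/null | grep '\*\*\*\*')
echo "$stderr_only"
```

Dropping the >/dev/null part would send both streams to grep instead of stderr alone.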
 
jgunkel said:
Single quotes will also keep the shell from expanding wildcards, so:

grep '****'

will also work.
This is incorrect.

The backslashes aren't meant to keep the shell from using the asterisks as part of a glob; the quoting already takes care of that. The backslashes are meant to keep grep from using the asterisks as part of a regular expression. Your suggestion is an invalid regular expression, and when (if) grep accepts it, it's only making a special case.

Your suggestion is also just plain different from the one with backslashes. Consider the difference in output given the same input:
Code:
$ echo -e '*****************\n*' |grep -o "\*\*\*\*"
****
****
****
****
$ echo -e '*****************\n*' |grep -o '****'
*****************
*
The first looks for a sequence of exactly four asterisks. The second appears to look for any number of repetitions of any number of repetitions of any number of asterisks. Besides being redundant, that pattern matches not only runs of four asterisks but also runs of more than four and fewer than four, including runs of zero asterisks. Thus:
Code:
$ grep -c "\*\*\*\*" /usr/share/dict/words 
0
$ grep -c '****' /usr/share/dict/words 
98569
The non-backslashed pattern matches every word in the dictionary. It will match every line you give it.

Long story short: put the backslashes back.
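
If the goal is only to find a literal run of four asterisks, grep's -F option (fixed strings) sidesteps regular expressions altogether. The following sketch uses made-up sample lines, not data from the thread, to compare it with the backslashed pattern.

```shell
# Hypothetical sample input: only the second line contains four asterisks.
sample=$(printf '%s\n' 'foo' '**** start' 'a*b')

# Backslashed pattern: each \* is a literal asterisk.
escaped=$(printf '%s\n' "$sample" | grep -c '\*\*\*\*')

# -F treats the whole pattern as a fixed string, so no escaping is needed.
fixed=$(printf '%s\n' "$sample" | grep -c -F '****')

echo "escaped=$escaped fixed=$fixed"
```

Both counts should agree, since the two commands look for the same literal text.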
 
jgunkel:
I suppose the bigger issue with your suggestion concerns the quotes. Glob wildcards are not expanded by the shell in either single- or double-quoted strings. '****' and "****" both result in the exact same string being seen by grep, so your command would result in the exact same behavior as that given in the original post.
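
A quick way to check this is to look at what the shell actually hands to a command. In this sketch the scratch directory and the file names are assumptions made for the demonstration.

```shell
# Work in a scratch directory holding two hypothetical files.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

# Unquoted, the shell expands the asterisks as a glob before the command runs.
set -- ****
unquoted_count=$#

# Quoted either way, the command receives the literal four-character string.
set -- '****'
single=$1
set -- "****"
double=$1

# Clean up the scratch directory.
cd "$start_dir" || exit 1
rm -r "$workdir"

echo "unquoted args: $unquoted_count, single: $single, double: $double"
```

Only the unquoted form is touched by the shell; the two quoted forms are indistinguishable by the time grep sees them.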
 
chipper,

Serves me right for posting that at the end of my day here after coming to work with a cold. Have a star for posting a useful and non-inflammatory response :)

Alrighty then...

There are two reasons your grep statement outputs four lines of "****": (1) the "-o" option, which prints exactly what was matched, and the pattern happened to match four times in that string of asterisks, and (2) neither your regex nor mine specifies that the fifth character must not be an asterisk.

Not sure if that is even a requirement of the original, but I guess I feel like indulging in some regex masochism today. So:

Code:
echo -e 'a\n23423\n****\n***\nfoo\n********\nkjh\n**** bar\nlog****err or\n' | grep -E '[\*]{4}[^\*]'

Will only output the lines:
Code:
**** bar
log****err or
while ignoring the "****" line.

I hope I have redeemed myself!

BTW: cpjust, are you by chance parsing captured output from a Nortel phone switch? By an odd coincidence, "****" is the escape sequence to exit any of the maintenance programs and go back to the main command processor (load 0).

Cheers!
 
I wasn't suggesting that there was a requirement that the fifth character not be an asterisk. The point in using the long sequence of asterisks was only to show that the "****" pattern is greedy and matches as many asterisks as are available, while the "\*\*\*\*" pattern takes exactly 4 at a time.

But yes, your regex would be a good way to enforce such a requirement. Except you may want to get rid of a couple backslashes this time... ;-):
Code:
$ cat >test
****
****\
\\\\.
\*\*.
$ grep -E '[\*]{4}[^\*]' test
\\\\.
\*\*.
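
The stray matches come from how bracket expressions work: inside [...] a backslash is an ordinary character, so [\*] matches either a backslash or an asterisk. A short sketch comparing the two spellings, using the four test lines above plus one extra line added for illustration:

```shell
# The four lines from the test file, plus '**** bar' added for illustration.
input=$(printf '%s\n' '****' '****\' '\\\\.' '\*\*.' '**** bar')

# [\*] means "backslash or asterisk", so runs of backslashes match too.
with_backslash=$(printf '%s\n' "$input" | grep -E '[\*]{4}[^\*]')

# [*] means "asterisk" only; [^*] is any character other than an asterisk.
without_backslash=$(printf '%s\n' "$input" | grep -E '[*]{4}[^*]')

printf 'with backslashes:\n%s\n' "$with_backslash"
printf 'without backslashes:\n%s\n' "$without_backslash"
```

With the backslashes removed, the two backslash-laden lines drop out, and the line ****\ starts to match, since a backslash is then just another non-asterisk character.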
 
Well I was totally lost for most of that stuff... ;-)
But using "\*\*\*" works fine for me. Thanks.
I'm using it to print only the lines from my Makefiles that I echoed (which I always start with 3 or 4 asterisks).
 
Smeg!

That's not supposed to happen!

(/me furiously types snippet directly into my shell)

(grumble)

You are exactly correct. In that case it should be

Code:
grep -E '[*]{4}[^*]' test

Thank you sir! You have reminded me of why I switched to perl for most of my scripting needs. Or maybe I can blame it on having to switch between bash on Linux and ksh on HP-UX so often in the past while... (yeah that's it...) :)

cpjust,

If you are certain that the stars will always be at the beginning of the line, you can change the pattern one more time to tell grep to match the beginning of the line with a "^". So:

Code:
grep -E '^[*]{4}[^*]' test

Will match four stars at the beginning of the line only, followed by something that is not a star.
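
As a sketch of how the anchored pattern behaves, the lines below are invented Makefile-style output, not taken from the thread. The second pattern, with {3,4}, is an assumption based on cpjust's remark that the echoed lines start with 3 or 4 asterisks.

```shell
# Hypothetical build output: echoed banner lines mixed with other text.
output=$(printf '%s\n' \
    '**** Building foo' \
    '*** Linking bar' \
    'gcc -c foo.c' \
    'note: see **** above' \
    '***** five stars' \
    '****')

# Exactly four stars at the start of the line, then a non-star character.
four=$(printf '%s\n' "$output" | grep -E '^[*]{4}[^*]')

# Three or four stars at the start of the line, then a non-star character.
three_or_four=$(printf '%s\n' "$output" | grep -E '^[*]{3,4}[^*]')

printf 'four:\n%s\n' "$four"
printf 'three or four:\n%s\n' "$three_or_four"
```

Note that a line consisting of nothing but "****" is not matched by either pattern, because [^*] requires a following character.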

And $DEITY help me... if I got that one wrong, I am shredding my certs and starting over.

Cheers!
 
Shred your certs first. :)
Using a glue language instead of various shell dialects has been a long-standing ideal, and an abused one that rarely works (imho) without strong authority and standards in the project.

I've seen 4 shell flavors and two glue languages (perl and tcl), and then seen the strengths of these glue languages (tcl == tk, an easy GUI kit with OO looks and some neat list/data structure processing; perl == Swiss army knife for the terminally lazy) used not as replacements but for 'features and enhancements'.

Anyway, exec'ing things like grep and umpteen other binaries for string processing should be passé. Pick a shell with all the functionality you need (zsh, anyone?), standardize, and be happy.

 