Simple RegEx question

Status
Not open for further replies.

DJR999

Programmer
Nov 19, 2003
US
Hello - Sorry, I'm new at UNIX and regex and even the simplest of issues seem to be throwing me.

Can someone explain this to me:

In a file f1 this line is contained:

MH76M*ME5G4GCKOVICHJEFFREY0187405......

The command

grep MH76M*M f1

yields the line, as I would expect. I read this as MH76 followed by 0 or more 'M's, then another M.

However, if I use

grep MH76M*ME f1

no line is found. How can this be?

Also, if I use

grep MH76M\*M f1
grep MH76M\*ME f1

or

grep 'MH76M*M' f1
grep 'MH76M*ME' f1


I get the same results: in both pairs, the first command yields the line and the second one does not. I thought the backslash or the single quotes would escape the asterisk, so that grep looks for a literal asterisk in the file. Both commands in each pair should then work, since the line in the file has an asterisk immediately following the '76M'.

Thanks.
 
Hi

First of all, it is a good idea to always put quotes around the pattern to stop the shell from expanding it. In this case single quotes will be fine.
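To see why the quotes matter, here is a minimal sketch of what the shell hands over to a command for each quoting style. printf stands in for grep so the argument becomes visible; the scratch directory, the file name MH76MxM and the variable names are made up for the illustration, and mktemp -d is assumed to be available.

```shell
# Scratch directory holding one file whose name happens to match the glob MH76M*M.
tmp=$(mktemp -d)
: > "$tmp/MH76MxM"

# Unquoted with a backslash: the shell removes the backslash itself,
# so the command receives a bare asterisk.
unquoted_escape=$(printf '%s' MH76M\*ME)

# Single-quoted: the backslash survives and reaches the command untouched.
single_quoted=$(printf '%s' 'MH76M\*ME')

# Unquoted without a backslash: the shell may replace the word
# with any matching file names before the command ever runs.
globbed=$(cd "$tmp" && printf '%s' MH76M*M)

printf 'unquoted backslash -> %s\n' "$unquoted_escape"
printf 'single quotes      -> %s\n' "$single_quoted"
printf 'unquoted glob      -> %s\n' "$globbed"

rm -rf "$tmp"
```

So an unquoted MH76M\*ME reaches grep as MH76M*ME, where the asterisk is a regular expression operator again. Only inside quotes does the backslash get through to grep.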

Let us see how the regular expression matches the input string :
Code:
input : [red]MH76[/red][blue]M[/blue]*ME5G4GCKOVICHJEFFREY0187405......

regex : [red]MH76[/red][green]M*[/green][blue]M[/blue]
As you can see, adding one more "E" to the regular expression stops it from matching, because the next character in the input string is an asterisk ( * ), not an E.

Note that the asterisk is a multiplier (quantifier) in regular expressions, but in the input string it is just a character like any other.
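The matching described above can be checked with a small sketch like the following. It assumes a scratch copy of f1 containing the line from the question and uses grep -c, which prints the number of matching lines; the variable names are only for the illustration, and mktemp -d is assumed to be available.

```shell
# Recreate the sample file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'MH76M*ME5G4GCKOVICHJEFFREY0187405......' > "$tmp/f1"

# grep -c exits non-zero when the count is 0, hence the "|| true".
# M* can match zero M's, then the final M matches the M after MH76.
unescaped_m=$(grep -c 'MH76M*M' "$tmp/f1") || true

# Here an E must follow that M, but the input has a literal * there.
unescaped_me=$(grep -c 'MH76M*ME' "$tmp/f1") || true

# Escaped inside quotes, the * is an ordinary character for grep.
escaped_me=$(grep -c 'MH76M\*ME' "$tmp/f1") || true

printf 'MH76M*M   matches %s line(s)\n' "$unescaped_m"
printf 'MH76M*ME  matches %s line(s)\n' "$unescaped_me"
printf 'MH76M\\*ME matches %s line(s)\n' "$escaped_me"

rm -rf "$tmp"
```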

So, what exactly does the input string look like? Does it contain an asterisk or not? If it does, your 3rd and 4th regular expressions are correct once they are quoted, and they work for me with GNU grep:
Code:
grep 'MH76M\*M' f1

grep 'MH76M\*ME' f1
Or, if you do not really need regular expressions, just give them up and search for fixed strings ( if your grep implementation has the -F or --fixed-strings switch, or if you have the fgrep executable ) :
Code:
grep -F 'MH76M*M' f1

grep -F 'MH76M*ME' f1
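As a hedged illustration of fixed-string matching, the sketch below runs -F against a scratch copy of f1 (assumed to hold the line from the question; the variable names are illustrative and mktemp -d is assumed to be available). With -F nothing in the pattern is special, so a backslash is also searched for literally.

```shell
# Recreate the sample file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'MH76M*ME5G4GCKOVICHJEFFREY0187405......' > "$tmp/f1"

# With -F the asterisk is plain text and needs no escaping.
fixed_star=$(grep -F -c 'MH76M*ME' "$tmp/f1") || true

# With -F the backslash is plain text too, and this line contains none.
fixed_backslash=$(grep -F -c 'MH76M\*ME' "$tmp/f1") || true

printf 'grep -F MH76M*ME   matches %s line(s)\n' "$fixed_star"
printf 'grep -F MH76M\\*ME  matches %s line(s)\n' "$fixed_backslash"

rm -rf "$tmp"
```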

Feherke.
 
Thanks for the help. I figured out after I posted why the first one didn't work, as you explained. But not why the other pairs yielded different results when I escaped the asterisk. For whatever reason my version of UNIX does not interpret those as I would expect. There IS a literal asterisk in the string in the file, but still, with

grep 'MH76M\*M' f1
grep 'MH76M\*ME' f1

only the first one works.

However, the -F option does yield the proper result - both of the following yield the string:

grep -F 'MH76M\*M' f1
grep -F 'MH76M\*ME' f1

I'm still baffled as to why grep 'MH76M\*ME' f1 does not work, but at least I can get the desired result now with the -F option. Thanks again.
 
Annihilannic -

uname -a returns:
HP-UX cl00hp99 B.11.23 U 9000/800 1012723164 unlimited-user license

echo $0 gives me:
-sh

I also entered ksh to switch to a Korn shell, and I got the same results with the grep commands above.

Thanks.


 
Something strange going on there then:

Code:
$ uname -a
HP-UX example B.11.23 U 9000/800 3238633046 unlimited-user license
$ echo $0
ksh
$ cat f1
MH76M*ME5G4GCKOVICHJEFFREY0187405......
$ grep 'MH76M\*M' f1
MH76M*ME5G4GCKOVICHJEFFREY0187405......
$ grep 'MH76M\*ME' f1
MH76M*ME5G4GCKOVICHJEFFREY0187405......
$

Perhaps you have some hidden characters in your file. Try checking with cat -vet f1.
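To show what such a check might reveal, here is a sketch with a purely hypothetical hidden character: a backspace planted between the M and the E. Nothing in the thread says the real file contains one; the sample line, the variable names and the use of mktemp -d are assumptions for the demonstration.

```shell
# Build a line with an invisible backspace (\b) between "M" and "E".
tmp=$(mktemp -d)
printf '%s\b%s\n' 'MH76M*M' 'E5G4' > "$tmp/f1"

# cat -vet prints control characters as ^X, tabs as ^I and line ends as $.
visible=$(cat -vet "$tmp/f1")

# The pattern ending in M still matches, the one ending in ME does not.
count_m=$(grep -c 'MH76M\*M' "$tmp/f1") || true
count_me=$(grep -c 'MH76M\*ME' "$tmp/f1") || true

printf 'cat -vet shows: %s\n' "$visible"
printf 'MH76M\\*M  matches %s line(s)\n' "$count_m"
printf 'MH76M\\*ME matches %s line(s)\n' "$count_me"

rm -rf "$tmp"
```

A plain cat of such a file can look perfectly normal on a terminal, while cat -vet shows the extra ^H.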

Annihilannic.
 