
Regex question in awk/sed

Status
Not open for further replies.

olded

Programmer
Oct 27, 1998
1,065
US
Hi:

In an awk script (Solaris 7), I needed to determine whether a field was a Social Security number (SSN). In his book sed & awk, Dale Dougherty describes a sed regular expression for an SSN. The following sed stub substitutes correctly if the data echoed to sed is 3 digits followed by a dash, 2 digits followed by a dash, and 4 digits:

echo "111-11-1111"| sed 's/^[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}$/222-22-2222/g'

I tried the same regex with awk/nawk:

cnt=`echo "111-11-1111" |awk ' {
if($1 ~ /^[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}$/)
print 1
else print 0 } '`
echo $cnt # 0 if not SSN and 1 if is

And it FAILS. I don't get an error, but neither does it work.

I tried the same regex with awk not using the meta characters \) and \(, and it works:

cnt=`echo "111-11-1111" |awk ' {
if($1 ~ /^[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/ )
print 1
else print 0 } '`
echo $cnt # 0 if not SSN and 1 if is

Does anyone know why \( and \) do not work? Am I missing something?


Regards,


Ed
 
I assume you meant the '\{' and '\}' pair in your first examples, as there is no '\(' '\)' pair in them.

To quote Dougherty [p.45]:
"Escape curly braces, \{ and \}, are available in grep and sed, but not awk and egrep. They enclose one or two arguments."

These metacharacters are not available in awk/nawk. You might try ANSI awk or gawk, but I doubt they support them either.

vlad
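
Since the quote above says the \{ and \} interval syntax is available in sed, one way to sidestep the awk limitation is to let sed do the match from the shell. This is only a minimal sketch: the function name is_ssn_sed is hypothetical and not from the thread, and it assumes a sed that accepts \{n\} intervals, as the sed stub in the question does.

```shell
# Hypothetical helper (not from the thread): succeeds if $1 looks like NNN-NN-NNNN.
# "sed -n" with the p command prints the line only when the regex matches,
# so a non-empty result means the field matched.
is_ssn_sed() {
    [ -n "$(printf '%s\n' "$1" | sed -n '/^[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}$/p')" ]
}

if is_ssn_sed "111-11-1111"; then cnt=1; else cnt=0; fi
echo "$cnt"   # 0 if not SSN and 1 if is
```

The trade-off is one extra process per test, which is fine for a single value but slow if every record in a large file has to be checked.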
 
The awk needs to be POSIX compliant for the {num matches} syntax to have any effect.
Traditional awk did not support this, nor did sed.
See A. Robbins, "Effective AWK Programming", and the sed FAQ for notes on this.
 
Ed -

My version of awk on HP-UX 10.20 supports extended regular expressions, so intervals change from RE\{m,n\} to RE{m,n}. The line
if($1 ~ /^[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}$/)

should become
if($1 ~ /^[0-9]{3}-[0-9]{2}-[0-9]{4}$/)

I hope this helps. Cheers,
ND [smile]

bigoldbulldog@hotmail.com
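
Because interval support differs between awk versions, as the replies above point out, a script can probe for it and fall back to the spelled-out character classes from the original question. The following is a sketch under that assumption; the function name ssn_flag and the variable names are hypothetical and not from the thread.

```shell
# Hypothetical helper (not from the thread): prints 1 if the first field
# of $1 looks like NNN-NN-NNNN, otherwise prints 0.
ssn_flag() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        d = "[0-9]"
        # Probe: with interval support "aaa" matches /^a{3}$/;
        # without it the braces are taken literally and the match fails.
        if ("aaa" ~ /^a{3}$/)
            re = "^[0-9]{3}-[0-9]{2}-[0-9]{4}$"
        else
            re = "^" d d d "-" d d "-" d d d d "$"
    }
    { print (($1 ~ re) ? 1 : 0) }'
}

cnt=`ssn_flag "111-11-1111"`
echo $cnt # 0 if not SSN and 1 if is
```

The pattern is kept in a string and used as a dynamic regex, so the same test line works whichever branch the probe selects.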
 
Vlad and all:

Thanks for the replies. I guess I should read the sed part of the sed & awk book.

Regards,


Ed
 