Problem with AWK

Status
Not open for further replies.

linuxMaestro

Instructor
Jan 12, 2004
183
US
# awk '/([a-Z][0-9])*\@([a-Z][0-9])\.[a-z][a-z][a-z]?/' filename
awk: cmd. line:1: fatal: Invalid range end: /[a-Z]*\@[a-Z]\.[a-z][a-z][a-z]?/

I am trying to get email addresses from the file filename. What am I doing wrong?

I have
([a-Z][0-9])* Match any number of letters or numbers
\@ Before an @
([a-Z][0-9])* then Match any number of letters or numbers
\. Before a .
[a-z][a-z][a-z]? then match either 2 or 3 characters

Why won't it work?
 
> Invalid range end

I'd look at those [tt][a-Z][/tt] ranges and their invalid ends.

The end of a range is supposed to come after its beginning. In ASCII, the capital letters come before the lowercase letters, so capital [tt]Z[/tt] comes before lowercase [tt]a[/tt].

However, [tt][A-z][/tt] is not right, either. Several characters ([tt][[/tt], [tt]\[/tt], [tt]][/tt], [tt]^[/tt], [tt]_[/tt], and [tt]`[/tt]) come between [tt]Z[/tt] and [tt]a[/tt]. You should specify them as separate ranges: [tt][a-z][/tt] and [tt][A-Z][/tt].
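To see the gap for yourself, here is a small sketch that prints ASCII codes 90 through 97 next to their characters. The helper name [tt]ascii_gap[/tt] is made up for this example, and it assumes your awk's [tt]printf "%c"[/tt] turns a number into the character with that code, which the common implementations do.

```shell
# ascii_gap is a hypothetical helper name used only for this sketch.
ascii_gap() {
    # Print code points 90 through 97 with their characters.
    awk 'BEGIN { for (i = 90; i <= 97; i++) printf "%d %c\n", i, i }'
}

ascii_gap
```

The first line should read [tt]90 Z[/tt] and the last [tt]97 a[/tt], with the six punctuation characters in between.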

Even then, I still think there's something wrong with your idea of a valid e-mail address. For one thing, I think there can be multiple dots in the part before the [tt]@[/tt]. I'd look at an RFC describing valid e-mail addresses if I were you and wanted this to work on any e-mail address.
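As a rough sketch of the range fix applied to the original one-liner: both letter ranges and the digits go inside a single bracket expression. Note that [tt]([a-z][0-9])*[/tt] would match letter-digit pairs, not any mix of letters and digits, so the sketch uses [tt][a-zA-Z0-9]+[/tt] instead. The helper name [tt]match_addr[/tt] and the sample lines are invented for illustration, and the pattern is still far narrower than what the RFCs allow.

```shell
# match_addr is a hypothetical helper name used only for this sketch.
match_addr() {
    # Letters or digits, "@", letters or digits, ".", then 2 or 3 lowercase letters.
    awk '/[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-z][a-z][a-z]?/'
}

# Made-up sample input: only the first line contains something address-like.
printf '%s\n' 'contact: Joe99@Example.com' 'no address here' 'x@y' | match_addr
```

With that sample input, only the first line should be printed.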
 
Try this one:

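# Try to match e-mail addresses.
# a: one letter or digit; A: one letter.
BEGIN { a="[a-zA-Z0-9]"; A="[a-zA-Z]"
# Dotted local part, "@", one or more dotted domain labels, then a 2- or 3-letter ending.
e_pat = "(^|[ \t])(" a "+[.])*" a "+@(" a "+[.])+" A A A "?([^a-zA-Z]|$)"
}
# Print every line that contains a match.
$0 ~ e_pat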
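If the goal is to pull out the addresses themselves and not whole lines, one possible approach is awk's [tt]match()[/tt] function, which sets [tt]RSTART[/tt] and [tt]RLENGTH[/tt] for the matched text. The sketch below is only an illustration: the helper name [tt]extract_addr[/tt] and the sample input are made up, and the pattern is my own simplification (it accepts two or more letters in the final label, which departs from the two-or-three rule above) and is not RFC-complete.

```shell
# extract_addr is a hypothetical helper name; the pattern is a simplification.
extract_addr() {
    awk '
    BEGIN {
        # Dotted local part, "@", dotted domain labels, final label of 2+ letters.
        pat = "([a-zA-Z0-9]+[.])*[a-zA-Z0-9]+@([a-zA-Z0-9]+[.])+[a-zA-Z][a-zA-Z]+"
    }
    {
        line = $0
        # match() sets RSTART and RLENGTH; loop to catch every match on the line.
        while (match(line, pat)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Made-up sample input with two addresses on one line.
printf '%s\n' 'mail joe.smith@example.com or ann@site.org today' 'nothing here' | extract_addr
```

For the sample input this should print the two addresses, one per line, and nothing for the second line.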
 