Reusing regular expression groups with awk

by xmb
Regular expression groups are the parts of a pattern enclosed in parentheses, e.g. /some([^:]+):here:(.*)/

awk, in general, cannot capture or reuse them. The parentheses only group, and the whole expression is normally processed as a single match.

The only way to accomplish this (besides alternative techniques) is to use gawk's gensub() and match() functions.
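
For comparison, here is a rough sketch of one such alternative technique for awks that lack gensub() and the three-argument match(): find the match with RSTART/RLENGTH, then cut the wanted part out with substr(). The sample input and the function name extract_name are made up for illustration, and the trick only works because the prefix "name=" has a known, fixed length.

```shell
# Sketch of a workaround using only standard awk features.
# extract_name and the sample line are hypothetical, for illustration only.
extract_name() {
    awk '{
        # match() sets RSTART (start position) and RLENGTH (length of the match)
        if (match($0, /name=[a-z]+/))
            # skip the 5 characters of the literal prefix "name="
            print substr($0, RSTART + 5, RLENGTH - 5)
    }'
}

echo 'id=42 name=foo' | extract_name   # prints: foo
```

This gets clumsy quickly with more than one group, which is where the gawk functions come in.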

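match() is extended with an optional third argument: a destination array in which the groups are stored. Element 0 holds the whole match, and elements 1, 2, ... hold the text matched by each group.
gensub() is a kind of advanced sub()/gsub() that returns the new string instead of modifying its target. A group is referenced by \\<number> in the second argument (the replacement text); \\0 represents the whole match, as does &. The third argument indicates which match to actually replace, or "g" for all. Note that only the matched text is replaced, so the regex needs a leading and/or trailing .* wherever the rest of the line should be dropped from the result.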

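Note: awk does not support look-ahead or look-behind, or any (<modifier><regex>) constructs as PCRE and other advanced (partly standardized) libraries do. Many older awk implementations do not even support {<number>} counted matches; older gawk versions need --re-interval for them, while gawk 4.0 and later enable them by default.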

# btw, my prompt looks like this: xmb (gp:4:3)~/awk $
# PS1='\u (\[\e[1;31m\]\h\[\e[m\]:\[\e[33m\]\l\[\e[m\]:\[\e[32m\]\j\[\e[m\])\[\e[1m\]\w\[\e[m\] \$ '
Examples:

Code:
$ echo 'From: "Sumone @home" <home@me.com>
From: malformed <doh>' |
    gawk '{ print gensub(/.*:[ "]+([^"]+| *<).*<([^>]+).*/, "\\1 -+- \\2", 1) }'
Sumone @home -+- home@me.com
malformed  -+- doh

Code:
$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", 2) }'
XabF cd XdeF

$ echo XabF XcdF XdeF | gawk '{ print gensub(/X([^X]+)F/, "\\1", "g") }'
ab cd de

Code:
$ echo "<html><head><blah foo=bar>yeah<..>" | gawk '{
    match($0, /head><([^ ]+) ([^=]+)=([^>]+)>([^<]+)/, Arr)
    printf "%s (%s->%s) == %s\n", Arr[1], Arr[2], Arr[3], Arr[4]
    }'
blah (foo->bar) == yeah
