how to pass parameters into a regular expression, i.e., /parameter/

will27 · Sep 18, 2007

Hi all,
I have been trying to find a way to pass parameters into a regular expression without success, so I am back.
For example, I have the following two files:

# firstfile
USA US United_States_of_America
# secondfile
x company USA
US y Company
xyz United_States_of_America

What I want is to pass every key in firstfile to a regular expression and then loop through the second file,
like this:

## code below ###
for x in firstfile
do
awk -v reg=$x '/reg/' secondfile
done

However, this doesn't work. Can anyone suggest how to solve this?

Thanks
will

p5wizard · Sep 18, 2007

What you are running is a single iteration of:

awk -v reg=firstfile '/reg/' secondfile

You need to loop through the contents of firstfile, not the name of firstfile, and then some: I'm assuming you need the second field of firstfile to look for records in secondfile...

In your for-do-done construct this becomes:

for x in $(awk '{print $2}' firstfile)
do
awk -v reg=$x '/reg/' secondfile
done

HTH,

p5wizard

futurelet · Sep 18, 2007

Code:

awk 'NR==FNR{gsub(/[ \t]+/,"|");r=$0;next}$0 ~ r' file1 file2

futurelet · Sep 18, 2007

Here's a more understandable version.

Code:

# Are we reading 1st file?
FILENAME==ARGV[1] { 
  # Convert whitespace in $0 to "|".
  gsub( /[ \t]+/, "|" )
  # Save our regular expression.
  r = $0
  # Read next line.
  next }
# Print line if it matches the string that
# is being used as a regular expression.
$0 ~ r
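The commented program above is written as a standalone awk script. A minimal way to try it without a separate script file is to pass it inline, sketched here as a POSIX shell function; the name match_any_key and the scratch files are assumptions for illustration, not from the thread. One reason FILENAME == ARGV[1] can be preferable to NR==FNR: if the first file is empty, NR==FNR is also true for every line of the second file.

```shell
#!/bin/sh
# match_any_key KEYFILE DATAFILE  (hypothetical helper name)
# Runs futurelet's commented program inline; it could equally be saved
# to a file and run with: awk -f prog.awk KEYFILE DATAFILE
match_any_key() {
    awk '
    # Are we reading the 1st file?
    FILENAME == ARGV[1] {
        gsub(/[ \t]+/, "|")   # "USA US X" becomes "USA|US|X"
        r = $0                # save the regex (the last line of the file wins)
        next
    }
    # Second file: print the line if it matches the saved regex.
    $0 ~ r
    ' "$1" "$2"
}

# Sample data from the first post, written to a scratch directory.
tmp=$(mktemp -d)
printf 'USA US United_States_of_America\n' > "$tmp/firstfile"
printf 'x company USA\nUS y Company\nxyz United_States_of_America\n' > "$tmp/secondfile"
match_any_key "$tmp/firstfile" "$tmp/secondfile"
```

Each matching line of secondfile is printed once, in file order.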

futurelet · Sep 18, 2007

Code:

awk 'NR==FNR{$1=$1;r=$0;next}$0 ~ r' OFS='|' file1 file2
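This variant relies on a side effect of assigning to a field: after $1=$1, awk rebuilds $0 by joining the fields with OFS, so with OFS='|' the key line turns into an alternation. The sketch below isolates just that step as a shell function with a hypothetical name, to_alternation. Unlike the gsub version, it also drops leading and trailing blanks instead of turning them into a leading or trailing "|" (an empty alternative).

```shell
#!/bin/sh
# to_alternation  (hypothetical helper name): read lines on stdin and print
# each one with its blank-separated fields joined by "|".
# Assigning $1 = $1 forces awk to rebuild $0 from the fields using OFS.
to_alternation() {
    awk -v OFS='|' '{ $1 = $1; print }'
}

printf 'USA US United_States_of_America\n' | to_alternation
# prints: USA|US|United_States_of_America
```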

will27 · Sep 19, 2007

Hi all,
thanks for the answers.
First, I am sorry the code is kind of confusing. What I really intended at the first line of the script was to read the first file. p5wizard, thanks for the answer; I did the same, but it doesn't work.

futurelet's code works great. Thank you!

I would appreciate it if you could give me a brief explanation of the usage of "$0~r" and the magical effect of "|", or point me to a reference.

Regards
will

p5wizard · Sep 19, 2007

Apparently you can't use a variable inside the /xxx/ construct: /reg/ matches the literal text "reg", not the value of the variable reg.

How about this:

for x in $(<firstfile)
do
awk -v reg=$x '{ if ($0 ~ reg) print}' secondfile
echo
done

Output:

x company USA

x company USA
US y Company

xyz United_States_of_America

Of course, when looking for US, the USA line also matches, so I'd stick with futurelet's code.

Some explanations:
$(<firstfile) is a shortcut (in ksh and bash) for $(cat firstfile).
The ~ operator can be seen as a 'LIKE' operator: it tests whether the left side matches the regular expression on the right.

In futurelet's solution, the "magical" | character acts as an OR between the alternatives.

Read it like this, after substituting the value of the r variable and writing out the default action (i.e., print the records that match the expression):

if ($0 ~ "USA|US|United_States_of_America")
print

In other words: if a record contains either US or USA or United_States_of_America, then print it.
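The duplicate-output point above can be checked by running both approaches on the sample files. In this sketch the function names per_key and single_pass are hypothetical; per_key is the per-key loop with the pattern passed as a quoted -v variable, and single_pass uses the alternation written out as a string. The loop prints "x company USA" twice because it matches both USA and US; the single pass prints each line at most once.

```shell
#!/bin/sh
# per_key KEYFILE DATAFILE: one awk pass per word of KEYFILE.
per_key() {
    for x in $(cat "$1"); do
        awk -v reg="$x" '$0 ~ reg' "$2"
    done
}

# single_pass DATAFILE: one awk pass with the alternation written out,
# so each line of DATAFILE is printed at most once.
single_pass() {
    awk '$0 ~ "USA|US|United_States_of_America"' "$1"
}

tmp=$(mktemp -d)
printf 'USA US United_States_of_America\n' > "$tmp/firstfile"
printf 'x company USA\nUS y Company\nxyz United_States_of_America\n' > "$tmp/secondfile"

per_key "$tmp/firstfile" "$tmp/secondfile"   # "x company USA" appears twice
single_pass "$tmp/secondfile"                # each matching line appears once
```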

HTH,

p5wizard

futurelet · Sep 19, 2007

Using ~ is one way to see if a reg.ex. matches a string.

Code:

BEGIN{ print "foo bar" ~ /o.b/ }

If you have an explicit reg.ex. like /.../, matching against $0 is automatic:

Code:

BEGIN{ $0 = "foo bar"; print /o.b/ }

But if the reg.ex. is disguised as a string, we have to let awk know that we want it used as a reg.ex. Using ~ is one way to do that.

Code:

BEGIN{ $0 = "foo bar"; r = "o.b"; print $0 ~ r }
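This is also why the original /reg/ attempt failed. A small sketch (the sample lines and the helper name sample are made up for illustration): /reg/ is a regex literal that matches the characters r, e, g, while $0 ~ reg uses the value of the -v variable as a dynamic regex.

```shell
#!/bin/sh
# sample: two made-up input lines (hypothetical helper).
sample() { printf 'x company USA\nregistered office\n'; }

# /reg/ is a regex literal: it matches the text "reg" and never
# consults the -v variable named reg.
sample | awk -v reg=USA '/reg/'
# prints: registered office

# $0 ~ reg uses the value of the variable reg ("USA") as the regex.
sample | awk -v reg=USA '$0 ~ reg'
# prints: x company USA
```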

The symbol | is one of the magic characters in regular expressions. It means "or". So the following program will print the line if it has "good" or "better".

Code:

awk '/good|better/' myfile

I hope this was helpful to you.

will27 · Sep 19, 2007

Thanks for the follow-up clarifications; they are very helpful.

One more question for futurelet; I hope you don't mind.

I tried to extend your example to arrays and apply it to the following two datasets (they are only very small excerpts from the original data),

and ran into some interesting problems:

### file 1: testkey (tab delimited, 3 fields each row)

Arr-Maz N/A 3
11
Alabama:River:Chip N/A 3
sonus:networks N/A 1
Henderson:Land:Finance

Cayman N/A 1
equity:marketing N/A 1
Beauty:Labs LABB 1
Charles:Schwab SCH 12
Synleaseco

elaware:Business N/A 1
zaring:national N/A 3

### file 2: testpool (tab delimited, 3 fields each row)

EZENIA:INC 20030829 EZEN
ZARING:NATIONAL:CORP 20001229 ZHOM
BIOSOURCE:INTERNATIONAL:INC 20031231 BIOI
ELECTRONIC:MAIL:CORP:AMER 19880831 EMCA
20030930
SONUS:NETWORKS:INC 20031231 SONS
AVTEL:COMMUNICATIONS:INC 19990831 AVCO
ZIEGLER:COMPANY:INC 20031231 ZCO
EQUITY:MARKETING:INC 20031231 EMAK
P:M:R:CORP 20020830 PMRP

What I want to do is:

1) use the first field in "testkey" as the key;
2) if the first field of a "testpool" row contains any of the keys, then
3) print the whole row for that key from the testkey file, followed by the matching row from the second file.

For example,

one key is "sonus:networks"; it matches "SONUS:NETWORKS:INC" in the testpool file, so the output will be:

sonus:networks N/A 1 SONUS:NETWORKS:INC 20031231 SONS

# code 1: this code works fine on this sample, but it is terribly slow and seems to enter an endless loop on the full dataset

awk -F"\t" 'NR==FNR{ara[$1]=$0;next} {for(s in ara) {if (index(toupper($1),toupper(s))>0) print $0, ara[s]}}' OFS="\t" testkey testpool

# code 2: this one simply does not work; an error occurs

awk -F"\t" 'NR==FNR{ara[$1]=$0;next} {for(s in ara) {if (toupper($1) ~ toupper(s)) print $0, ara[s]}}' OFS="\t" testkey testpool

# the error message is as follows:
awk: cmd. line:1: (FILENAME=testpool FNR=1) fatal: Unmatched ( or \ /HENDERSON:LAND:FINANCECAYMAN/

It seems to me that awk, in this case, has a problem with the bracket.

Any suggestion will be appreciated!

thanks
will

futurelet · Sep 19, 2007

Code:

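FILENAME == ARGV[1] {
  # First file: index each row by its upper-cased key field.
  keys[ toupper($1) ] = $0
  next
}

# Second file: print the key row and the pool row whenever the
# key occurs in the first field as a plain substring (no regex).
{ for (key in keys)
  { if ( index( toupper($1), key ) )
      print keys[key], $0
  }
}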
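Switching from ~ to index() is what avoids the "Unmatched (" error: with ~ the key is compiled as a regular expression, so an unbalanced "(" is a syntax error, while index() looks for the key as plain text. The sketch below shows this with a hypothetical helper, contains_ci, and a hypothetical pool name (the exact key text in the post above is partly lost). It also shows that "." is taken literally, where a regex would treat it as "any character".

```shell
#!/bin/sh
# contains_ci HAYSTACK NEEDLE  (hypothetical helper name)
# Exit status 0 if NEEDLE occurs in HAYSTACK as plain text, ignoring case.
# index() does no regex interpretation, so "(" or "." need no escaping.
# Note: -v assignments still process backslash escapes in the values.
contains_ci() {
    awk -v s="$1" -v t="$2" 'BEGIN { exit !(index(toupper(s), toupper(t)) > 0) }'
}

if contains_ci 'HENDERSON:LAND:FINANCE(CAYMAN):LTD' 'finance(cayman'; then
    echo "found as plain text"
fi

if ! contains_ci 'ABC' 'a.c'; then
    echo "a.c is not a substring of ABC ('.' is literal here)"
fi
```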

will27 · Sep 19, 2007

thanks, futurelet
The code works. The biggest problem turned out to be that the key field in the key file is sometimes empty (""), which is disastrous because "" always gets a match.

will

futurelet · Sep 20, 2007

Code:

FILENAME == ARGV[1] {
  if ( $1 != "" )
    keys[ toupper($1) ] = $0
  next
}

{ for (key in keys)
  { if ( index( toupper($1), key ) )
      print keys[key], $0
  }
}

Is this better?
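Putting the pieces of this thread together, here is a sketch (not futurelet's exact code) that also sets tab as the field separator for both files, since will describes them as tab-delimited, skips empty keys, and computes toupper($1) once per pool line rather than once per key. The function name join_keys and the small sample are assumptions. When a pool row matches several keys, the output order among them is unspecified, because for (key in keys) has no defined order.

```shell
#!/bin/sh
# join_keys KEYFILE POOLFILE  (hypothetical helper name)
join_keys() {
    awk '
    BEGIN { FS = OFS = "\t" }          # both files are tab-delimited
    FILENAME == ARGV[1] {              # first file: the keys
        if ($1 != "")                  # skip rows whose key field is empty
            keys[toupper($1)] = $0     # index by upper-cased key, keep whole row
        next
    }
    {                                  # second file: the pool
        name = toupper($1)             # computed once per pool row
        for (key in keys)
            if (index(name, key))      # literal substring test, no regex
                print keys[key], $0
    }
    ' "$1" "$2"
}

# A few rows from the post, plus one row with an empty key.
tmp=$(mktemp -d)
printf 'sonus:networks\tN/A\t1\n\tN/A\t1\nequity:marketing\tN/A\t1\nzaring:national\tN/A\t3\n' > "$tmp/testkey"
printf 'EZENIA:INC\t20030829\tEZEN\nZARING:NATIONAL:CORP\t20001229\tZHOM\nSONUS:NETWORKS:INC\t20031231\tSONS\nEQUITY:MARKETING:INC\t20031231\tEMAK\n' > "$tmp/testpool"
join_keys "$tmp/testkey" "$tmp/testpool"
```

With this sample each pool row matches at most one key, so the output follows the order of testpool, and the empty-key row never produces a match.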
