
How to pass parameters into a regular expression, i.e., /parameter/

Status
Not open for further replies.

will27 (Technical User, US)
Jun 13, 2007
Hi all,
I have been trying to find a way to pass a parameter into an awk regular expression without success, so I am back.
For example, I have the following two files:

# firstfile
USA US United_States_of_America
# secondfile
x company USA
US y Company
xyz United_States_of_America

What I want is to pass every key in firstfile to the regular expression and then loop through the second file,
like this:

## code below ###
for x in firstfile
do
awk -v reg=$x '/reg/' secondfile
done


However, this doesn't work. Can anyone suggest how to solve this?

Thanks
will





 
What you are running is one iteration of:

awk -v reg=firstfile '/reg/' secondfile

You need to loop through the contents of firstfile, not the name of firstfile. And there's more: I'm assuming you need the second field of firstfile to look for records in secondfile...

In your for-do-done construct this becomes:

for x in $(awk '{print $2}' firstfile)
do
awk -v reg=$x '/reg/' secondfile
done



HTH,

p5wizard
 
Code:
awk 'NR==FNR{gsub(/[ \t]+/,"|");r=$0;next}$0 ~ r' file1 file2
 
Here's a more understandable version.
Code:
# Are we reading 1st file?
FILENAME==ARGV[1] { 
  #Convert whitespace in $0 to "|".
  gsub( /[ \t]+/, "|" )
  # Save our regular expression.
  r = $0
  # Read next line.
  next }
# Print line if it matches the string that
# is being used as a regular expression.
$0 ~ r
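The expanded program above can be tried end to end on the sample data from the question. This is a minimal sketch, assuming a POSIX shell, any POSIX awk, and throwaway files in a temporary directory; the wrapper name filter_by_keys is hypothetical. Note that r keeps only the last line read from the key file, which is fine here because firstfile holds a single line of keys.

```shell
# Sketch: run the "more understandable version" on the sample data.
# The temp directory and the wrapper name are illustrative assumptions.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'USA US United_States_of_America\n' > "$demo/firstfile"
printf 'x company USA\nUS y Company\nxyz United_States_of_America\n' > "$demo/secondfile"

# Hypothetical wrapper: $1 = key file (one line of keys), $2 = data file.
filter_by_keys() {
  awk '
    FILENAME == ARGV[1] {     # reading the key file
      gsub(/[ \t]+/, "|")     # "USA US ..." becomes "USA|US|..."
      r = $0                  # keep the line as a dynamic regex
      next
    }
    $0 ~ r                    # print data lines matching any alternative
  ' "$1" "$2"
}

filter_by_keys "$demo/firstfile" "$demo/secondfile"
```

All three lines of secondfile are printed, each exactly once, because every line contains at least one of the alternatives.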
 
Hi all,
thanks for the answers.
First, sorry that my code was confusing: what I really intended in the first line of the script was to read the contents of the first file. p5wizard, thanks for the answer; I did the same, but it doesn't work.

futurelet's code works great. Thank you!

I would appreciate a brief explanation of how "$0 ~ r" works and of the magical effect of "|", or a pointer to a reference.

Regards
will
 
Apparently you can't use a variable inside the /xxx/ construct: /reg/ is a regex constant that matches the literal text "reg", not the value of the variable reg.

How about this:

for x in $(<firstfile)
do
awk -v reg=$x '{ if ($0 ~ reg) print}' secondfile
echo
done

Output:

x company USA

x company USA
US y Company

xyz United_States_of_America

Of course, when looking for US, the line with USA also comes out (US is part of USA), so I'd stick with futurelet's code.
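If each key should only match a whole word, one option (not from the thread, just a swapped-in technique) is to compare fields for equality instead of using a regex, so US no longer selects the USA line. A minimal sketch, assuming whitespace-separated fields and a POSIX shell; the helper name lines_with_field and the temp files are hypothetical. It uses $(cat file) rather than $(<file) so it also runs under plain sh.

```shell
# Sketch: compare whole fields instead of substrings, so key US does not
# also select the line that only contains USA. Names are illustrative.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'USA US United_States_of_America\n' > "$demo/firstfile"
printf 'x company USA\nUS y Company\nxyz United_States_of_America\n' > "$demo/secondfile"

# Hypothetical helper: print lines of file $2 having a field equal to $1.
lines_with_field() {
  awk -v key="$1" '{
    for (i = 1; i <= NF; i++)
      if ($i == key) { print; next }   # print once per line, then move on
  }' "$2"
}

for x in $(cat "$demo/firstfile"); do
  echo "== $x"
  lines_with_field "$x" "$demo/secondfile"
done
```

The trade-off is that a key can no longer match part of a field, which is exactly what the later testkey/testpool problem needs, so this only fits the first example.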

Some explanations:
$(<firstfile) is a shortcut for $(cat firstfile) (in ksh and bash)
the ~ operator can be seen as a 'LIKE' operator: $0 ~ reg is true when $0 matches the regular expression held in reg

In futurelet's solution, the "magical" | character is an OR between the LIKE possibilities.

Read it like this, after substituting the value of the r variable and writing out the default behaviour (i.e. print the records that match the expression):

if ($0 ~ "USA|US|United_States_of_America")
print

In other words: if a record contains either USA or US or United_States_of_America, then print it.
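The same alternation can also be built in the shell and handed to awk with -v, which answers the original question directly without a loop. A minimal sketch, assuming POSIX tr, sed and awk; the helper name match_any_key is hypothetical. Two caveats: awk processes backslash escapes in -v values, so keys containing backslashes would be altered, and an empty key file gives an empty regex, which matches every line.

```shell
# Sketch: build the "USA|US|United_States_of_America" alternation in the
# shell and pass it to awk with -v. Names are illustrative assumptions.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'USA US United_States_of_America\n' > "$demo/firstfile"
printf 'x company USA\nUS y Company\nxyz United_States_of_America\nno match here\n' > "$demo/secondfile"

# Hypothetical helper: $1 = file of whitespace-separated keys, $2 = data file.
match_any_key() {
  # Turn runs of spaces, tabs and newlines into a single "|", then drop a
  # leading or trailing "|" (an empty alternative would match every line).
  r=$(tr -s ' \t\n' '|||' < "$1" | sed 's/^|//; s/|$//')
  awk -v r="$r" '$0 ~ r' "$2"
}

match_any_key "$demo/firstfile" "$demo/secondfile"
```

Unlike the awk-only version, this also works when the keys are spread over several lines.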


HTH,

p5wizard
 
Using ~ is one way to see if a reg.ex. matches a string.
Code:
BEGIN{ print "foo bar" ~ /o.b/ }
If you have an explicit reg.ex. like /.../, matching
against $0 is automatic:
Code:
BEGIN{ $0 = "foo bar"; print /o.b/ }
But if the reg.ex. is disguised as a string, we have to let awk know that we want it used as a reg.ex. Using ~ is one way to do that.
Code:
BEGIN{ $0 = "foo bar"; r = "o.b"; print $0 ~ r }
The symbol | is one of the magic characters in regular expressions. It means "or". So the following program will print the line if it has "good" or "better".
Code:
awk '/good|better/' myfile
I hope this was helpful to you.
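To tie this back to the original question, the sketch below contrasts the two forms on a tiny sample: '/reg/' is a regex constant looking for the letters r-e-g, while '$0 ~ reg' uses the value passed in with -v. The file contents and function names are made up for illustration; any POSIX awk should behave the same.

```shell
# Sketch: /reg/ is a regex constant that looks for the letters r-e-g;
# $0 ~ reg uses the value of the variable. File name is an assumption.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'regular line\nx company USA\n' > "$demo/sample"

with_slashes() { awk -v reg="$1" '/reg/' "$2"; }       # ignores the variable
with_tilde()   { awk -v reg="$1" '$0 ~ reg' "$2"; }    # dynamic regex

with_slashes USA "$demo/sample"   # prints: regular line
with_tilde   USA "$demo/sample"   # prints: x company USA
```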
 
Thanks for the follow-up clarifications; they are very helpful.

One more question for futurelet; I hope you don't mind.

I tried to extend your example to use an array and applied it to the following two datasets (they are only very small excerpts from the original data), and ran into some interesting problems:

### file 1: testkey (tab delimited, 3 fields each row)


Arr-Maz N/A 3
11
Alabama:River:Chip N/A 3
sonus:networks N/A 1
Henderson:Land:Finance:(Cayman N/A 1
equity:marketing N/A 1
Beauty:Labs LABB 1
Charles:Schwab SCH 12
Synleaseco:Delaware:Business N/A 1
zaring:national N/A 3

### file 2: testpool (tab delimited, 3 fields each row)

EZENIA:INC 20030829 EZEN
ZARING:NATIONAL:CORP 20001229 ZHOM
BIOSOURCE:INTERNATIONAL:INC 20031231 BIOI
ELECTRONIC:MAIL:CORP:AMER 19880831 EMCA
20030930
SONUS:NETWORKS:INC 20031231 SONS
AVTEL:COMMUNICATIONS:INC 19990831 AVCO
ZIEGLER:COMPANY:INC 20031231 ZCO
EQUITY:MARKETING:INC 20031231 EMAK
P:M:R:CORP 20020830 PMRP


What I want to do is:

1) use the first field in "testkey" as the key,
2) if the first field of "testpool" contains any of the keys, then
3) print the whole testkey row for that key together with the testpool row that matches it.

For example,

one key is "sonus:networks"; it matches "SONUS:NETWORKS:INC" in the testpool file, so the output would be:

sonus:networks N/A 1 SONUS:NETWORKS:INC 20031231 SONS


# code 1: this code works fine on this sample, but it is terribly slow and seems to get stuck in an endless loop on the full dataset

awk -F"\t" 'NR==FNR{ara[$1]=$0;next} {for(s in ara) {if (index(toupper($1),toupper(s))>0) print $0, ara[s] }}' OFS="\t" testkey testpool


# code 2: this one simply doesn't work; an error occurs

awk -F"\t" 'NR==FNR{ara[$1]=$0;next} {for(s in ara) {if (toupper($1) ~ toupper(s)) print $0, ara[s] }}' OFS="\t" testkey testpool

# the error message is as follows:
awk: cmd. line:1: (FILENAME=testpool FNR=1) fatal: Unmatched ( or \(: /HENDERSON:LAND:FINANCE:(CAYMAN/

It seems to me that awk, in this case, has a problem with the parenthesis.

Any suggestion will be appreciated!

thanks
will




 
Code:
FILENAME == ARGV[1] {
  keys[ toupper($1) ] = $0
  next
}

{ for (key in keys)
  { if ( index( toupper($1), key ) )
      print keys[key], $0
  }
}
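Here is the index()-based program run on a tab-delimited excerpt of the data above. This is a sketch under stated assumptions: the temp files and the wrapper name join_on_substring are hypothetical, and FS/OFS are set to a tab with -v (which processes the \t escape) to match will's tab-delimited files. Because index() does a plain substring search, the "(" in the Henderson key is just an ordinary character and no regex error can occur.

```shell
# Sketch: futurelet's index()-based loop run on a tab-delimited excerpt
# of the thread's data. The temp files and wrapper name are assumptions.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'sonus:networks\tN/A\t1\nHenderson:Land:Finance:(Cayman\tN/A\t1\nequity:marketing\tN/A\t1\nzaring:national\tN/A\t3\n' > "$demo/testkey"
printf 'EZENIA:INC\t20030829\tEZEN\nZARING:NATIONAL:CORP\t20001229\tZHOM\nSONUS:NETWORKS:INC\t20031231\tSONS\nEQUITY:MARKETING:INC\t20031231\tEMAK\n' > "$demo/testpool"

# Hypothetical wrapper: $1 = key file, $2 = pool file, both tab-delimited.
join_on_substring() {
  awk -v FS='\t' -v OFS='\t' '
    FILENAME == ARGV[1] {            # key file: remember each row by its key
      keys[toupper($1)] = $0
      next
    }
    { for (key in keys)              # pool file: try every key
        if (index(toupper($1), key)) # plain substring test, no regex
          print keys[key], $0
    }
  ' "$1" "$2"
}

join_on_substring "$demo/testkey" "$demo/testpool"
```

Each pool row is compared against every key, so the run time grows with (number of keys) × (number of pool rows); that cost is inherent to substring matching and is not removed by this version.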
 
Thanks, futurelet.
The code works. The biggest problem turned out to be that the key field in the key file is sometimes empty (""). This is disastrous, because "" always gets a match.

will
 
Code:
FILENAME == ARGV[1] {
  if ( $1 != "" )
    keys[ toupper($1) ] = $0
  next
}

{ for (key in keys)
  { if ( index( toupper($1), key ) )
      print keys[key], $0
  }
}
Is this better?
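The guard can be checked with a key file that contains a row whose first field is empty. This sketch assumes the lone "11" row in testkey is such a row (tab, tab, 11), which is an assumption about the original data; the wrapper name join_nonempty_keys and the temp files are hypothetical. With FS set to a tab, that row's $1 is "", and the guard keeps it out of the keys array.

```shell
# Sketch: the same join with the empty-key guard. The key file includes
# a row whose first field is empty, assumed to resemble the "11" row.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '\t\t11\nsonus:networks\tN/A\t1\n' > "$demo/testkey"
printf 'EZENIA:INC\t20030829\tEZEN\nSONUS:NETWORKS:INC\t20031231\tSONS\n' > "$demo/testpool"

# Hypothetical wrapper: $1 = key file, $2 = pool file, both tab-delimited.
join_nonempty_keys() {
  awk -v FS='\t' -v OFS='\t' '
    FILENAME == ARGV[1] {
      if ($1 != "")                  # skip empty keys: "" occurs in every string
        keys[toupper($1)] = $0
      next
    }
    { for (key in keys)
        if (index(toupper($1), key))
          print keys[key], $0
    }
  ' "$1" "$2"
}

join_nonempty_keys "$demo/testkey" "$demo/testpool"
```

Only the sonus row is printed; EZENIA:INC no longer pairs with the empty-key row.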
 