How to read IDs from one file and grab those IDs from a second file

Status
Not open for further replies.

rsteffler (Programmer, US)
Aug 2, 2001
I am trying to write a compact AWK script to read a file that contains a list of IDs like this:

444
234
236

I have a second file that is pipe-delimited, where the 6th field is this ID, like this:

name|address1|address2|hphone|wphone|444|

I want to print to the screen any record in the second file whose ID is in the first file. It's essentially automating (and hopefully streamlining) a process similar to running grep 444 file2 and then making sure the match is exactly 444 and that it's in the 6th field.

Is there a simple AWK script that can do this? Please keep in mind that my second file, the file to be searched, has 8 million records, so greps are slow.

Thanks,
Robert
 
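BEGIN {
    FS = "|"
    CONF_FILE = "searchFile1.txt"   # file holding one id per line
    fld2search = 6                  # field of the big file that holds the id

    # load every id into an array, keyed by the id itself
    while ((getline < CONF_FILE) > 0) {
        # skip comments in the config file
        if ($0 ~ /^#/) continue
        arrConfs[$0]
    }
    close(CONF_FILE)
}

# print only the records whose id field is a key in the array
$fld2search in arrConfs { print }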
 
Any solution is going to take some time to process 8 million records, but hopefully this awk program won't take too long.
Code:
BEGIN {
  FS = "|"
  while ((getline < "-") > 0) a[++ix] = $0
}
{
  for (i=1;i<=ix;i++) {
    if ($6 == a[i]) {print; next}
  }
}
Run it by entering:
Code:
awk -f awk-script big-file < id-file
Hope this helps. CaKiwi
 
I believe using the "in" construct will be faster in discriminating records - one lookup instead of the "iterative" lookup.
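
To make the difference concrete, here is a minimal sketch of the single-lookup approach wrapped in a shell function. It assumes the id file has one id per line and is not empty. The function name filter_by_id and the two-file NR == FNR idiom are my own choices for illustration, not taken from the scripts in this thread.

```shell
#!/bin/sh
# filter_by_id ID_FILE BIG_FILE  (hypothetical helper name)
# Prints records of BIG_FILE whose 6th pipe-delimited field is listed in ID_FILE.
filter_by_id() {
    awk -F'|' '
        # First file: remember each non-blank line as an array key.
        NR == FNR { if ($0 != "") ids[$0]; next }
        # Second file: one hash lookup per record, no loop over the ids.
        $6 in ids
    ' "$1" "$2"
}
```

It would be called as: filter_by_id id-file big-file. The ids are read once into memory, so the 8 million records are each tested with a single array lookup rather than compared against every id in turn.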

vlad
 
Vlad,

You beat me to it with a better solution. I agree the "in" construct is probably faster. CaKiwi
 
Since I'm a sed-o-holic and a speed freak: the script below is about 3 times faster than the awk example with 'in' (which is about 3 times faster than '==') on my machine. The caveat is that sed takes only 100 commands, so the config file may supply only 98 items.

#! /bin/sh
eval `sed '
1s/.*/sed -e &/
:loop
N
s/\n.*/ -e &/
s/\n//
$!b loop
' $1 |
sed "
s/[0-9][0-9]*/'\/&\/b'/g
s/$/ -e d $2/
"`

script id-file big-file
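
To see what the script ends up running, the same two sed stages can be wrapped in a function that prints the generated command instead of passing it to eval. This is only a sketch: the name build_cmd is made up for illustration, and it assumes the id file holds more than one line of digits and that the big file's name contains no slashes.

```shell
#!/bin/sh
# build_cmd ID_FILE BIG_FILE  (hypothetical helper name)
# Prints the sed command line that the script above would hand to eval.
build_cmd() {
    # Stage 1: join the ids into one line of the form "sed -e id1 -e id2 ...".
    sed '
1s/.*/sed -e &/
:loop
N
s/\n.*/ -e &/
s/\n//
$!b loop
' "$1" |
    # Stage 2: wrap each id as /id/b (branch to end of script, so the line
    # is printed), then append "-e d" so every other line is deleted.
    sed "
s/[0-9][0-9]*/'\/&\/b'/g
s/$/ -e d $2/
"
}
```

With the sample ids 444, 234 and 236, build_cmd id-file big-file prints: sed -e '/444/b' -e '/234/b' -e '/236/b' -e d big-file. Each /id/b pattern is unanchored, so it matches those digits anywhere on the line, not only in the 6th field.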

Cheers,
ND
 