match&replace from one file to another (huge files)

shiana · Jan 25, 2005

I have this problem:
I have a file with two columns:

c_223 net1:223
c_456:n nnneint_i:456

The second file has 4 columns, like:

Cc_3 c_223 c_456:n 6.3348274
Cc_4 c_56:n c_223 7.32423

etc.
I need to read the first token from file 1,
search file 2 for all occurrences of that token, and
replace each of them with token 2 from file 1.

the result would be:

Cc_3 net1:223 nnneint_i:456 6.3348274
Cc_4 c_56:n net1:223 7.32423

The problem is that for each line in file 1 I must parse the whole of file 2 (I guess).
The files are VERY large:
file1 126680K - 1799405 lines
file2 239678K - 5324567 lines

Please HELP! It is urgent!

Thank you!
Elena

PHV · Jan 25, 2005

A starting point:
nawk '
NR==FNR{a[$1]=$2;next}
{if($2 in a)$2=a[$2];print}
' file1 file2 > newfile2

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244

vgersh99 · Jan 25, 2005

nawk -f shiana.awk file1 file2

here's shiana.awk

Code:

FNR == NR {
  arr[$1]=$2
  next;
}
{
   for(i=1;i<=NF;i++)
      if( $i in arr)
        $i=arr[$i];
   print;
}
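
For anyone who wants to try this on the sample rows from the question before running it on the full files, here is a minimal, self-contained sketch. It assumes a POSIX `awk` is available (the thread uses `nawk`; substitute it if that is what your system has), and the scratch directory and the `result` variable are only there for illustration. Because file1 is loaded into an array once and each field of file2 is looked up directly, file2 is read a single time rather than once per line of file1.

```shell
# Scratch directory so no existing files are touched (illustrative only).
workdir=$(mktemp -d)

# file1: old token, new token (sample rows from the question)
cat > "$workdir/file1" <<'EOF'
c_223 net1:223
c_456:n nnneint_i:456
EOF

# file2: rows whose fields may contain old tokens
cat > "$workdir/file2" <<'EOF'
Cc_3 c_223 c_456:n 6.3348274
Cc_4 c_56:n c_223 7.32423
EOF

# While FNR == NR we are still in the first file: build the lookup array.
# For the second file, replace every field that is a key of the array.
result=$(awk '
FNR == NR { arr[$1] = $2; next }
{
    for (i = 1; i <= NF; i++)
        if ($i in arr)
            $i = arr[$i]
    print
}' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$result"
```

On the sample data this should print the two result lines given in the original question. Note that the whole of file1 is held in memory, so for a file of about 1.8 million lines it is worth checking that the machine has room for the array.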

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

PHV · Jan 25, 2005

I misread the original post ("search in file 2 all occurrences"):
nawk '
NR==FNR{a[$1]=$2;next}
{for(i=1;i<=NF;++i)if($i in a)$i=a[$i];print}
' file1 file2 > newfile2

Hope This Helps, PH.

futurelet · Jan 25, 2005

Vlad, I am saddened to see that you have been dabbling in C or Perl again and that you have consequently afflicted your Awk code with cancer of the semicolon.

vgersh99 · Jan 25, 2005

well.... at least I'm being consistently inconsistent [wink]

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

vgersh99 · Jan 25, 2005

another way might be [for starters - will need to improve regex]:

Code:

sed -e 's#^\([^ ][^ ]*\) \([^ ][^ ]*\)$#s/\1/\2/g#g' file1.txt  > /tmp/file1.sed

sed -f /tmp/file1.sed file2
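
To see what the generated sed script looks like, and one reason the regex would need improving, here is a small sketch on sample data. The `c_2234` row is a made-up token added only to show the effect, and the scratch directory and variable names are illustrative. Each generated rule is a plain `s/old/new/g`, so it also matches `old` when it is only part of a longer token.

```shell
# Scratch directory so no existing files are touched (illustrative only).
workdir=$(mktemp -d)

# Two mapping rows from the question.
printf '%s\n' 'c_223 net1:223' 'c_456:n nnneint_i:456' > "$workdir/file1"

# One row from the question plus a made-up row containing c_2234.
printf '%s\n' 'Cc_3 c_223 c_456:n 6.3348274' 'Cc_5 c_2234 c_1 1.0' > "$workdir/file2"

# Turn every "old new" line of file1 into an "s/old/new/g" command.
sed -e 's#^\([^ ][^ ]*\) \([^ ][^ ]*\)$#s/\1/\2/g#g' "$workdir/file1" > "$workdir/file1.sed"
generated=$(cat "$workdir/file1.sed")

# Apply the generated commands to file2.
result=$(sed -f "$workdir/file1.sed" "$workdir/file2")

printf '%s\n' "$generated" "$result"
```

The first row comes out as wanted, but `c_2234` becomes `net1:2234` because `c_223` matches inside it. The tokens are also interpreted as regular expressions, so characters such as `.` or `/` in file1 would need escaping. The awk version above compares whole fields and does not have either issue.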

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

shiana · Jan 25, 2005

Hi again, everyone,
I used the nawk, and it worked perfectly. It did what I wanted.
Thanks to all,
especially PHV!
Elena

shiana · Jan 26, 2005

Hello again,
I have one more problem. I must make the same replacements in another file that also contains lines that look like:
*|I (c_562:n c_656 n B 0.0)

and the replacements within them are not performed (in the other lines the tokens are replaced OK).
Can you tell me why? Does it have something to do with the brackets '(' ')' or the '*' character at the beginning?

Thanks again!
Elena

PHV · Jan 26, 2005

nawk '
NR==FNR{sub(/^.*\(/,"");a[$1]=$2;next}
{for(i=1;i<=NF;++i)if($i in a)$i=a[$i];print}
' file1 file2 > newfile2
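
A possible explanation for the unreplaced tokens: awk splits fields on whitespace, so in `*|I (c_562:n c_656 n B 0.0)` the second field is `(c_562:n`, which is not the same string as the key `c_562:n`. The sketch below is one way to handle that, assuming the parentheses only ever appear as the first or last character of a field. The mapping values (`net9:562`, `net9:656`), the file name `file3`, and the variable names are invented for the example; `awk` stands in for `nawk`.

```shell
# Scratch directory so no existing files are touched (illustrative only).
workdir=$(mktemp -d)

# Made-up mapping rows for the tokens in the problem line.
printf '%s\n' 'c_562:n net9:562' 'c_656 net9:656' > "$workdir/file1"

# The problem line from the post, plus an ordinary row.
printf '%s\n' '*|I (c_562:n c_656 n B 0.0)' 'Cc_3 c_656 c_1 6.3' > "$workdir/file3"

result=$(awk '
NR == FNR { map[$1] = $2; next }
{
    for (i = 1; i <= NF; i++) {
        core = $i; pre = ""; post = ""
        # Peel off a leading "(" and a trailing ")" before the lookup.
        if (substr(core, 1, 1) == "(") {
            pre = "("; core = substr(core, 2)
        }
        if (substr(core, length(core), 1) == ")") {
            post = ")"; core = substr(core, 1, length(core) - 1)
        }
        # Put the parentheses back around the replacement.
        if (core in map)
            $i = pre map[core] post
    }
    print
}' "$workdir/file1" "$workdir/file3")

printf '%s\n' "$result"
```

With these sample files the problem line becomes `*|I (net9:562 net9:656 n B 0.0)`. If the real file uses other punctuation around tokens, the two `substr` checks would need extending.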

Hope This Helps, PH.
