match&replace from one file to another (huge files)

Status
Not open for further replies.

shiana

Programmer
Jan 25, 2005
3
RO
I have this problem:
I have a file with two columns:

c_223:p net1:223
c_456:n nnneint_i:456

The second file has 4 columns, like:

Cc_3 c_223:p c_456:n 6.3348274
Cc_4 c_56:n c_223:p 7.32423

etc.
I need to read the first token of each line in file 1,
search file 2 for all occurrences of that token, and
replace each of them with the second token from file 1.

the result would be:


Cc_3 net1:223 nnneint_i:456 6.3348274
Cc_4 c_56:n net1:223 7.32423

The problem is that for each line in file 1 I must parse the whole of file 2 (I guess).
The files are VERY large:
file1 126680K - 1799405 lines
file2 239678K - 5324567 lines

Please HELP! It is urgent!

Thank you!
Elena
 
A starting point:
nawk '
NR==FNR{a[$1]=$2;next}
{if($2 in a)$2=a[$2];print}
' file1 file2 > newfile2

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
nawk -f shiana.awk file1 file2

here's shiana.awk
Code:
FNR == NR {
  arr[$1]=$2
  next;
}
{
   for(i=1;i<=NF;i++)
      if( $i in arr)
        $i=arr[$i];
   print;
}

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
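
For anyone following along, the two-pass idiom above can be checked against the sample data from the first post. This is only a sketch under a few assumptions: the scratch directory and the file names inside it are made up for the demonstration, the array name `map` is arbitrary, and plain `awk` is assumed to behave like `nawk` for this script.

```shell
demo=$(mktemp -d)   # scratch directory, hypothetical, used only for this demo

# file 1: token to look for, followed by its replacement
cat > "$demo/file1" <<'EOF'
c_223:p net1:223
c_456:n nnneint_i:456
EOF

# file 2: the lines in which tokens should be replaced
cat > "$demo/file2" <<'EOF'
Cc_3 c_223:p c_456:n 6.3348274
Cc_4 c_56:n c_223:p 7.32423
EOF

awk '
# First file only (NR equals FNR): remember token -> replacement.
NR == FNR { map[$1] = $2; next }
# Second file: replace every field that is a known token, then print.
{
  for (i = 1; i <= NF; i++)
    if ($i in map) $i = map[$i]
  print
}
' "$demo/file1" "$demo/file2" > "$demo/newfile2"

cat "$demo/newfile2"
```

Each file is read only once, so file 2 is not rescanned for every line of file 1; the cost is that all of file 1 is held in memory as an array. Note also that assigning to a field makes awk rebuild the line with single spaces between fields, so any original runs of spaces or tabs are lost on lines where a replacement happened.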
 
Misread original post ("search in file 2 all occurrences"):
nawk '
NR==FNR{a[$1]=$2;next}
{for(i=1;i<=NF;++i)if($i in a)$i=a[$i];print}
' file1 file2 > newfile2

Hope This Helps, PH.
 
Vlad, I am saddened to see that you have been dabbling in C or Perl again and that you have consequently afflicted your Awk code with cancer of the semicolon.
 
well.... at least I'm being consistently inconsistent [wink]

vlad
 
another way might be [for starters - will need to improve regex]:

Code:
sed -e 's#^\([^ ][^ ]*\) \([^ ][^ ]*\)$#s/\1/\2/g#g' file1.txt  > /tmp/file1.sed

sed -f /tmp/file1.sed file2

vlad
 
Hi again, everyone,
I used the nawk, and it worked perfectly. It did what I wanted.
Thanks to all,
especially PHV!
Elena
 
Hello again,
I have one more problem. I must make the same replacements in another file that also contains lines that look like:
*|I (c_562:n c_656 n B 0.0)

and the replacements within those lines are not performed (in the other lines the tokens are replaced OK).
Can you tell me why? Does it have something to do with the brackets '(' ')' or the '*' character at the beginning?

Thanks again!
Elena
 
nawk '
NR==FNR{sub(/^.*\(/,"");a[$1]=$2;next}
{for(i=1;i<=NF;++i)if($i in a)$i=a[$i];print}
' file1 file2 > newfile2

Hope This Helps, PH.
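
One likely reading of the problem: awk splits on whitespace, so in a line like `*|I (c_562:n c_656 n B 0.0)` the second field is `(c_562:n`, bracket included, and that string is not a key in the array. A possible workaround, sketched here under assumptions, is to peel a leading `(` and a trailing `)` off each field of the second file before the lookup and put them back afterwards. The replacement value `net7:562`, the extra `Cc_3` line, the scratch directory, and the names `map`, `pre`, `core` and `post` are all invented for the demonstration.

```shell
demo2=$(mktemp -d)   # scratch directory, hypothetical, used only for this demo

# Hypothetical mapping entry; the real file 1 was not shown for this token.
cat > "$demo2/file1" <<'EOF'
c_562:n net7:562
EOF

cat > "$demo2/file3" <<'EOF'
*|I (c_562:n c_656 n B 0.0)
Cc_3 c_562:n c_1:p 1.5
EOF

awk '
NR == FNR { map[$1] = $2; next }
{
  for (i = 1; i <= NF; i++) {
    core = $i; pre = ""; post = ""
    # Set aside one leading "(" and one trailing ")" if present.
    if (substr(core, 1, 1) == "(") { pre = "("; core = substr(core, 2) }
    if (substr(core, length(core), 1) == ")") {
      post = ")"; core = substr(core, 1, length(core) - 1)
    }
    # Look up the bare token and restore the brackets around the result.
    if (core in map) $i = pre map[core] post
  }
  print
}
' "$demo2/file1" "$demo2/file3" > "$demo2/newfile3"

cat "$demo2/newfile3"
```

The `*` at the start of the line should not matter for this approach, since `*|I` is simply a field that never matches a key. Only a single bracket on each side of a field is handled; other punctuation glued to a token would need the same treatment.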
 