Tek-Tips is the largest IT community on the Internet today!

matching two big files

Status
Not open for further replies.

crasho2001

Technical User
Jun 13, 2002
Hi,

I have two big files.

File1 has 2 columns (id, num). It has all ids.
File2 has 1 column (id). It has some of the ids.

If File2 id matches File1 id, then count num.

I used the script below, but it takes too much time. How can I make it faster?

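count=0
# one grep (a full scan of File1) plus an awk and an expr per id in File2
for list in `cat File2`
do
  # anchor the id to column 1 so id 1 does not also match 10, 21 or a num
  count1=`grep "^$list " File1 | awk '{print $2}'`
  # an id missing from File1 leaves count1 empty; add 0 in that case
  count=`expr $count + ${count1:-0}`
done
echo $count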
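The loop above starts a grep, an awk and an expr for every id in File2, and each grep rereads all of File1, so the work grows roughly with the number of lines in File2 times the number of lines in File1. A single pass that first loads the File2 ids into an awk associative array and then reads File1 once avoids that. This is a sketch, assuming whitespace-separated fields, ids that are unique in File2, and that num is the value to be summed; `sum_matching` is a hypothetical helper name.

```shell
# sum_matching FILE1 FILE2  (hypothetical helper name)
# Prints the sum of field 2 of every FILE1 line whose id (field 1) is listed
# in FILE2. Each file is read exactly once.
sum_matching() {
  awk '
    FNR == NR  { want[$1] = 1; next }  # first file read (FILE2): remember its ids
    $1 in want { sum += $2 }           # second file read (FILE1): add num for wanted ids
    END        { print sum + 0 }       # + 0 prints 0 instead of an empty line
  ' "$2" "$1"
}
```

For example, `sum_matching l1 l2` on the sample files posted further down prints 16 (1 + 3 + 5 + 7).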
 
Hi

Or, for huge files, some possibly faster off-topic solutions:
Code:
# if the second column can never contain one of the ids
grep -f file2 -wc file1

# otherwise, anchor each id to the start of the line
sed 's/^/^/' file2 | grep -f - -wc file1
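Note that `grep -c` counts the matching lines of file1 rather than adding up num. A minimal sketch of the same filter feeding awk to sum field 2 instead, assuming GNU grep (for `-f -`) and ids without regular-expression metacharacters; `sum_by_grep` is a hypothetical helper name.

```shell
# sum_by_grep FILE1 FILE2  (hypothetical helper name)
# Same filter as the sed | grep pipeline above, but the matching FILE1 lines
# are summed on field 2 instead of being counted with -c.
# Assumes GNU grep (-f - reads the patterns from standard input).
sum_by_grep() {
  # prefix every id with ^ so it can only match at the start of a line,
  # and let -w stop id 1 from matching the 10 in a line like "10 5"
  sed 's/^/^/' "$2" |
    grep -w -f - "$1" |
    awk '{ sum += $2 } END { print sum + 0 }'
}
```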

Feherke.
 
One way to go (provided the field delimiters are compatible with join and both files are sorted on the id) is with join and awk.

It is possible to code it all in awk, but working out an all-awk solution needs more time than I have right now, so here goes:

Code:
# join prints "id num" for every id present in both files; awk sums num
join File1 File2 | awk '{ sum += $2 } END { print sum + 0 }'
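join only pairs up lines correctly when both inputs are sorted on the join field. If the files are not sorted yet, a sketch like the one below sorts them first. It assumes `mktemp` is available, uses `LC_ALL=C` so that sort and join agree on the collation order, and uses `sort -u` so a repeated id in File2 does not add its num twice; `sum_by_join` is a hypothetical helper name.

```shell
# sum_by_join FILE1 FILE2  (hypothetical helper name)
# join | awk on copies of both files sorted on the id.
sum_by_join() {
  t1=$(mktemp) || return 1
  t2=$(mktemp) || return 1
  LC_ALL=C sort -k1,1 "$1" > "$t1"   # File1 sorted on field 1 (id)
  LC_ALL=C sort -u "$2" > "$t2"      # File2 sorted, duplicate ids dropped
  # join prints "id num" for every id present in both sorted files
  LC_ALL=C join "$t1" "$t2" | awk '{ sum += $2 } END { print sum + 0 }'
  rm -f "$t1" "$t2"
}
```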



HTH,

p5wizard
 

Feherke,
Awk does not work. The result is 0. And which grep is it?

>cat l1
1 1
2 2
3 3
4 4
5 5
6 7

>cat l2
1
3
5
6
7

 
Hi

crasho2001 said:
Awk does not work. The result is 0.
awk does work. The result is not 0.
Code:
master # cat file1
1 1
2 2
3 3
4 4
5 5
6 7

master # cat file2
1
3
5
6
7

master # gawk 'FNR==NR{k[$1]=1;next}{c+=k[$1]}END{print c}' file2 file1
4

master # mawk 'FNR==NR{k[$1]=1;next}{c+=k[$1]}END{print c}' file2 file1
4

master # awk95 'FNR==NR{k[$1]=1;next}{c+=k[$1]}END{print c}' file2 file1
4

crasho2001 said:
And which grep is it?
It is GNU grep.

Anyway, forget it. Your variable names misled me.

Feherke.
 
Feherke,

crasho2001 said:
If File2 id matches File1 id, then count num

would have been better stated something like this:
me said:
sum all nums (File1, field 2) of the ids that occur in both files.

I gathered as much by examining crasho2001's shell script in the OP.
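Written out for the sample files crasho2001 posted (l1 as File1, l2 as File2), the restated task S and the line count C that the earlier awk program printed differ as follows; the ids present in both files are 1, 3, 5 and 6.

```latex
\[
S = \sum_{\substack{(\mathit{id},\,\mathit{num}) \in \mathrm{File1} \\ \mathit{id} \in \mathrm{File2}}} \mathit{num}
  = 1 + 3 + 5 + 7 = 16,
\qquad
C = \bigl|\{(\mathit{id},\mathit{num}) \in \mathrm{File1} : \mathit{id} \in \mathrm{File2}\}\bigr| = 4 .
\]
```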


HTH,

p5wizard
 
Hi

Yes, p5wizard, I definitely prefer your wording, or at least crasho2001's sample code with less obfuscated variable names. Well, these things happen sometimes. I have a feeling I am not the only non-native English speaker here.

By the way, I like your join trick.

Feherke.
 