Compare two files

tonivm · Dec 17, 2007

Hi everybody:

I have this new problem.
I have two tab-separated files. The first has 5 columns: the first is the date, the second the hour and the third the minute. The second file has 13 columns, and its first three columns are the same as in the first file. My question is: how can I merge the two files on the lines where the first three columns match?

Thanks in advance.

Annihilannic · Dec 17, 2007

What have you tried so far?

Try reading the first file into an array indexed by the first three fields. Then when you process the second file you can output the array contents followed by the line from the second file.
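
A minimal sketch of that approach, assuming tab-separated input. The two tiny sample files and the names `sample1.txt`, `sample2.txt`, `joined.txt` and `seen` are made up for illustration and stand in for the real data:

```shell
# Made-up sample data (tab-separated); the real files are much larger
printf '20030601\t14\t05\t498.985\t0.436532\n20030601\t14\t10\t499.531\t0.260819\n' > sample1.txt
printf '20030601\t14\t10\t72.46647\t802\n20030601\t14\t15\t72.98309\t789\n' > sample2.txt

# First file: store each line under its date/hour/minute key.
# Second file: when the key was stored, print the stored line, then the current line.
awk -F'\t' '
    { key = $1 FS $2 FS $3 }
    NR == FNR   { seen[key] = $0; next }
    key in seen { print seen[key] FS $0 }
' sample1.txt sample2.txt > joined.txt

cat joined.txt
```

Only the 14:10 row is present in both samples, so only that row is printed: the five columns from the first file followed by the whole line from the second.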

Annihilannic.

tonivm · Dec 18, 2007

hi:

How exactly can I do that?
If file1 is:

20030601 14 05 498.985 0.436532
20030601 14 10 499.531 0.260819
20030601 14 15 499.427 0.508597
20030601 14 20 500.418 0.606761
20030601 14 25 496.877 0.233742
20030601 14 30 496.578 0.117021
20030601 14 35 497.189 0.487961
20030601 14 40 499.676 0.613496
..
..

and file2 is:
20030601 00 00 161.01529 -2.324 -2.825 18.78 70.50 998.94 0.00 0.61 97.20 64.48
20030601 00 05 160.89055 -2.126 -2.516 18.81 70.70 998.89 0.00 1.08 119.70 54.91
20030601 00 10 160.71014 -2.202 -2.529 18.76 71.00 998.81 0.00 0.49 129.90 42.51
20030601 00 15 160.47561 -2.029 -2.456 18.77 70.90 998.72 0.00 0.32 103.90 40.59
20030601 00 20 160.18892 -2.184 -2.474 18.71 71.30 998.70 0.00 0.72 117.40 72.90
20030601 00 25 159.85234 -2.221 -2.533 18.61 71.60 998.60 0.00 0.57 143.20 71.40
20030601 00 30 159.46837 -2.228 -2.552 18.46 72.50 998.49 0.00 0.79 144.70 50.04
...
...

thanks in advance

vgersh99 · Dec 18, 2007

tonivm,
it seems that you have a propensity for not giving direct answers when asked about your own implementation effort. At least in this forum.
Pls do your own investigation and try your own 'hand' first. It looks like you've been given quite a number of solutions recently, and they should serve you as a good 'starting point'.
There were a number of similar threads in this forum - at least try the 'Search' with the keyword 'join' first.
If and when you get stuck, pls DO come back with the detailed implementation questions.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

tonivm · Dec 19, 2007

hi:
I tried two options but both did not work correctly.
First I tried this:

Code:

join -j 1 -j 2 -j 3 -o 1.1,1.2,1.3,1.4,2.9 a.txt b.txt > final.txt

but I got an error message saying that fields 1 and 2 are not compatible.
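
For reference, `join` matches on a single field per file, so repeating `-j` does not build a three-column key. One possible workaround, sketched here under the assumption that the files are tab-separated and contain no `_` characters, is to fuse the three key columns into one field before joining. The sample data, the `fuse` helper and the `*_sample` file names are made up for illustration:

```shell
tab=$(printf '\t')

# Made-up tab-separated samples standing in for a.txt (5 columns) and b.txt (13 columns)
printf '20030601\t14\t10\t499.531\t0.260819\n20030601\t14\t05\t498.985\t0.436532\n' > a_sample.txt
printf '20030601\t14\t05\t71.96452\t810\t309.2\t30.09\t43.11\t996.26\t0.00\t4.04\t125.70\t55.32\n' > b_sample.txt

# Fuse columns 1-3 into one key such as 20030601_14_05, keep the remaining columns,
# then sort on that key because join expects sorted input.
fuse() {
    awk -F'\t' '{
        out = $1 "_" $2 "_" $3
        for (i = 4; i <= NF; i++) out = out FS $i
        print out
    }' "$1" | LC_ALL=C sort -t "$tab" -k1,1
}

fuse a_sample.txt > a_sample.key
fuse b_sample.txt > b_sample.key

# After fusing, original column N is column N-2, so column 9 of the second file is 2.7.
# tr turns the fused key back into three tab-separated columns.
LC_ALL=C join -t "$tab" -o 1.1,1.2,1.3,2.7 a_sample.key b_sample.key | tr '_' '\t' > final_sample.txt

cat final_sample.txt
```

With these samples only the 14:05 row has a partner, so the result is that key, columns 4 and 5 of the first file, and column 9 of the second.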

And on the other hand I tried this awk script:

Code:

awk '{
if(NR==FNR)
	f[$3]=sprintf("%s %s %s %3.3f %1.3f",$1,$2,$3,$4,$5)
if(NR==FNR)
{
	if (f[$3]!="") 
		n[$1]=sprintf("%s %2.2f %3.2f %2.3f",f[$3],$7,$8,($8/100)*(6.1121*exp((17.502*$7)/($7+240.97))))
}
}
END{
for (i in n)
print n[i]
}'

but the output does not seem correct: it is very strange that two files with more than 100,000 lines have only 12 lines in common.

Thanks in advance, and sorry if I am not direct enough in my questions.
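
One thing that may explain so few output lines: an awk array holds one value per distinct key, so `f[$3]` keeps a single entry per minute value and `n[$1]` a single entry per date, however many lines are read. Both `if` tests are also `NR==FNR`, which is true only while the first file is being read. A small sketch with made-up data, comparing a one-column key with the full date/hour/minute key (the names `by_min`, `by_all` and the file names are hypothetical):

```shell
# Three made-up rows that share minute 05 but differ in date or hour
printf '20030601\t14\t05\t1\n20030601\t15\t05\t2\n20030602\t14\t05\t3\n' > collapse_sample.txt

awk -F'\t' '
    {
        by_min[$3] = $0                  # keyed on the minute only
        by_all[$1 FS $2 FS $3] = $0      # keyed on date, hour and minute
    }
    END {
        for (k in by_min) a++
        for (k in by_all) b++
        print a, b                       # number of entries kept in each array
    }
' collapse_sample.txt > collapse_counts.txt

cat collapse_counts.txt
```

This prints `1 3`: the minute-only key keeps one of the three rows, while the three-column key keeps all of them.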

vgersh99 · Dec 19, 2007

It's not that "you are not direct with the questions" - you're not showing your OWN effort along the way.

It's somewhat disconcerting seeing an OP leeching solutions from multiple sites and posting someone else's solution as his/her own. At least have the courtesy of quoting someone else's work appropriately.

Now.... You provided 2 files, but the two files have no 'common keys'. Secondly, you didn't provide the definition of your 'merge' - what the final result/file should look like. This leaves a lot open to interpretation.

Despite your I*net scavenger hunt, I'll provide one interpretation of the task and let you work with it.

nawk -f toni.awk file1 file2

toni.awk:

Code:

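BEGIN {
   FS=OFS="\t"    # input and output are tab-separated
}
# build the lookup key from date, hour and minute
{ idx = $1 FS $2 FS $3 }

# first file: remember columns 4 and 5 under the key
NR==FNR { arr[idx] = $4 FS $5; next }
# second file: for matching keys print the key, the saved columns,
# then columns 4..NF of the current line, each preceded by a separator
idx in arr {
    printf("%s%s%s", idx, OFS, arr[idx])
    for(i=4; i <= NF; i++)
       printf("%s%s", OFS, $i)
    printf("\n")
}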

vlad

tonivm · Dec 19, 2007

Well:
I have these two files

file 1

20030601 14 05 498.985 0.436532
20030601 14 10 499.531 0.260819
20030601 14 15 499.427 0.508597
20030601 14 20 500.418 0.606761
20030601 14 25 496.877 0.233742
20030601 14 30 496.578 0.117021
20030601 14 35 497.189 0.487961
20030601 14 40 499.676 0.613496
..
..

and file 2
20030601 14 05 71.96452 810 309.2 30.09 43.11 996.26 0.00 4.04 125.70 55.32
20030601 14 10 72.46647 802 307.3 29.98 43.02 996.27 0.00 3.74 117.70 59.46
20030601 14 15 72.98309 789 304.3 29.99 43.65 996.28 0.00 2.79 116.50 72.80
20030601 14 20 73.51406 780 303.2 30.27 42.71 996.24 0.00 3.31 103.00 67.17
20030601 14 25 74.05905 764 300 29.66 43.53 996.22 0.00 3.21 94.10 68.35
20030601 14 30 74.61774 754 299.8 29.73 42.94 996.18 0.00 4.05 132.40 57.06
20030601 14 35 75.18980 744 298.6 29.86 42.81 996.16 0.00 3.25 116.30 71.00
20030601 14 40 75.77492 746 306.8 30.30 41.84 996.16 0.00 2.46 111.50 61.00
...
..

Now there are some common lines, so I would like a final file like this:

20030601 14 05 498.985 0.436532 996.26
20030601 14 10 499.531 0.260819 996.27
20030601 14 15 499.427 0.508597 996.28
20030601 14 20 500.418 0.606761 996.24
20030601 14 25 496.877 0.233742 996.22
20030601 14 30 496.578 0.117021 996.18
20030601 14 35 497.189 0.487961 996.16
20030601 14 40 499.676 0.613496 996.16
..
..

thanks in advance

vgersh99 · Dec 19, 2007

Code:

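BEGIN {
   FS=OFS="\t"    # input and output are tab-separated
}
# build the lookup key from date, hour and minute
{ idx = $1 FS $2 FS $3 }

# first file: remember columns 4 and 5 under the key
NR==FNR { arr[idx] = $4 FS $5; next }
# second file: for matching keys print the key, the saved columns and column 9
idx in arr {
    printf("%s%s%s%s%s\n", idx, OFS, arr[idx], OFS, $9)
}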
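
As a quick check, the same logic can be run against the first two sample rows from the thread. This sketch assumes the real files are tab-separated (the listings above show spaces, but the script sets `FS` to a tab). The file names `f1.tsv`, `f2.tsv` and `merged.tsv` are made up, and plain `awk` stands in for `nawk`:

```shell
# First two rows of each sample file from the thread, written with tab separators
printf '20030601\t14\t05\t498.985\t0.436532\n20030601\t14\t10\t499.531\t0.260819\n' > f1.tsv
printf '20030601\t14\t05\t71.96452\t810\t309.2\t30.09\t43.11\t996.26\t0.00\t4.04\t125.70\t55.32\n' > f2.tsv
printf '20030601\t14\t10\t72.46647\t802\t307.3\t29.98\t43.02\t996.27\t0.00\t3.74\t117.70\t59.46\n' >> f2.tsv

awk '
    BEGIN { FS = OFS = "\t" }
    { idx = $1 FS $2 FS $3 }                  # key: date, hour, minute
    NR == FNR { arr[idx] = $4 FS $5; next }   # first file: save columns 4 and 5
    idx in arr { print idx, arr[idx], $9 }    # second file: append column 9
' f1.tsv f2.tsv > merged.tsv

cat merged.tsv
```

The two lines written to `merged.tsv` correspond to the first two lines of the requested final file, ending in 996.26 and 996.27.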

vlad
