Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

comparing fields from one file with another

Status
Not open for further replies.

ashee

Programmer
Apr 2, 2002
4
GB
Hello

I write AIX korn shell scripts with awk (no nawk/gawk) and am trying to further my understanding of awk arrays as awk is so good at data maniplulation. How might I deal with the problem below using awk?

I have two files, proposed and actual

The files are timings with associated results for a group of students

how do i compare the two files?
actual file has complete time data and complete results.
proposed file may have gaps in the timing data.

i need to look at the timing( fields 2 3 and 4 ) for each student in the actual file

and

if the timing fields are in the proposed file, i need to copy proposed line out.
if the timing fields are not in the proposed file, i need to copy actual line out.


STUDENT|JOHN|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|100|100|100|10.2|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|15|15|18.23|10.23|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|144|144|18.22|10.22|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.21|8.21|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-55|-55|18.20|7.85|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.19|7.85|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|475|475|31.70|10.32|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|15|15|18.31|10.31|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|73|73|18.30|10.30|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.29|8.29|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-127|-127|18.28|7.93|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.27|7.93|
STUDENT|MARY|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|485|485|31.66|10.49|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|1|1|18.48|10.48|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|1|1|18.47|10.47|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.46|8.46|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-1|-1|18.45|7.96|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.44|7.46|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|485|485|31.66|10.57|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|1|1|18.56|10.56|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|1|1|18.55|10.55|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.54|8.54|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-1|-1|18.53|8.04|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.52|7.54|

I attempted the code below but all it does it prints out the actual data, not proposed. Thanks in advance.

Ash

Code:
awk ' BEGIN { FS = "|" ; OFS = "|"  ; RS = "\n"} {
if ( $1 ~ /^STUDENT/ ) {
newstudent = $2
}
if ( $1 ~ /^GRADES/ ) {
a = split ($0, myarr, "|" )
s =  myarr[1]"|"myarr[2]"|"myarr[3]"|"myarr[4]"|"
}
while ((getline line < &quot;actual&quot;) > 0) {
b = split (line, newline, &quot;|&quot; )
if (newline[0] = STUDENT ) {
origstudent = newline[2]
}
if ( myarr[0] ~ /^GRADES/ ) {
t = newline[1]&quot;|&quot;newline[2]&quot;|&quot;newline[3]&quot;|&quot;newline[4]&quot;|&quot;
}
if ( ( s == t ) && ( origstudent == newstudent ) ) {
printf (&quot;%s\n&quot; , myarr[a])
count = 1
}
else
count = 1
for (item in newline) {
count++
}
count = count - 2
for ( i = 1;i<count;i++ ) {
printf (&quot;%s|&quot; , newline[i] )
}
}}' proposed
 
Is the data you posted an example of the actual or proposed file? Post an example of the other file and the output you expect. Also, a more detailed explanation of what &quot;timing&quot; is may be helpful.

CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Hi CaKiwi

eg of actual file

STUDENT|JOHN|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|100|100|100|10.2|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|15|15|18.23|10.23|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|144|144|18.22|10.22|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.21|8.21|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-55|-55|18.20|7.85|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.19|7.85|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|475|475|31.70|10.32|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|15|15|18.31|10.31|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|73|73|18.30|10.30|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.29|8.29|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-127|-127|18.28|7.93|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.27|7.93|
STUDENT|MARY|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|485|485|31.66|10.49|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|1|1|18.48|10.48|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|1|1|18.47|10.47|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.46|8.46|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-1|-1|18.45|7.96|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.44|7.46|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|485|485|31.66|10.57|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|1|1|18.56|10.56|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|1|1|18.55|10.55|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.54|8.54|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-1|-1|18.53|8.04|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.52|7.54|

eg of proposed file

STUDENT|JOHN|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|3|3|8.73|1.43|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|144|144|18.22|10.22|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.21|8.21|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-55|-55|18.20|7.85|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.19|7.85|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|475|475|31.70|10.32|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|15|15|18.31|10.31|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|73|73|18.30|10.30|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.29|8.29|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-127|-127|18.28|7.93|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.27|7.93|
STUDENT|MARY|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|1|1|18.48|10.48|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|1|1|18.47|10.47|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.46|8.46|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-1|-1|18.45|7.96|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.44|7.46|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|485|485|31.66|10.57|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|1|1|18.56|10.56|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|1|1|18.55|10.55|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.54|8.54|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-1|-1|18.53|8.04|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.52|7.54|


The timings are the fields below existing of half hour periods and a number eg 2 and always have GRADES as the first field . These timings and their results are complete within the actual file but maybe incomplete within the proposed.

|2002-04-11 00:30|2002-04-11 01:00|2|

for example, the proposed file is missing line 1 for both students compared to the actual file
What i want to do is trawl through the actual file looking at the timing fields for each student.

|2002-04-11 00:00|2002-04-11 00:30|3|
|2002-04-11 00:00|2002-04-11 00:30|2|
|2002-04-11 00:00|2002-04-11 00:30|1|
|2002-04-11 00:00|2002-04-11 00:30|-1
|2002-04-11 00:00|2002-04-11 00:30|-2
|2002-04-11 00:00|2002-04-11 00:30|-3|
|2002-04-11 00:30|2002-04-11 01:00|3||
|2002-04-11 00:30|2002-04-11 01:00|2|
|2002-04-11 00:30|2002-04-11 01:00|1|
|2002-04-11 00:30|2002-04-11 01:00|-1
|2002-04-11 00:30|2002-04-11 01:00|-2|
|2002-04-11 00:30|2002-04-11 01:00|-3

then for each of these timing fields, i need to find the result fields from the proposed file and create a new file.
If this timing data and hence the results are not within the proposed file for the student, i need to copy the line within actual file to the new file.
If this timing data is within the proposed file i need to copy the proposed line and it's data to the new file

results should be
STUDENT|JOHN|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|100|100|100|10.2| #this line was from actual file
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|3|3|8.73|1.43| #this line was from proposed file
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|144|144|18.22|10.22|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.21|8.21|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-55|-55|18.20|7.85|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.19|7.85|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|475|475|31.70|10.32|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|15|15|18.31|10.31|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|73|73|18.30|10.30|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.29|8.29|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-127|-127|18.28|7.93|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.27|7.93|
STUDENT|MARY|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|485|485|31.66|10.49| #this line was from actual file
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|1|1|18.48|10.48|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|1|1|18.47|10.47|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.46|8.46|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-1|-1|18.45|7.96|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.44|7.46|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|485|485|31.66|10.57|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|1|1|18.56|10.56|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|1|1|18.55|10.55|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.54|8.54|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-1|-1|18.53|8.04|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.52|7.54|

The idea is to create a new file that contains proposed data if it exists, any gaps being filled with actual data.

 
Sorry

the format of the results should be


STUDENT|JOHN|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|100|100|100|10.2|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|3|3|8.73|1.43|
GRADES|2002-04-11 00:00|2002-04-11|0:30|1|144|144|18.22|10.22|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.21|8.21|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-55|-55|18.20|7.85|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.19|7.85|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|475|475|31.70|10.32|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|15|15|18.31|10.31|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|73|73|18.30|10.30|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.29|8.29|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-127|-127|18.28|7.93|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.27|7.93|
STUDENT|MARY|
GRADES|2002-04-11 00:00|2002-04-11 00:30|3|485|485|31.66|10.49|
GRADES|2002-04-11 00:00|2002-04-11 00:30|2|1|1|18.48|10.48|
GRADES|2002-04-11 00:00|2002-04-11 00:30|1|1|1|18.47|10.47|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-1|-1|-1|18.46|8.46|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-2|-1|-1|18.45|7.96|
GRADES|2002-04-11 00:00|2002-04-11 00:30|-3|-500|-500|18.44|7.46|
GRADES|2002-04-11 00:30|2002-04-11 01:00|3|485|485|31.66|10.57|
GRADES|2002-04-11 00:30|2002-04-11 01:00|2|1|1|18.56|10.56|
GRADES|2002-04-11 00:30|2002-04-11 01:00|1|1|1|18.55|10.55|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-1|-1|-1|18.54|8.54|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-2|-1|-1|18.53|8.04|
GRADES|2002-04-11 00:30|2002-04-11 01:00|-3|-500|-500|18.52|7.54
 
Here's my attempt. I assume the 3rd | on the 3rd line of the result you posted should be a space.

BEGIN {
FS=OFS=&quot;|&quot;
fn = &quot;proposed&quot;
while ((getline < fn) > 0) {
if ($1 == &quot;STUDENT&quot;) {
nam = $2
continue
}
tim = $2 $3 $4
a[nam,tim] = $0
}
}
{
if ($1 == &quot;STUDENT&quot;) {
nam = $2
print
next
}
tim = $2 $3 $4
if (a[nam,tim]) {
print a[nam,tim]
}
else {
print
}
}

CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Hey CaKiwi

Thanks very much, works perfectly. I am now working on wrapping it all within korn shell.

Thanks

Ash
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top