A complex (for me) file parse

Status
Not open for further replies.

kHz

MIS
Dec 6, 2004
1,359
US
I have a file that lists ~850 packages in the format shown below. The packages are grouped into bundles. For example, the first php4 bundle lists its packages with a certain version and release, and later in the file another php4 bundle may list a php4 package with a different version and release. The same goes for the Apache bundles.

This repeats over and over again. What I need for output is a listing of all bundles that have the highest version and release.

For example, the php4 bundle beginning on line 40 might have 12 rpm packages, all with a version/release of 4.3.4/44.38. Later on, say beginning on line 122, there is another php4 bundle with 12 rpm packages at 4.4.5/22.89. Later still, say beginning on line 732, a php4 bundle has a version/release of 5.9.1/11.34, but 3 of its rpm packages have a version/release matching those on line 122, so for those 3 packages either the line 122 or the line 732 entry could be printed.

This would have to occur for every bundle in the file. The package name and version and release are in their own columns.

Also, a bundle may contain two lines for the same package with consecutive IDs, like the two php4 lines in the php4 bundle with IDs such as 1234 and 1235. This is because one is the rpm package and the other is the rpm patch. I only need the lower number, which would be 1234, not 1235.

Example file:
Code:
ID     PACKAGE       EPOCH  VERSION   RELEASE  TARGET         ARCH
1723 |  php4        |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586
1724 |  php4        |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586
1787 |  php4-pear   |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586
1788 |  php4-pear   |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586

ID     PACKAGE         EPOCH      VERSION   RELEASE  TARGET         ARCH
1087 |  apache2       |          | 4.3.4  |  44.38  | sles-9-i586   |  sles-9-i586
1118 |  apache2-ssl   |          | 4.3.4  |  44.38  |  sles-9-i586  |  noarch

...[other lists]

ID     PACKAGE          EPOCH     VERSION   RELEASE  TARGET         ARCH
1287 |  apache2       |          | 4.3.6  |  46.28 |  sles-9-i586  |  sles-9-i586
1388 |  apache2-ssl   |          | 4.3.6  |  46.28 |  sles-9-i586  |  noarch

...[other lists]

ID     PACKAGE  EPOCH  VERSION   RELEASE  TARGET         ARCH
1986 |  php4   |           | 4.3.9  |  54.38 |  sles-9-i586  |  sles-9-i586
1987 |  php4   |           | 4.3.9  |  54.38 |  sles-9-i586  |  sles-9-i586
1967 |  php4-pear   |      | 4.3.9  |  44.38 | sles-9-i586  |  sles-9-i586
1968 |  php4-pear   |      | 4.3.4  |  44.38 |  sles-9-i586  |  sles-9-i586

... [other lists]

What I would like is output that contains the ID PACKAGE VERSION RELEASE where each package is the highest version and release.

Example:
Code:
1986 php4 4.3.9 54.38 
1968 php4-pear 4.3.4 44.38
1287 apache2 4.3.6 46.28
1388 apache2-ssl 4.3.6 46.28 
...[other packages]

I cannot get correct output by comparing the php4 package on line nn with the one on line nn+1, then comparing those against line nn+234, printing the highest version and release, and doing that for every bundle/package in the file. As I mentioned, there are 800+ bundles and each bundle can have anywhere from 1 to 20 packages. A bundle may occur only once or it may occur 10 times. Some of the packages in a later bundle will have a higher version/release than in the previous bundle, but others may not.

Thanks!!!
 
A starting point:
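awk -F'|' '
# Data rows start with an ID: strip spaces, re-split on "|", and keep the row
# whose "version release" string compares highest for each package name ($2).
/^[1-9]/{gsub(/ +/,"");$0=$0;if($4" "$5>v[$2]){i[$2]=$1;v[$2]=$4" "$5}}
END{for(p in i)print i[p],p,v[p]}   # prints ID, package, version, release
' /path/to/input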

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
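
To see what the `/^[1-9]/` pattern and the `gsub(/ +/,"");$0=$0` step in the one-liner above do to a single record, the following sketch feeds it the header and one row from the sample file and prints the fields it ends up with. The function name `show_fields` is hypothetical and exists only for this illustration.

```shell
# show_fields: hypothetical helper. Reads package lines on stdin and prints
# the fields that the one-liner works with.
show_fields() {
    awk -F'|' '
    /^[1-9]/ {            # only data rows start with a digit; headers and blanks are skipped
        gsub(/ +/, "")    # delete every run of spaces in the record
        $0 = $0           # make sure the fields are re-split from the cleaned record
        print "id=" $1, "pkg=" $2, "ver=" $4, "rel=" $5
    }'
}

printf '%s\n' \
    'ID     PACKAGE       EPOCH  VERSION   RELEASE  TARGET         ARCH' \
    '1723 |  php4        |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586' |
    show_fields
```

The empty EPOCH column is still field 3 after the spaces are removed, which is why the version and release are read from `$4` and `$5`.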
 
That works great, but I kind of hit a snag that I (also) cannot figure out.

68.10 is newer than 68.7 but the sort is printing 68.7 as the latest because 7 is after 1.

For example:

Code:
2704  | gpg  1.2.4  |  68.10  |  sles-9-i586  | i586
2705  | gpg  1.2.4  |  68.10  |  sles-9-i586  | i586
2754  | gpg  1.2.4  |  68.13  |  sles-9-i586  | i586
2755  | gpg  1.2.4  |  68.13  |  sles-9-i586  | i586
3166  | gpg  1.2.4  |  68.16  |  sles-9-i586  | i586
3167  | gpg  1.2.4  |  68.16  |  sles-9-i586  | i586
11133 | gpg  1.2.4  |  68.4   |  sles-9-i586  | i586
11132 | gpg  1.2.4  |  68.4   |  sles-9-i586  | i586
3929  | gpg  1.2.4  |  68.7   |  sles-9-i586  | i586
181   | gpg  1.2.4  |  68.7   |  sles-9-i586  | i586

I would want the line of "3166 | gpg 1.2.4 | 68.16 | sles-9-i586 | i586" printed
because 16 is higher than 7. But of course 7 is printing as I mentioned because .7 is greater
than .1. Is this simple to fix? Otherwise it works great!
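
The behaviour described above can be reproduced in isolation. The sketch below compares two release strings as strings, as numbers, and part by part. The function name `pick_newer` and its mode names are made up for this illustration. Note that a plain numeric comparison would not help either, because 68.10 read as a number is 68.1.

```shell
# pick_newer MODE A B: hypothetical helper that prints whichever of A and B
# the chosen comparison mode considers larger.
pick_newer() {
    awk -v mode="$1" -v a="$2" -v b="$3" 'BEGIN {
        if (mode == "string") {
            # Forced string comparison: "1" sorts before "7" at the first differing character.
            gt = ((a "") > (b ""))
        } else if (mode == "number") {
            # As numbers, 68.10 is 68.1, which is still below 68.7.
            gt = ((a + 0) > (b + 0))
        } else {
            # Part by part: split on dots and compare each piece as an integer.
            na = split(a, x, /\./); nb = split(b, y, /\./)
            gt = 0
            for (k = 1; k <= na || k <= nb; k++) {
                if (x[k] + 0 != y[k] + 0) { gt = (x[k] + 0 > y[k] + 0); break }
            }
        }
        r = gt ? a : b
        print r
    }'
}

pick_newer string 68.10 68.7
pick_newer number 68.10 68.7
pick_newer parts  68.10 68.7
```

The first two calls print 68.7 and only the third prints 68.10, so the fix has to treat each dotted part as its own integer.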

Thanks!!
 
Perhaps with a custom compare function?
awk -F'|' '
function myGT(x,y, a,b,j,c,k,z,t){
if(split(x,a,/ /)!=2)return 0
if(split(y,b,/ /)!=2)return 1
j=split(a[1],c,/\./);for(k=1;k<=j;++k)z=100*z+c[k]
j=split(b[1],c,/\./);for(k=1;k<=j;++k)t=100*t+c[k]
if(z>t)return 1;if(z<t)return 0
j=split(a[2],c,/\./);for(k=1;k<=j;++k)z=100*z+c[k]
j=split(b[2],c,/\./);for(k=1;k<=j;++k)t=100*t+c[k]
return(z>t)
}
/^[1-9]/{
gsub(/ +/,"");$0=$0
if(myGT($4" "$5,v[$2])){i[$2]=$1;v[$2]=$4" "$5}
}
END{for(p in i)print i[p],p,v[p]}
' /path/to/input

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
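
One caveat about the packing used in myGT: each dotted part is folded into a single number with a factor of 100, which assumes every part is below 100 and that both strings have the same number of parts (for instance, 5 would pack to a smaller number than 4.3.4). As a hedged alternative, not part of PH's answer, a comparison that walks the parts one by one avoids both assumptions. The names `newest` and `cmp` are hypothetical, the parts are assumed to be purely numeric, and the sample rows are the gpg rows from the thread rewritten into the seven-column layout of the original file.

```shell
# newest: hypothetical wrapper. Reads the pipe-separated listing on stdin and
# prints "ID PACKAGE VERSION RELEASE" for the newest entry of each package.
newest() {
    awk -F'|' '
    # cmp(x, y): compare two dotted strings part by part as integers.
    # Returns 1 if x is newer, -1 if older, 0 if equal. p, q, n, m, k are locals.
    function cmp(x, y,   p, q, n, m, k) {
        n = split(x, p, /\./); m = split(y, q, /\./)
        for (k = 1; k <= n || k <= m; k++) {
            if (p[k] + 0 > q[k] + 0) return 1
            if (p[k] + 0 < q[k] + 0) return -1
        }
        return 0
    }
    /^[1-9]/ {
        gsub(/ +/, ""); $0 = $0
        c = ($2 in id) ? cmp($4, ver[$2]) : 1     # first sighting always wins
        if (c == 0) c = cmp($5, rel[$2])          # same version: release decides
        if (c > 0) { id[$2] = $1; ver[$2] = $4; rel[$2] = $5 }
    }
    END { for (name in id) print id[name], name, ver[name], rel[name] }'
}

printf '%s\n' \
    '3929 | gpg | | 1.2.4 | 68.7  | sles-9-i586 | i586' \
    '3166 | gpg | | 1.2.4 | 68.16 | sles-9-i586 | i586' |
    newest
```

As in the original script, an exact tie keeps the row that was seen first, and the order in which packages are printed is whatever `for (name in id)` happens to produce.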
 
Thank you!!! That was exactly what I wanted.

If you have time, can you explain the function and the main part so I can better understand it?

Thanks again!!!
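
For readers with the same question, here is the script with the custom compare function again, reindented and annotated. The logic is meant to be unchanged. The comments are one reading of the code, not PH's own explanation, and the wrapper name `latest_pkgs` is hypothetical.

```shell
# latest_pkgs: hypothetical wrapper around the script from the thread.
latest_pkgs() {
    awk -F'|' '
    # myGT(x, y): true when the "version release" string x is newer than y.
    # The extra names in the parameter list (a, b, j, c, k, z, t) are local variables.
    function myGT(x, y,   a, b, j, c, k, z, t) {
        if (split(x, a, / /) != 2) return 0   # x is not "version release": never newer
        if (split(y, b, / /) != 2) return 1   # nothing stored yet for this package: x wins
        # Fold the version parts into one number, e.g. 4.3.9 becomes 40309.
        j = split(a[1], c, /\./); for (k = 1; k <= j; ++k) z = 100 * z + c[k]
        j = split(b[1], c, /\./); for (k = 1; k <= j; ++k) t = 100 * t + c[k]
        if (z > t) return 1
        if (z < t) return 0
        # The versions tie, so z equals t here. Keep folding in the release parts.
        j = split(a[2], c, /\./); for (k = 1; k <= j; ++k) z = 100 * z + c[k]
        j = split(b[2], c, /\./); for (k = 1; k <= j; ++k) t = 100 * t + c[k]
        return (z > t)
    }
    /^[1-9]/ {                        # data rows only, headers and blanks are skipped
        gsub(/ +/, ""); $0 = $0       # strip spaces, then re-split on "|"
        # i[] holds the winning ID per package name, v[] its "version release".
        if (myGT($4 " " $5, v[$2])) { i[$2] = $1; v[$2] = $4 " " $5 }
    }
    END { for (p in i) print i[p], p, v[p] }'
}

printf '%s\n' \
    '1723 |  php4        |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586' \
    '1724 |  php4        |      | 4.3.4  |  44.38  | sles-9-i586  |  sles-9-i586' |
    latest_pkgs
```

Because the comparison is strict, a row that exactly ties the stored one does not replace it. For the rpm/patch pairs this keeps the first-listed row, which in the sample file is the one with the lower ID (1723 here).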
 