Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to find min and max values from a file based on specified patterns 1

Status
Not open for further replies.

amili

Programmer
Nov 9, 2010
18
CA
Hi,

i've been trying to extract min and max values for the following data using awk script but failed. i keep on getting the same repeated results for both min and max and the results seems ridiculous.

I need to generate results based on Id. If column 4 is a '+', then the min value should be taken from column 2 in row 1 and max value from column 3 in the last row for that particular Id. But, if column 4 is '-', then the min value should be taken from column 2 in last row and max value should be taken from column 3 on the first row for that particular id.

sca_1 13976 14249 + Id 4 Numb 1
sca_1 14306 14416 + Id 4 Numb 2
sca_1 14661 14698 + Id 4 Numb 3
sca_1 16111 16236 + Id 5 Numb 1
sca_1 16367 16643 + Id 5 Numb 2
sca_1 21204 22151 - Id 6 Numb 2
sca_1 22217 22765 - Id 6 Numb 1

the output should be generated in a new file that contains 3 columns (id, min and max) and should look like this:-

Id 4 13976 14698
Id 5 16111 16643
Id 6 22217 22151

i have thousands of data like this in 1 file and i have hundreds of files like this that i need to sort out. i would really appreciate your kind assistance or advise on this.. thanks
 
Try this
Code:
{
  if ($6 != svid)
  {
    if (NR > 1)
    {
      if (p)
      {
        print "id", svid, mn1, mx2
      }
      else
      {
        print "id", svid, mn2, mx1
      }
    }
    svid = $6
    p = $4=="+"
    mn1 = $2
    mx1 = $3
  }
  mn2 = $2
  mx2 = $3
}
END {
  if (p)
  {
    print "id", svid, mn1, mx2
  }
  else
  {
    print "id", svid, mn2, mx1
  }
}

CaKiwi
 
Hi CaKiwi,

thanks so much for your response..

i tried your codes but it gives me results like this:-

id 4 13976 14249
id 13976 13978
id 4 14306 15651
id 15649 15651
id 5 16111 16236
id 16111 16113
id 5 16367 16643
id 6 21204 22151
id 21204 21206
id 6 22217 22765
id 22763 22765

the result is not as expected. I need to show the only max and min values for the same id in one line. Also, there are duplicates values in input files and i need to only display one of them. any further idea? thanks
 
I ran it against the sample data you gave and got the results you specified in your first post. What data did you use to get the results in your second post?

CaKiwi
 
hi,

yeah, i tried that one too and it gives me the correct result even though i put in some redundant values.But when i try to work on the actual data, it gave me the result in my 2nd post. Actual data has much more fields and there are a lot of redundant values and it is in general file format (gff)
 
it's ok... i just found the solution.

i changed the column for svid and i removed the ';' symbol from my input file and it works great!!

Thanks so much for your kind help on this.. :)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top