Using "for" loop in gawk - Odd output?

Status
Not open for further replies.

madasafish

Technical User
Jul 18, 2006
78
TH
I am getting odd output. Any help appreciated,
Madasafish

Code:
gawk '{
for (i = 1; i <= 500; i=i+100)
if (substr ($0,10,4) > i && (substr ($0,10,4) <i+100))
print $0 " Location_"i"-"i+99
}' $infile

$infile
D200805301 3320842 1.00
D200805302 3320842 1.00
D2008053010 3320842 1.00
D2008060199 3320842 0.87
D20080602100 3320842 0.82
D20080603101 3320842 0.77
D20080603199 3320842 0.77
D20080602200 3320842 0.82
D20080604201 3320842 0.72
D20080605299 3320842 0.66
D20080602300 3320842 0.82
D20080606301 3320842 0.61
D20080607399 3320842 0.55
D20080602400 3320842 0.82
D20080608401 3320842 1.47
D20080608499 3320842 1.47
D20080602500 3320842 0.82


Notes:
The line marked "Wrong" should be in Location 1-100.
The line for location 99 (4th line of the infile) is missing from the output.
Everything else looks correct.

Output
D200805301 3320842 1.00 Location_1-100
D200805302 3320842 1.00 Location_101-200 <--Wrong
D2008053010 3320842 1.00 Location_1-100
D20080602100 3320842 0.82 Location_1-100
D20080603101 3320842 0.77 Location_101-200
D20080603199 3320842 0.77 Location_101-200
D20080602200 3320842 0.82 Location_101-200
D20080604201 3320842 0.72 Location_201-300
D20080605299 3320842 0.66 Location_201-300
D20080602300 3320842 0.82 Location_201-300
D20080606301 3320842 0.61 Location_301-400
D20080607399 3320842 0.55 Location_301-400
D20080602400 3320842 0.82 Location_301-400
D20080608401 3320842 1.47 Location_401-500
D20080608499 3320842 1.47 Location_401-500
D20080602500 3320842 0.82 Location_401-500


 
Can you include the following output when it starts processing a location? I have tried, but it prints the message for every line.
There could be thousands of lines per location.

Thanks for your help.
Madasafish

print "Please wait...Processing Location_"i"-"i+99
 
Not sure what you are trying to achieve, but first you need to force a numeric comparison. substr() returns a string, so your current code appears to compare strings rather than numbers. See if this one helps:

Code:
 gawk '{
for (i = 1; i <= 500; i=i+100)
if (substr ($0,10,4)+ 0 > i && (substr ($0,10,4)+ 0 <i+100))
  print $0 " Location_"i"-"i+99
}' $infile
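
To see why the "+ 0" matters, here is a minimal sketch using the second sample line from $infile. substr() returns a string, so comparing its result with a number is done as a string comparison, character by character; adding 0 converts it to a number first. The variable names as_string and as_number are only for illustration, and plain awk is assumed to behave like gawk here.

```shell
# Second sample line from the thread; positions 10-13 are "2 33".
line='D200805302 3320842 1.00'

# Without +0 the substr() result is compared as a string: "2 33" sorts after "101".
as_string=$(printf '%s\n' "$line" |
    awk '{ r = (substr($0,10,4) > 101) ? "yes" : "no"; print r }')

# With +0 the leading digits are converted to the number 2 before comparing.
as_number=$(printf '%s\n' "$line" |
    awk '{ r = (substr($0,10,4)+0 > 101) ? "yes" : "no"; print r }')

echo "string compare:  is it greater than 101? $as_string"
echo "numeric compare: is it greater than 101? $as_number"
```

The string comparison answers "yes", which is why location 2 ended up in Location_101-200 in the original output.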
 
Sorry, let me explain more clearly what I am trying to achieve.

1. Positions 10 to 13 of each line are numbers and represent a Location.
2. Group the lines by range of Location number (a Location Group) and append each line to the file for its range. Ranges are 1-100, 101-200, 201-300, etc.
3. For each range (Location Group) that is being processed, print (once only) "Please wait...Processing Location_"i"-"i+99, where i = 1, 101, 201, etc.

Thank you, moonring.

I did try your suggestion, but it is still missing some lines.

Thanks in advance
Madasafish

Current code:
gawk -v logfile=$LOGFILE '{
for (i = 1; i <= 500; i=i+100)
if (substr ($0,10,4)+0 > i && (substr ($0,10,4)+0 <i+100))
print $0 >> "Location_"i"-"i+99
}' $infile
 
There are a few things to consider. First, your if condition excludes the boundary cases, i.e. when substr is 1, 101, 201, ... (1 > 1 is false), so you need the >= operator instead. Also, as per your requirement, substr should take 3 chars, not 4.
It is still not clear whether you need all output in one file or in separate files.

One file:

Code:
gawk '{a=substr ($0,10,3)+ 0 ;
  for (i = 1; i <= 500; i=i+100)
    if (a >= i && a < i+100)
      if (a==i)
        print "Please wait...Processing Location_"i"-"i+99"\n"$0 " Location_"i"-"i+99
      else
        print $0 " Location_"i"-"i+99
}' $infile > Range_file

Or many location files,

Code:
awk '{a=substr ($0,10,3)+ 0 ;
  for (i = 1; i <= 500; i=i+100)
    if (a >= i && a < i+100)
      if (a==i)
        print "Please wait...Processing Location_"i"-"i+99"\n"$0 " Location_"i"-"i+99 > "Location_"i"-"i+99
      else
        print $0 " Location_"i"-"i+99 > "Location_"i"-"i+99
}' $infile
 
The formatting got slightly mangled while posting. Use:
Code:
if (a==i)    # all on one line, not "if" on one line and "(a==i)" on the next
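
In both versions above, the "Please wait" message is printed only when a line's location equals the lower bound i exactly, so it can appear several times for one range or not at all. A minimal sketch of one way to print it once per range is to remember in an array which ranges have already been announced. The names tag_locations and seen are made up for illustration; the ranges and the 4-character substr follow the original post.

```shell
# Sketch: announce each location range once, then tag every line with its range.
tag_locations() {
    awk '{
        loc = substr($0, 10, 4) + 0              # force a numeric value
        for (i = 1; i <= 500; i = i + 100)
            if (loc >= i && loc < i + 100) {
                name = "Location_" i "-" (i + 99)
                if (!(name in seen)) {           # first line seen for this range
                    seen[name] = 1
                    print "Please wait...Processing " name
                }
                print $0 " " name
            }
    }' "$@"
}

# Example with three of the sample lines; the message appears twice, once per range.
printf '%s\n' 'D200805301 3320842 1.00' \
              'D200805302 3320842 1.00' \
              'D20080603101 3320842 0.77' | tag_locations
```

Called as tag_locations "$infile" > Range_file, this writes the message into the output alongside the data, as the one-file version above does.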
 
Thank you, moonring. I believe I eventually cracked it with the following code. It has only been tested on some small test files.
I will be testing it on the big files tomorrow.
I appreciate your help and advice with this.
Madasafish

Code:
gawk '{
for (f = 100; f <=1000; f=f+100)
if (substr ($0,10,4)+0 > f-100 && (substr ($0,10,4)+0 <= f))
print $0 >> "Locations_"f-99"_"f
}' $INFILE
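
One portability note on the code above: as far as I know, the gawk manual warns that an unparenthesized concatenation after > or >> is ambiguous and may be handled differently by other awk implementations. A minimal sketch that builds the file name in a variable first, and also calls substr() only once per line, might look like this. split_locations and out are illustrative names; the ranges and file names match the code above.

```shell
# Sketch: same grouping as the post above, with an unambiguous redirection target.
split_locations() {
    awk '{
        loc = substr($0, 10, 4) + 0              # computed once per line
        for (f = 100; f <= 1000; f = f + 100)
            if (loc > f - 100 && loc <= f) {
                out = "Locations_" (f - 99) "_" f
                print $0 >> out
                break                            # a line belongs to one range only
            }
    }' "$@"
}
```

It is used the same way, for example split_locations "$INFILE", and appends to the Locations_* files in the current directory.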


 
I have tried this code and it does work. The issue is that I have 64 large files to process, and each file takes approximately 5 minutes. My questions are:

A: Can this code be optimised to be more efficient?
B: Could Perl do the job quicker? (Sorry if this is the wrong forum.)

Many thanks,
Madasafish



#Filter locations from 64 files into 10 files
for ((a=1; a <=64; a++))
do
    INFILE=file_${a}_64

    if [ -f $INFILE ]
    then
        Logit "Please wait...Filtering locations from $INFILE"
        gawk '{
            for (f = 100; f <=1000; f=f+100)
                if (substr ($0,10,4)+0 > f-100 && (substr ($0,10,4)+0 <= f))
                    print $0 >> "Locations_"f-99"_"f
        }' $INFILE
    else
        Logit "Error.........$INFILE not found!"
        continue
    fi

done
 
This version replaces the for loop with some arithmetic. It runs in about a fifth of the time in my testing:

Code:
gawk '
        {
                loc=substr($0,10,4)+0
                f=(int((loc-1)/100)+1)*100
                print $0 >> "Locations_"f-99"_"f
        }
' infile


Annihilannic.
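
A quick way to check the formula is to run it on the boundary values. This sketch assumes a helper named range_for (not part of the thread) that applies the same expression to a location passed as an argument and prints the resulting range:

```shell
# Sketch: apply f=(int((loc-1)/100)+1)*100 to one location and print "low-high".
range_for() {
    awk -v loc="$1" 'BEGIN {
        f = (int((loc - 1) / 100) + 1) * 100     # upper end of the range
        r = (f - 99) "-" f
        print r
    }'
}

range_for 1      # prints 1-100
range_for 100    # prints 1-100
range_for 101    # prints 101-200
range_for 500    # prints 401-500
```

Two differences from the loop version are worth knowing about: a location of 0 (for example when the field is not numeric) ends up in Locations_1_100, and locations above 1000 get their own files, whereas the loop skipped both.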
 