Help with traversing records in an AWK .exe

Status
Not open for further replies.

Jarrod13

Programmer
Dec 18, 2008
12
Okay,

I'm having a problem with an awk program I'm working on.

What I'm doing is checking the entire file for a hex non printable character. This is how I do it:

for(f = 0; f <= NR; f++){
if (match($0, /\x1B/) != 0)
{
NON_PRINT_CHAR = 1;
}
getline < MYFILENAME;
}

Now, the problem I have is with "getline < MYFILENAME". Once the loop is done running, $0 is left set to the last record in the file.

When I try and write another function to perform a certain task it won't work because it starts processing at the current value of $0 which is at the end of the file.

Is there a better way to write this loop so that it won't change the value of $0 but at the same time traverse every record in the file?

Any help is greatly appreciated! Thank you!
 
Hi

Read in a variable instead of [tt]$0[/tt] :
Code:
for(f = 0; f <= NR; f++){
        if (match(line, /\x1B/) != 0)
        {
                NON_PRINT_CHAR = 1;
        }
        getline line < MYFILENAME;
}
But I am not sure what you are trying to accomplish there. Could you tell us some more? Sorry, but that code gives the impression that you misunderstand the basics of how [tt]awk[/tt] works.

Feherke.
 
feherke, Thanks for your quick response. I am new to AWK and I'm still learning its functionality.

I tried your solution to my problem and I'm still getting the same error.

Let me give you some background as to what I'm trying to do:

I'm working on an awk program that is compiled (by MKS) into an exe and run in a Windows environment via the command prompt.

What this exe does is parse big multi-record text files and check things like whether the number of fields is correct, whether the file has the correct header and trailer, etc.

When we receive these text files sometimes there are hex non printable characters. The code I'm working on is supposed to traverse the entire text file and check each field in every record to make sure that it is not any of the hex non printable characters that I have specified.

Now, what I had was working. I'm essentially using the 'getline < MYFILENAME' command to move to the next record of the file and then running it through my for loop again check for the hex non printables.

The problem that I'm having is that the 'getline' command is changing the value of $0. So since the for loop runs until it's <= to the number of records $0 is set to the end of the text file.

I'm working on another function that will sum numbers in a given field in each record of the file and then compare that sum to the last record of the file. Here is my code for that:

function ControlSum()
{

getline variable < MYFILENAME

for(i = 0; i < NR; i++)
{
COUNT = $CFIELD;
SUM += COUNT;

getline variable < MYFILENAME;
}
print SUM;
return SUM;

}

Since the value of $0 was changed during the execution of the hex non printable character check, this ControlSum function will not work. If I comment out the hex non printable check, the ControlSum function works.

I'm stuck right now so your help would be appreciated again, what do you think I should do to correct this?

I hope all that made sense, if not let me know and I can explain it again.

Thanks so much,

Jarrod
 
Use close(MYFILENAME) at the end of each loop, so that the next getline < MYFILENAME reopens the file and starts again from its first record.
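
A minimal sketch of the effect, assuming a POSIX shell and a POSIX awk; the sample data and the counters first and second are made up for illustration. Two loops read the same file, and the close() between them is what lets the second one start over:

```sh
# Hypothetical sample data in a temporary file.
tmp=$(mktemp)
printf 'a 1\nb 2\nc 3\n' > "$tmp"

result=$(awk -v MYFILENAME="$tmp" '
BEGIN {
    # First pass: read every record of MYFILENAME into a variable.
    while ((getline line < MYFILENAME) > 0) first++
    close(MYFILENAME)   # without this the second loop would start at end of file

    # Second pass: the file is reopened, so reading starts at record 1 again.
    while ((getline line < MYFILENAME) > 0) second++
    close(MYFILENAME)

    print first, second
}')
echo "$result"
rm -f "$tmp"
```

Both counters come out as 3 here. If the first close() is dropped, the second loop reads nothing because the file is still positioned at its end.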

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi

Sorry, but I still have a feeling that you have serious misconceptions related to [tt]awk[/tt].

Even more, now I am in doubt you understood and implemented PHV's suggestion correctly. You put that line of code after the [tt]for[/tt] loop's closing brace ( } ) and before the [tt]return[/tt] statement, right ?

A big question is, why are you using [tt]NR[/tt] in the [tt]for[/tt] statement. I can not imagine a situation when you need that.

Anyway, reading into a variable and getting the $CFIELD field will not work as you expected.
Code:
function ControlSum()
{
  while ((getline < MYFILENAME)>0) {
    COUNT = $CFIELD;
    SUM += COUNT;
  }
  close(MYFILENAME)
  print SUM;
  return SUM;
}

# or

function ControlSum()
{
  while ((getline line < MYFILENAME)>0) {
    split(line,field)
    COUNT = field[CFIELD];
    SUM += COUNT;
  }
  close(MYFILENAME)
  print SUM;
  return SUM;
}

Feherke.
 
Feherke, thank you for your help.

I used NR in the for statement so that the loop will run until it reaches the end of the file.

for(i = 0; i < NR; i++)

So when i is equal to NR it will be processing the last record in the file. Why is this incorrect?

Your ControlSum() solution worked for me.

I don't fully understand the properties of the getline function.
Am I wrong to think that 'getline' is an iterator in awk?

How is it working in your while loop logic?

while ((getline < MYFILENAME)>0)


Thank you again for the help.




 
Hi

man awk said:
NR The total number of input records seen so far.
So it is the ordinal number of the record last read from the input and made available for processing. If you have 3 records, [tt]NR[/tt] will not be set to 3 at the start; it takes the values 1 to 3 as the records are read sequentially from the 1[sup]st[/sup] to the 3[sup]rd[/sup].

Anyway, [tt]NR[/tt]'s automatic record counting is bound to the processing of the main input. Reading a different file with [tt]getline <file[/tt] does not affect [tt]NR[/tt].
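
A small sketch of that behaviour, assuming a POSIX shell and a POSIX awk; the sample data and the names OTHER and n are made up for illustration. The main input arrives on stdin, and a second file is read completely while the first record is being processed:

```sh
# Hypothetical second file, read with getline while stdin is the main input.
tmp=$(mktemp)
printf 'x\ny\nz\n' > "$tmp"

result=$(printf 'r1\nr2\nr3\n' | awk -v OTHER="$tmp" '
NR == 1 {
    # Read the other file to its end; this form of getline leaves NR alone.
    while ((getline line < OTHER) > 0) n++
    close(OTHER)
}
{ printf "%s NR=%d\n", $0, NR }
END { print "other file records: " n }')
echo "$result"
rm -f "$tmp"
```

NR still counts 1, 2, 3 for the main input even though three more records were read from the other file in between.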

The [tt]getline[/tt] is like [tt]readln[/tt]/[tt]gets()[/tt]/[tt]scan[/tt]/whatever in other languages : reads from input or file and places the data into a variable. In [tt]awk[/tt] the [tt]getline[/tt] has some variations :
man awk said:
getline                  Set $0 from next input record; set NF, NR, FNR.
getline <file            Set $0 from next record of file; set NF.
getline var              Set var from next input record; set NR, FNR.
getline var <file        Set var from next record of file.
command | getline [var]  Run command piping the output either into $0 or var, as above.
Regarding my [tt]while[/tt] loop, that works because :
man awk said:
The getline command returns 0 on end of file and -1 on an error.
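
A quick sketch of those return values, assuming a POSIX shell and a POSIX awk; the variable names GOOD, MISSING and r1 to r4 are made up for illustration:

```sh
# Hypothetical two-record file plus a path that does not exist.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

result=$(awk -v GOOD="$tmp" -v MISSING="$tmp.does-not-exist" '
BEGIN {
    r1 = (getline line < GOOD)      # 1: a record was read
    r2 = (getline line < GOOD)      # 1: the second record was read
    r3 = (getline line < GOOD)      # 0: end of file
    close(GOOD)
    r4 = (getline line < MISSING)   # -1: the file cannot be opened
    print r1, r2, r3, r4
}')
echo "$result"
rm -f "$tmp"
```

This is why the loop condition compares against 0: a bare while (getline < file) treats -1 as true and never ends when the file cannot be read.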
In conclusion, read the [tt]awk[/tt] man page. If you do not have it on your system, search for "man awk" to find an on-line version of it.

Feherke.
 
To add to Feherke's excellent advice, you should very rarely need to actually write a for loop/iterator of any sort in an awk script to process all input records, because the "for every record in the input file" behaviour is implicit in awk's processing anyway. Consider this script for example:

Code:
{ print }

... which makes awk simply iterate through every line of input and print it out.

So rather than trying to do a "for every input record do this check" and then "for every input record do that check", you should rearrange your programme to allow awk to worry about the "for every input record" part, and you only need to write the "do this check, do that check" bits.
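
As a rough sketch of that rearrangement, assuming a POSIX shell and a POSIX awk: the record layout (detail records starting with D, a trailer starting with T), the sample data and the names bad, TRAILER and status are all hypothetical, and the escape character is written in octal as \033 instead of \x1B because octal escapes are the portable form. Both checks run in one pass over the file, with no getline at all:

```sh
# Hypothetical layout: detail records "D <amount>", trailer "T <control total>".
# The second detail record carries an ESC byte (octal 033).
tmp=$(mktemp)
printf 'D 10\nD 20\033\nD 30\nT 60\n' > "$tmp"

result=$(awk -v CFIELD=2 '
index($0, "\033") { NON_PRINT_CHAR = 1; bad = bad " " NR }   # check 1: ESC byte
$1 == "D"         { SUM += $CFIELD }                         # check 2: running sum
$1 == "T"         { TRAILER = $CFIELD }                      # remember the trailer total
END {
    if (NON_PRINT_CHAR) print "ESC found in record(s):" bad
    status = (SUM == TRAILER) ? "OK" : "MISMATCH"
    print "control sum " status ": " SUM
}' "$tmp")
echo "$result"
rm -f "$tmp"
```

Each pattern-action line is tried against every record, so $0 is never disturbed by one check before the next one sees it, and the comparison with the trailer happens once in END.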

Annihilannic.
 
feherke, Annihilannic,

Thank you again for your excellent advice and help.


 