
Simultaneous multiple files processing

Status
Not open for further replies.

moonring

Technical User
Feb 15, 2008
38
US
-

Hi all,


While working on a problem that involved three files, I got stuck on how to process them simultaneously with AWK. I know it can be done for two files, but I'm not sure about three: say, compare a field from the third file with another field in the second file, then use the result to do something in the first file. Here is a general example for two files:

Code:
awk 'NR==FNR{a[$0]=$1;next}
      {...action for file_2 ....}
       .......
      END { print ....}' file_1 file_2

So, can AWK process three files, or for that matter more than three, at once?

Code:
awk 'NR==FNR{a[$0]=$1;next}
      {...action for file_2 ....}
       {...action for file 3....}
      END { print ....}' file_1 file_2 file_3 ...


Or do the actions have to be done separately, two files at a time, with the remaining files handled in a later step? Does AWK support processing multiple files simultaneously?

Thanks for your time.
 
There's nothing wrong with the approach you were taking there where you load all of the data of each file into an array and then process them together in the END {} clause. Unless of course they are massive files and it would consume too much memory.

But you can do I/O to the files on a random/explicit basis using the getline statement if you wish, rather than allowing awk to process them sequentially.
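As a minimal sketch of that getline approach (the file names left.txt and right.txt and their contents are made up for illustration), awk can read one file through its normal main loop and pull the matching line from a second file on demand, so the two are consumed in lockstep:

```shell
# Hypothetical sample data: two files with the same number of lines.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/left.txt"
printf '1\n2\n3\n' > "$tmp/right.txt"

# The main loop reads left.txt; getline fetches the next line of right.txt.
result=$(awk -v other="$tmp/right.txt" '
    {
        # getline returns 1 on success, 0 at end of file, -1 on error
        if ((getline partner < other) > 0)
            print $0, partner
        else
            print $0, "(no partner)"
    }
    END { close(other) }
' "$tmp/left.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The form `getline var < file` only sets `var`, so `$0` and `NR` still describe the current line of the main input.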

Annihilannic.
 
Thanks a lot Annihilannic for your quick response,

In other words, AWK does support processing any number of input files at once? And it all depends on how you handle them within AWK (e.g. using getline or similar tools)?

 
Yes, it does.

The NR==FNR method you used is one way to process the first file specifically; however, on its own it doesn't let you apply different processing to the second, third, etc. files.

Another option is something like:

Code:
awk '
    FILENAME==ARGV[1] { file_1_data[$1]=$0; next }
    FILENAME==ARGV[2] { file_2_data[$1]=$0; next }
    FILENAME==ARGV[3] { file_3_data[$1]=$0; next }
    END {
        # do actual data processing here
    }
' file_1 file_2 file_3
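Here is a small worked instance of that FILENAME pattern, offered as a sketch only: the three files (names.txt, colors.txt, prices.txt) and their key/value layout are invented for the example. It differs from the skeleton above in one respect: the third file is processed as it streams instead of being stored for an END block, which keeps the output in the order of that file, since `for (key in array)` iteration order is unspecified in awk.

```shell
# Hypothetical sample data: each file is "key value", keyed by an id.
tmp=$(mktemp -d)
printf '1 apple\n2 pear\n' > "$tmp/names.txt"
printf '1 red\n2 green\n'  > "$tmp/colors.txt"
printf '2 0.50\n1 0.30\n'  > "$tmp/prices.txt"

result=$(awk '
    # Files 1 and 2 are loaded into lookup arrays keyed by the first field.
    FILENAME == ARGV[1] { name[$1]  = $2; next }
    FILENAME == ARGV[2] { color[$1] = $2; next }
    # File 3 is joined against both arrays, one line at a time.
    FILENAME == ARGV[3] { print $1, name[$1], color[$1], $2 }
' "$tmp/names.txt" "$tmp/colors.txt" "$tmp/prices.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that comparing FILENAME with ARGV[n] assumes each file appears only once on the command line.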

Or you could do it explicitly like:

Code:
awk '
    BEGIN {
        # each getline reads the next record of the named file into $0
        getline < ARGV[1]
        # process file_1 data
        ...
        # additional logic
        ...
        getline < ARGV[2]
        # process file_2 data

        getline < ARGV[3]
        # process file_3 data

        close(ARGV[1])
        close(ARGV[2])
        close(ARGV[3])
    }
' file_1 file_2 file_3

With getline it is a good idea to close files when you have finished with them, especially if you are processing a large number of files, otherwise you may use up all available file descriptors.
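A side effect of close() worth knowing is that the next getline on the same name reopens the file from the beginning. The sketch below relies on that to make two passes over one file; data.txt and its contents are assumptions made up for the example.

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/data.txt"

result=$(awk -v f="$tmp/data.txt" '
    BEGIN {
        # First pass: count the lines.
        while ((getline line < f) > 0)
            n++
        close(f)    # releases the descriptor; the next read starts at line 1

        # Second pass: reads from the top again because f was closed.
        while ((getline line < f) > 0) {
            i++
            print line " (" i " of " n ")"
        }
        close(f)
    }
')

printf '%s\n' "$result"
rm -rf "$tmp"
```

Without the first close(f), the second loop would find the file already at end of file and print nothing.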

Annihilannic.
 
Thank You Annihilannic,

I wish I could assign 5 stars to your detailed explanation. I got more than I asked for.
Now I know that it can be done, and how!


Regards
 