Hi guys,
I'm hoping someone here has good Unix shell programming skills. Using AWK and some shell scripting, I've written code that parses and processes hundreds of input files one at a time, some of them as large as a gigabyte. It runs very fast for the amount of data it's working through. At the end of processing, the code writes an output file into the same directory as its input file, standard kind of stuff. The problem is that these hundreds of output files end up strewn across hundreds of directories.
Depending on the filtering criteria, many of these files will be empty while others will have data, but I don't know which is which, and I can't realistically open every directory to check whether its outputFile has content.
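For the narrower question of which output files have content, one option is to let the standard `find` utility do the checking, without opening any directory by hand. This is a minimal sketch, assuming every output file is named exactly `outputFile.txt`; the function name `list_nonempty` is hypothetical.

```shell
#!/bin/sh
# list_nonempty DATA_DIR  (hypothetical helper name)
# Prints the path of every outputFile.txt under DATA_DIR that is not empty.
list_nonempty() {
    # -type f: regular files only
    # ! -size 0: skip zero-length files (POSIX; GNU/BSD find also offer -empty)
    find "$1" -type f -name outputFile.txt ! -size 0
}
```

Usage: `list_nonempty /home/tabitha/my_data` prints one path per non-empty file.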
So what I want to do is send the output file from each run to a common directory. The trouble is that the output files all have exactly the same name, so they would just overwrite each other. What I'd like to do is prepend a unique string to each outputFile name; then they can all go into one common directory, and I can easily see which ones have data and which don't. The unique string I'd like to use is the name of the immediate directory the input file is in. For example, here's what I mean:
Here's what three of the input directories and files might look like (there are really hundreds of them):
/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt
/home/tabitha/my_data/W-B-7-011-3_2012_08_15/inputFile.txt
/home/tabitha/my_data/BA-Z-Y-011-A_081512/inputFile.txt
Here's what the current output looks like, note that they always have the same outputFile.txt name:
/home/tabitha/my_data/S-T-3-001-F_2012_08_16/outputFile.txt
/home/tabitha/my_data/W-B-7-011-3_2012_08_15/outputFile.txt
/home/tabitha/my_data/BA-Z-Y-011-A_081512/outputFile.txt
Here's what I need the output to look like:
/home/tabitha/my_data/Common_Directory/S-T-3-001-F_2012_08_16_outputFile.txt
/home/tabitha/my_data/Common_Directory/W-B-7-011-3_2012_08_15_outputFile.txt
/home/tabitha/my_data/Common_Directory/BA-Z-Y-011-A_081512_outputFile.txt
So the name of the directory each file came from precedes the outputFile name; hopefully this is clear.
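One way to get that layout after the runs have finished is a plain shell loop that takes the parent directory name from each path with parameter expansion and copies the file under the prefixed name. This is a minimal sketch, assuming every output file sits exactly one level below the data directory and is named `outputFile.txt`; the function name `collect_outputs` is hypothetical.

```shell
#!/bin/sh
# collect_outputs DATA_DIR COMMON_DIR  (hypothetical helper name)
# Copies each DATA_DIR/<subdir>/outputFile.txt to
# COMMON_DIR/<subdir>_outputFile.txt, so identical names no longer collide.
collect_outputs() {
    data_dir=$1
    common_dir=$2
    mkdir -p "$common_dir" || return 1
    for f in "$data_dir"/*/outputFile.txt; do
        [ -e "$f" ] || continue   # no match: the glob is left as literal text
        dir=${f%/*}               # drop "/outputFile.txt" from the end
        name=${dir##*/}           # keep only the last directory name
        cp "$f" "$common_dir/${name}_outputFile.txt"
    done
}
```

Usage: `collect_outputs /home/tabitha/my_data /home/tabitha/my_data/Common_Directory`, after which `ls -l` in the common directory shows the size of every copy side by side, empty ones included.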
I've tried different combinations of getline, cat, and find, but I keep getting stuck because I don't know how to capture that last directory name in a variable that I could then prepend in the print statement.
What do you think? Many thanks to whoever is able to help.
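To get the directory name into a variable at processing time, the shell can compute it with `basename` and `dirname` and hand it to awk with `-v`, where it can be used to build the filename in a `print > file` statement. This is a minimal sketch: the function name `run_one` is hypothetical, and the filter `NF > 0` (keep non-blank lines) is only a placeholder for the real AWK processing.

```shell
#!/bin/sh
# run_one INPUT_FILE COMMON_DIR  (hypothetical helper name)
run_one() {
    input=$1
    common_dir=$2
    # Last directory component of the input path, e.g. "S-T-3-001-F_2012_08_16"
    name=$(basename "$(dirname "$input")")
    awk -v outdir="$common_dir" -v tag="$name" '
        BEGIN {
            out = outdir "/" tag "_outputFile.txt"
            printf "" > out     # create/truncate now, so empty results still appear
        }
        NF > 0 { print > out }  # placeholder filter: substitute the real processing
    ' "$input"
}
```

Creating the file in `BEGIN` means an input with no matching lines still leaves a zero-length file in the common directory, which keeps the empty-versus-non-empty comparison easy.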