how do I append a string to the output file name? many thanks!

atjurhs · Aug 16, 2012

Hi guys,

I'm hoping that someone here has good Unix shell programming skills. I've written code, using AWK and some shell scripting, that parses/processes hundreds of input files one at a time, each of which can be as large as a gigabyte. It runs very fast for the amount of data it's running through. At the end of the processing the code produces an output file in the same directory as its input file, standard kind of stuff, but the problem is that these hundreds of output files are strewn across hundreds of directories.

Based on the filtering criteria, many of these files will be empty and others will have data, but I don't know which is which, and I can't really open up every directory to see whether its outputFile has content or not.

So what I want to do is to send the output file from each run to a common directory. Trouble is the output files all have the exact same name, so they would just overwrite each other. So what I'd like to do is to append a unique string to each of the outputFiles, then they can all be sent to a common directory, and I can easily see which ones have data, and which ones don't. The unique string that I would like to use is the immediate directory that the input file is in. So by example, here's what I mean:

Here's what three of the input directory structures and files might look like, but there are really hundreds of them:
/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt
/home/tabitha/my_data/W-B-7-011-3_2012_08_15/inputFile.txt
/home/tabitha/my_data/BA-Z-Y-011-A_081512/inputFile.txt

Here's what the current output looks like, note that they always have the same outputFile.txt name:
/home/tabitha/my_data/S-T-3-001-F_2012_08_16/outputFile.txt
/home/tabitha/my_data/W-B-7-011-3_2012_08_15/outputFile.txt
/home/tabitha/my_data/BA-Z-Y-011-A_081512/outputFile.txt

Here's what I need the output to look like:
/home/tabitha/my_data/Common_Directory/S-T-3-001-F_2012_08_16_outputFile.txt
/home/tabitha/my_data/Common_Directory/W-B-7-011-3_2012_08_15_outputFile.txt
/home/tabitha/my_data/Common_Directory/BA-Z-Y-011-A_081512_outputFile.txt

So the name of the input file's directory precedes the outputFile name; hopefully this is clear.

I tried different combinations of getline, cat, and find, but I think I keep getting stuck because I don't know how to capture that last directory name in a variable that I could then append in the print statement.

What do you think? Many thanks to whoever is able to help me.

Annihilannic · Aug 16, 2012

It's not straightforward to answer without seeing the code that it needs to fit into, but something along these lines could work:

Code:

inputfile=/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt
outputfile=$(echo "$inputfile" | sed 's#/inputFile#_outputFile#;s#my_data#my_data/Common_Directory#')
echo "inputfile is $inputfile"
echo "outputfile is $outputfile"

Output:

inputfile is /home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt
outputfile is /home/tabitha/my_data/Common_Directory/S-T-3-001-F_2012_08_16_outputFile.txt
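
Since the question asks how to get the last directory name into a variable, here is a minimal alternative sketch using POSIX parameter expansion instead of sed. The variable names dir, label and parent are made up for illustration, and the path is simply the first example from the question.

```shell
# Example path copied from the question; dir, label and parent are illustrative names.
inputfile=/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt

dir=${inputfile%/*}      # drop the file name:        /home/tabitha/my_data/S-T-3-001-F_2012_08_16
label=${dir##*/}         # keep the last directory:   S-T-3-001-F_2012_08_16
parent=${dir%/*}         # one level up:              /home/tabitha/my_data

outputfile="$parent/Common_Directory/${label}_outputFile.txt"
echo "outputfile is $outputfile"
```

This avoids hard-coding my_data in a pattern, at the cost of assuming every input file sits exactly one directory below the level where Common_Directory lives.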

Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts

atjurhs · Aug 17, 2012

Thanks Annihilannic!

Thanks so much, it works great!

Except I need the script to loop over a text file that contains several hundred paths to inputFile.txt files, like:

/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt
/home/tabitha/my_data/W-B-7-011-3_2012_08_15/inputFile.txt
/home/tabitha/my_data/BA-Z-Y-011-A_081512/inputFile.txt
.
.
.

that's only three, I need to use the script on several hundred. The several hundred directory paths are listed out in a text file that I create using:

find /home/tabitha/my_data/ -name inputFile.txt > lists_of_paths_and_names_to_the_inputFiles.txt

Do you know how to read in the listing text file, loop over each line, and assign each line to the inputfile variable of your script?

thanks so much for helping me!

Annihilannic · Aug 19, 2012

You said "I've written a code that parses/processes hundreds of input files one at a time", so I assumed you had already done that part.

Something like this?

Code:

while IFS= read -r inputfile
do
    outputfile=$(echo "$inputfile" | sed 's#/inputFile#_outputFile#;s#my_data#my_data/Common_Directory#')
    echo "inputfile is $inputfile"
    echo "outputfile is $outputfile"
done <lists_of_paths_and_names_to_the_inputFiles.txt
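
A variation on this loop, as a sketch only: it takes the directory name with basename and dirname rather than sed, and reads each line verbatim with IFS= read -r so that backslashes or leading blanks in a path are not altered. The temporary list file and the results variable are stand-ins invented so the example runs on its own.

```shell
# Stand-in for lists_of_paths_and_names_to_the_inputFiles.txt (two paths from the question).
list=$(mktemp)
printf '%s\n' \
    '/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt' \
    '/home/tabitha/my_data/BA-Z-Y-011-A_081512/inputFile.txt' > "$list"

results=""
while IFS= read -r inputfile
do
    # Name of the directory that directly contains the input file.
    label=$(basename "$(dirname "$inputfile")")
    results="$results${label}_outputFile.txt
"
done < "$list"

printf '%s' "$results"
rm -f "$list"
```

Because the loop reads from a redirected file rather than a pipe, it runs in the current shell, so variables set inside it are still available afterwards.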

atjurhs · Aug 20, 2012

Uh oh, something's not right.

when I echo $inputfile I get back what I expect:

/home/tabitha/my_data/S-T-3-001-F_2012_08_16/inputFile.txt

/home/tabitha/my_data/W-B-7-011-3_2012_08_15/inputFile.txt

/home/tabitha/my_data/BA-Z-Y-011-A_081512/inputFile.txt

when I echo $outputfile I get back a list of the files with their new names:

/home/tabitha/my_data/Common_Directory/S-T-3-001-F_2012_08_16_outputFile.txt

/home/tabitha/my_data/Common_Directory/W-B-7-011-3_2012_08_15_outputFile.txt

/home/tabitha/my_data/Common_Directory/BA-Z-Y-011-A_081512_outputFile.txt

But when I go to the Common_Directory, there are no files in there. Maybe I didn't explain something right; sorry, I'm really new at this. I need the output files with the new names to actually be in the Common_Directory. I tried adding a mv command in the do loop but couldn't get that to work.

Annihilannic · Aug 20, 2012

It would help if you posted your actual code so we can see where this stuff needs to fit. Presumably you just need to send the output of your processing to the output file, using your_processing_code_here >$outputfile, but as it is I can only guess.
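
To illustrate the point, here is a self-contained sketch in a temporary directory. The echo statements only print a name; a file appears only once some command's output is redirected to that name, and the target directory has to exist first. The awk filter and the two-line input are placeholders, not the real processing.

```shell
# Throwaway tree standing in for my_data; workdir is an invented name.
workdir=$(mktemp -d)
mkdir -p "$workdir/my_data/S-T-3-001-F_2012_08_16" "$workdir/my_data/Common_Directory"
printf 'a 1\nb 2\n' > "$workdir/my_data/S-T-3-001-F_2012_08_16/inputFile.txt"

inputfile="$workdir/my_data/S-T-3-001-F_2012_08_16/inputFile.txt"
outputfile="$workdir/my_data/Common_Directory/S-T-3-001-F_2012_08_16_outputFile.txt"

# Placeholder processing: keep rows whose second field is 2, writing them to the new name.
awk '$2 == 2' "$inputfile" > "$outputfile"

cat "$outputfile"
# rm -rf "$workdir"   # clean up when finished
```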

atjurhs · Aug 21, 2012

so there are a couple of steps that I do, first I run a find command:

find /svr_ardvark/home/tabitha/my_data/08042012/ -name MRAC.txt > list_of_paths_to_MRAC_files.txt

and this gives me:

/svr_ardvark/home/tabitha/my_data/08042012/S-T-3-001-F_2012/MRAC.txt
/svr_ardvark/home/tabitha/my_data/08042012/W-B-7-011-3_2012/MRAC.txt
/svr_ardvark/home/tabitha/my_data/08042012/BA-Z-Y-011-A_081512/MRAC.txt

then I need a tool (the one you've been helping me with and I greatly appreciate) that loops over the results of the find command and creates a text file that is a list that will be used for batch processing. Each entry in the list has four parts:

1) calling an awk script
2) the current position of the MRAC files
3) renaming the MRAC files with the directory name preceding the MRAC.txt file
4) redirection of the processed MRAC files into a common directory.

this output file is a batch_processing_list.txt file and should look like this:

awk -f parsing_tool.awk /svr_ardvark/home/tabitha/my_data/08042012/S-T-3-001-F_2012/MRAC.txt > /svr_ardvark/home/tabitha/my_data/08042012/common_directory/S-T-3-001-F_2012_MRAC.txt

awk -f parsing_tool.awk /svr_ardvark/home/tabitha/my_data/08042012/W-B-7-011-3_2012/MRAC.txt > /svr_ardvark/home/tabitha/my_data/08042012/common_directory/W-B-7-011-3_2012_MRAC.txt

awk -f parsing_tool.awk /svr_ardvark/home/tabitha/my_data/08042012/BA-Z-Y-011-A_2012/MRAC.txt > /svr_ardvark/home/tabitha/my_data/08042012/common_directory/BA-Z-Y-011-A_2012_MRAC.txt

the parsing_tool.awk that’s being called by batch_processing_list.txt is:

BEGIN {
FS=" "
}
{
if($11==8899)
printf("%s %d %d %s %d %d\n", $1, $4, $5, $11, $17,$21);
}
END {}

It all seems to work if I create the batch_processing_list.bash by hand with a text editor and Excel, which is OK if I'm just trying to process under a hundred files, but to do this on several hundred files I will need more automated scripts.
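
For reference, a batch list in the format described above could be generated along these lines. This is a sketch under assumptions: the temporary tree stands in for the real my_data/08042012 directory, root and batch are invented variable names, and paths are assumed to contain no spaces because the generated lines are unquoted.

```shell
# Stand-in tree so the sketch runs anywhere.
root=$(mktemp -d)
mkdir -p "$root/S-T-3-001-F_2012" "$root/W-B-7-011-3_2012" "$root/common_directory"
: > "$root/S-T-3-001-F_2012/MRAC.txt"
: > "$root/W-B-7-011-3_2012/MRAC.txt"

batch="$root/batch_processing_list.txt"
find "$root" -name MRAC.txt | sort | while IFS= read -r inputfile
do
    # Directory that directly contains this MRAC.txt, used as the unique prefix.
    label=$(basename "$(dirname "$inputfile")")
    printf 'awk -f parsing_tool.awk %s > %s/common_directory/%s_MRAC.txt\n' \
        "$inputfile" "$root" "$label"
done > "$batch"

cat "$batch"
# rm -rf "$root"   # clean up when finished
```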

Thanks again, so much for all your help!!!

Annihilannic · Aug 21, 2012

You can pipe the results of the find command directly into the loop that does the processing (unless you specifically need the intermediate file for some other purpose) as follows.

Code:

find /svr_ardvark/home/tabitha/my_data/08042012/ -name MRAC.txt | while IFS= read -r inputfile
do
    # .../<dir>/MRAC.txt becomes .../common_directory/<dir>_MRAC.txt (common_directory must already exist)
    outputfile=$(echo "$inputfile" | sed 's#/\([^/]*\)/MRAC\.txt$#/common_directory/\1_MRAC.txt#')
    awk -F " " '$11==8899 { printf("%s %d %d %s %d %d\n", $1, $4, $5, $11, $17, $21); }' "$inputfile" >"$outputfile"
done

I have also abbreviated the awk script somewhat and included it on the command line rather than in a separate parsing_tool.awk script, since it is quite short. I specified the separator on the command line (probably unnecessary, because the default separator in awk is whitespace), removed the BEGIN and END clauses (since they are both now empty), and dropped the if statement, because every awk rule can carry its own condition (pattern) in front of the action block.
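
The equivalence of the two awk forms can be checked on made-up data. The two sample records below are invented, with 21 whitespace-separated fields each, and only the first has 8899 in field 11.

```shell
# Invented sample records; not real MRAC.txt content.
sample='rec1 2 3 4 5 6 7 8 9 10 8899 12 13 14 15 16 17 18 19 20 21
rec2 2 3 4 5 6 7 8 9 10 1234 12 13 14 15 16 17 18 19 20 21'

# Original style: an explicit if inside an unconditional action block.
long=$(printf '%s\n' "$sample" |
    awk '{ if ($11 == 8899) printf("%s %d %d %s %d %d\n", $1, $4, $5, $11, $17, $21) }')

# Abbreviated style: the condition is the rule's pattern.
short=$(printf '%s\n' "$sample" |
    awk '$11 == 8899 { printf("%s %d %d %s %d %d\n", $1, $4, $5, $11, $17, $21) }')

echo "$long"
echo "$short"
```

Both commands print the same single line for rec1 and nothing for rec2.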

atjurhs · Aug 22, 2012

It works perfectly! Thanks sooo much for all your help!

Next, I'm going to try to use data from a different file (called L102.txt), which contains two columns of data, as filtering criteria for the MRAC.txt file. In the first column of L102.txt I know which values I want; in the second column (correlated with the first) I don't know the values in advance, and it's these second-column values that I need to use for filtering the MRAC.txt file.

I want to give this a try on my own, but I'll probably need a little help.....

Tabby
