Awk, search for string and combine data through multiple directories.

GradUser2010 · Oct 21, 2010

Sorry for the newbie question.

I need to compile a large amount of data from individual text files throughout many directories.

An example data file is below. I want to search for the string "cc_sectors_1" and combine the data from every file that contains it into one new output file.

cc_sectors_1_pt2.nii.gz
34 408.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
10 120.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
11 132.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323

I am assuming this will be an awk command. Any suggestions?

I appreciate any help.

olded · Oct 21, 2010

I see no reason for awk. This works provided your file names don't contain spaces. It searches all the files under the present working directory:

Code:

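#!/bin/ksh

cd <to_your_directory>
file="$(pwd)/newoutput.txt"

# Start from a clean output file
rm -f "$file"
# List every file that contains the string, then append each one to the output
find . -type f -print|xargs grep -l cc_sectors_1| while read myfile
do
   cat "$myfile" >> "$file"
done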
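
The caveat about spaces in file names can be avoided. Below is a minimal sketch, assuming a `find` that supports the POSIX `-exec ... {} +` form. The temporary directory and the sample files are invented purely for the demonstration, and file names containing newlines would still break the `read` loop.

```shell
#!/bin/sh
# Demo data (invented): one matching file in a directory whose name has a space.
demo=$(mktemp -d)
mkdir -p "$demo/subj a" "$demo/subj b"
printf '%s\n' 'cc_sectors_1_pt2.nii.gz' '34 408.000000' > "$demo/subj a/data.txt"
printf '%s\n' 'unrelated line' > "$demo/subj b/other.txt"

cd "$demo" || exit 1
out="$demo/newoutput.txt"
rm -f "$out"

# -exec ... {} + passes the file names straight to grep, so they are never
# split on spaces; ! -name keeps the output file itself out of the search.
find . -type f ! -name newoutput.txt -exec grep -l 'cc_sectors_1' {} + |
while IFS= read -r myfile
do
    cat "$myfile" >> "$out"
done
```

Excluding `newoutput.txt` matters because the output file lives inside the tree being searched and, once written, also contains the search string.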

GradUser2010 · Oct 22, 2010

Great! Thank you olded! There are a few problems with this I did not foresee.

Now that I have this mass of data in one file, I cannot tell which piece of data corresponds to which file name. I need the pieces ordered alphabetically by file name, with each file name shown above its data.

For instance, inserting AAsubjectname and BBsubjectname as headers, in order:

AAsubjectname
cc_sectors_1_pt2.nii.gz
34 408.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
10 120.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
11 132.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323

BBsubjectname
cc_sectors_1_pt2.nii.gz
124 4308.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
123 2320.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
112 1232.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323

Thanks again, you are a lifesaver.

p5wizard · Oct 22, 2010

Change the last 3 lines:

[tt]do
cat $myfile >> $file
done[/tt]

into these four:

[tt]do
echo $myfile
sort $myfile
done > $file[/tt]

or if you want each line of every file preceded with the corresponding filename:

[tt]do
sort $myfile|sed "s!^!$myfile !"
done > $file[/tt]

HTH,

p5wizard
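
Since the thread title asks about awk: the filename-prefix idea can also be sketched with awk's built-in `FILENAME` variable, which leaves the lines of each file in their original order. This is an alternative, not something posted in the thread, and the directory names and file contents below are invented for the demonstration.

```shell
#!/bin/sh
# Demo data (invented): two subject directories, one stats file each.
demo=$(mktemp -d)
mkdir -p "$demo/AAsubject" "$demo/BBsubject"
printf '%s\n' 'cc_sectors_1_pt2.nii.gz' '34 408.000000' > "$demo/AAsubject/stats.txt"
printf '%s\n' 'cc_sectors_1_pt2.nii.gz' '124 4308.000000' > "$demo/BBsubject/stats.txt"

cd "$demo" || exit 1
out="$demo/newoutput.txt"

# For every matching file, print each line prefixed with the file's path.
# FILENAME is set by awk to the name of the file currently being read.
find . -type f ! -name newoutput.txt -exec grep -l 'cc_sectors_1' {} + |
while IFS= read -r myfile
do
    awk '{ print FILENAME, $0 }' "$myfile"
done > "$out"
```

Each output line then looks like `./AAsubject/stats.txt 34 408.000000`, so the whole file can later be filtered or grouped by its first field.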

GradUser2010 · Oct 23, 2010

Thank you all for your responses!

p5wizard, this is helpful, but the command reorders each subject's numbered output instead of ordering the subjects and leaving each subject's data in its original order.

Here is the output for the first two subjects. As you can see, it sorts the lines within each file (which I do not want) and fails to order the subjects (a-z).

./zash_CC_data
13 156.000000 0.505853 0.129748 0.262874 0.699645
13 156.000000 0.538307 0.232008 0.255260 0.797412
18 216.000000 0.561296 0.191070 0.237020 0.799556
22 264.000000 0.575743 0.195684 0.209284 0.834430
28 336.000000 0.567538 0.114532 0.309250 0.744849
cc_sectors_1_pt2.nii.gz
cc_sectors_2_pt2.nii.gz
cc_sectors_3_pt2.nii.gz
cc_sectors_4_pt2.nii.gz
cc_sectors_5_pt2.nii.gz
./erbe_CC_data_2
12 144.000000 0.616767 0.168490 0.284263 0.786699
14 168.000000 0.521187 0.166490 0.212461 0.694295
15 180.000000 0.585272 0.120794 0.378161 0.748938
24 288.000000 0.607249 0.133945 0.271707 0.769484
37 444.000000 0.558194 0.177269 0.200309 0.810863
cc_sectors_1_2_pt2.nii.gz
cc_sectors_2_2_pt2.nii.gz
cc_sectors_3_2_pt2.nii.gz
cc_sectors_4_2_pt2.nii.gz
cc_sectors_5_2_pt2.nii.gz

Here is my current code:

rm -f $file
find . -type f -print|xargs grep -l cc_sectors_1| while read myfile
do
echo $myfile
sort $myfile
done > $file

Any suggestions?

p5wizard · Oct 26, 2010

So, sort the list of found files and just cat each file, instead of sorting the contents of each file.

[tt]find . -type f -print|xargs grep -l cc_sectors_1|[red]sort|[/red]while read myfile
do
echo $myfile
[red]cat[/red] $myfile
done > $file[/tt]

HTH,

p5wizard
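
Put together, the approach above can be tried end to end. The following is a sketch only: the two file names are borrowed from the thread, but their contents are invented for the demonstration, and the `-exec ... {} +` form and the `! -name` exclusion are additions that were not part of the posted solution.

```shell
#!/bin/sh
# Demo data: file names from the thread, contents invented for the demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' 'cc_sectors_1_pt2.nii.gz' '28 336.000000' \
              'cc_sectors_2_pt2.nii.gz' '13 156.000000' > zash_CC_data
printf '%s\n' 'cc_sectors_1_2_pt2.nii.gz' '37 444.000000' \
              'cc_sectors_2_2_pt2.nii.gz' '12 144.000000' > erbe_CC_data_2
out="$demo/newoutput.txt"

# Sort the list of matching file names, not the contents of the files:
# each file is written out unchanged under a header line with its name.
find . -type f ! -name newoutput.txt -exec grep -l 'cc_sectors_1' {} + |
sort |
while IFS= read -r myfile
do
    echo "$myfile"
    cat "$myfile"
done > "$out"

cat "$out"
```

With this demo data, `./erbe_CC_data_2` and its four lines come first, followed by `./zash_CC_data` and its four lines, each block in its original line order.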

GradUser2010 · Oct 26, 2010

This worked. Thank you so much; altruism does exist.
