Awk, search for string and combine data through multiple directories.

Status
Not open for further replies.

GradUser2010

Technical User
Oct 21, 2010
Sorry for the newbie question.

I need to compile a large amount of data from individual text files throughout many directories.


An example data file is below. I want to search for the string "cc_sectors_1" and combine the data from every file that contains it into one new output file.


cc_sectors_1_pt2.nii.gz
34 408.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
10 120.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
11 132.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323



I am assuming this will be an awk command. Any suggestions?

I appreciate any help.
 
I see no reason for awk. The following works provided your file names don't contain spaces. It searches all the files under the present working directory:

Code:
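#!/bin/ksh

cd <to_your_directory>
file="$(pwd)/newoutput.txt"

# Start from an empty output file
rm -f $file
# List every file that contains the string, then append each one to the output
find . -type f -print|xargs grep -l cc_sectors_1| while read myfile
do
   cat $myfile >> $file
done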
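
As a hedged variation on the script above: the sketch below avoids the word splitting done by xargs, which is what makes file names with spaces fail, by letting find pass the names to grep itself. The temporary directory, the sample files, and the names workdir and out are assumptions made only so the example runs on its own. File names containing newlines would still break the read loop.

```shell
#!/bin/sh
# Hypothetical sample tree so the sketch is self-contained.
workdir=$(mktemp -d)
mkdir -p "$workdir/subj A" "$workdir/subj B"
printf 'cc_sectors_1_pt2.nii.gz\n34 408.000000\n' > "$workdir/subj A/cc data.txt"
printf 'unrelated\n' > "$workdir/subj B/other.txt"

out="$workdir/newoutput.txt"
rm -f "$out"

cd "$workdir" || exit 1
# -exec ... {} + hands the names straight to grep, so spaces survive.
# ! -name keeps the output file itself out of the search.
find . -type f ! -name newoutput.txt -exec grep -l cc_sectors_1 {} + |
while IFS= read -r myfile
do
    cat "$myfile" >> "$out"
done
```

Quoting "$myfile" and "$out" matters here: without the quotes the shell would split the names on spaces again.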
 
Great! Thank you olded! There are a few problems with this I did not foresee.

Now that I have this mass of data in one file, I cannot tell which piece of data corresponds to which filename. I need to order each of the pieces alphabetically based on their filename.

For instance, with the subject names AAsubjectname and BBsubjectname inserted in order:

AAsubjectname
cc_sectors_1_pt2.nii.gz
34 408.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
10 120.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
11 132.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323

BBsubjectname
cc_sectors_1_pt2.nii.gz
124 4308.000000 0.582149 0.165659 0.250764 0.783992
cc_sectors_2_pt2.nii.gz
123 2320.000000 0.515655 0.140343 0.329384 0.711448
cc_sectors_3_pt2.nii.gz
112 1232.000000 0.552913 0.178691 0.266971 0.775907
cc_sectors_4_pt2.nii.gz
15 180.000000 0.522139 0.174769 0.255384 0.728975
cc_sectors_5_pt2.nii.gz
41 492.000000 0.511493 0.173014 0.220017 0.720323



Thanks again, you are a lifesaver.
 
Change the last 3 lines:

Code:
do
   cat $myfile >> $file
done

into these four:

Code:
do
   echo $myfile
   sort $myfile
done > $file

Or, if you want each line of every file preceded with the corresponding filename:

Code:
do
   sort $myfile|sed "s!^!$myfile !"
done > $file


HTH,

p5wizard
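
A possible alternative for the per-line prefix, offered as a sketch rather than a correction: awk's built-in FILENAME variable can supply the prefix, which sidesteps the sed form's sensitivity to "!", "&" or backslashes in a file name. This sketch leaves out the sort step, so each file's lines keep their original order; that is an assumption about what is wanted. The temporary directory, the sample file, and the output name prefixed.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'cc_sectors_1_pt2.nii.gz\n34 408.000000\n' > "$workdir/aa_data"

out="$workdir/prefixed.txt"

cd "$workdir" || exit 1
# The redirect creates prefixed.txt up front, so exclude it from the search.
find . -type f ! -name prefixed.txt -exec grep -l cc_sectors_1 {} + |
while IFS= read -r myfile
do
    # FILENAME is the file awk is currently reading; print it before each line.
    awk '{ print FILENAME, $0 }' "$myfile"
done > "$out"
```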
 
Thank you all for your responses!

p5wizard, this is helpful, but the command is reordering each subject's numbered output rather than ordering the subjects themselves and leaving each subject's data in its original order.



Here is the output for the first two subjects. As you can see, it is putting the numbers in order (which I do not want) and failing to order the subjects (a-z).


./zash_CC_data
13 156.000000 0.505853 0.129748 0.262874 0.699645
13 156.000000 0.538307 0.232008 0.255260 0.797412
18 216.000000 0.561296 0.191070 0.237020 0.799556
22 264.000000 0.575743 0.195684 0.209284 0.834430
28 336.000000 0.567538 0.114532 0.309250 0.744849
cc_sectors_1_pt2.nii.gz
cc_sectors_2_pt2.nii.gz
cc_sectors_3_pt2.nii.gz
cc_sectors_4_pt2.nii.gz
cc_sectors_5_pt2.nii.gz
./erbe_CC_data_2
12 144.000000 0.616767 0.168490 0.284263 0.786699
14 168.000000 0.521187 0.166490 0.212461 0.694295
15 180.000000 0.585272 0.120794 0.378161 0.748938
24 288.000000 0.607249 0.133945 0.271707 0.769484
37 444.000000 0.558194 0.177269 0.200309 0.810863
cc_sectors_1_2_pt2.nii.gz
cc_sectors_2_2_pt2.nii.gz
cc_sectors_3_2_pt2.nii.gz
cc_sectors_4_2_pt2.nii.gz
cc_sectors_5_2_pt2.nii.gz



Here is my current code:


rm -f $file
find . -type f -print|xargs grep -l cc_sectors_1| while read myfile
do
echo $myfile
sort $myfile
done > $file


Any suggestions?
 
So, sort the list of found files and just cat each file, instead of finding the files and sorting their contents. The two changes are the added sort in the pipeline and cat in place of sort inside the loop:

Code:
find . -type f -print|xargs grep -l cc_sectors_1|sort|while read myfile
do
   echo $myfile
   cat $myfile
done > $file



HTH,

p5wizard
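
For readers who want to try the idea end to end, here is a minimal sketch under stated assumptions: it sorts the list of matching files, prints each name as a header, copies the contents unchanged, and adds a blank line between subjects as in the layout requested earlier in the thread. The temporary directory, the sample values, and the output name combined.txt are hypothetical. LC_ALL=C is also an assumption: it makes the ordering byte-wise and reproducible, which places uppercase names before lowercase ones.

```shell
#!/bin/sh
# Hypothetical sample files; the values are illustrative only.
workdir=$(mktemp -d)
printf 'cc_sectors_1_pt2.nii.gz\n34 408.000000\ncc_sectors_2_pt2.nii.gz\n10 120.000000\n' \
    > "$workdir/zash_CC_data"
printf 'cc_sectors_1_2_pt2.nii.gz\n37 444.000000\ncc_sectors_2_2_pt2.nii.gz\n12 144.000000\n' \
    > "$workdir/erbe_CC_data_2"

out="$workdir/combined.txt"

cd "$workdir" || exit 1
# Sort the file names, not the file contents.
find . -type f ! -name combined.txt -exec grep -l cc_sectors_1 {} + |
LC_ALL=C sort |
while IFS= read -r myfile
do
    printf '%s\n' "$myfile"   # subject header
    cat "$myfile"             # data in its original order
    printf '\n'               # blank line between subjects
done > "$out"
```

With the sample files above, ./erbe_CC_data_2 comes first and ./zash_CC_data second, and the lines inside each file are left exactly as they were.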
 
This worked. Thank you so much; altruism does exist.
 