mmmdoughnuts
Technical User
Hi!
I'm not very experienced at programming, but I wrote a script that combines two sets of data into a third file. It works, but it's ugly and probably not very efficient. Here is the problem: suppose I have two files, a.txt and b.txt. a.txt contains five columns (separated by spaces) and b.txt contains 13 columns (separated by commas). a.txt will always have fewer lines than b.txt, and every value in column 1 of a.txt can be found in column 1 of b.txt. For example:
a.txt:
1111 asdf asdf asdf asdf
3333 asdf asdf asdf afsf
7777 assa asdf asdf skfa
b.txt:
1111,2,3,4,5,6,7,8,9,10,11,12,1111
2222,2,3,4,5,6,7,8,9,10,11,12,1234
3333,2,3,4,5,6,7,8,9,10,11,12,3333
4444,2,3,4,5,6,7,8,9,10,11,12,4444
5555,2,3,4,5,6,7,8,9,10,11,12,5555
6666,2,3,4,5,6,7,8,9,10,11,12,6666
7777,2,3,4,5,6,7,8,9,10,11,12,5678
8888,2,3,4,5,6,7,8,9,10,11,12,8888
The important thing to note is that for 1111 and 3333, column 1 of a.txt matches both column 1 and column 13 of b.txt, but for 7777, column 13 of b.txt holds a different value (5678). The rest of the data in both files is irrelevant to the manipulation I need to do. I need to merge the two files so that the value from column 13 of b.txt replaces the value in column 1 of a.txt, like so:
1111 asdf asdf asdf asdf
3333 asdf asdf asdf afsf
5678 assa asdf asdf skfa
So, I wrote the following script:
awk '{printf "%s\n",$1}' a.txt > a.column1.txt
for X in `cat a.column1.txt`
do
    grep $X b.txt | /bin/awk -F, '{printf "%s\n",$13}' | sed 's/"//g' >> b.column13.txt
done
awk '{printf "%s %s %s %s\n",$2,$3,$4,$5}' a.txt > a.cols2to5.txt
sdiff b.column13.txt a.cols2to5.txt | sed 's/|//g' > ab.merged
As you can tell from the way I explained the problem and the way I wrote the script, I'm new to scripting. However, I don't have to be an expert to see that there's probably a much better way to do this. Any feedback would be greatly appreciated.
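For comparison, here is a rough sketch of how the same merge might be done in a single awk pass, with no temporary files. It rests on a few assumptions that the post does not confirm: column 1 is unique in b.txt, the fields in b.txt never contain embedded commas, and any double quotes around fields can simply be stripped (as the sed step in the script above suggests). The array name "map" is just an illustrative choice, and the sample files are recreated here only so the example can run on its own.

```shell
# Recreate the sample files from the post (for illustration only).
cat > a.txt <<'EOF'
1111 asdf asdf asdf asdf
3333 asdf asdf asdf afsf
7777 assa asdf asdf skfa
EOF
cat > b.txt <<'EOF'
1111,2,3,4,5,6,7,8,9,10,11,12,1111
2222,2,3,4,5,6,7,8,9,10,11,12,1234
3333,2,3,4,5,6,7,8,9,10,11,12,3333
4444,2,3,4,5,6,7,8,9,10,11,12,4444
5555,2,3,4,5,6,7,8,9,10,11,12,5555
6666,2,3,4,5,6,7,8,9,10,11,12,6666
7777,2,3,4,5,6,7,8,9,10,11,12,5678
8888,2,3,4,5,6,7,8,9,10,11,12,8888
EOF

# Read b.txt first and remember column 13 keyed by column 1,
# then read a.txt and swap column 1 for the remembered value.
awk '
    NR == FNR {                      # true only while reading the first file (b.txt)
        split($0, f, ",")            # b.txt is comma-separated
        gsub(/"/, "", f[1])          # assumption: quotes, if any, can be dropped
        gsub(/"/, "", f[13])
        map[f[1]] = f[13]            # column 1 -> column 13
        next
    }
    ($1 in map) { $1 = map[$1] }     # a.txt: replace column 1 when a match exists
    { print }                        # lines without a match pass through unchanged
' b.txt a.txt > ab.merged

cat ab.merged
```

Because the lookup is an exact match on column 1, a value that happens to appear in some other column of b.txt is not picked up by accident, which an unanchored grep could do. Assigning to $1 makes awk rebuild the line with single spaces between fields, so unusual spacing in a.txt would be normalized.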