join columns from many files with the same structure using awk 2

babcia01 · Jul 26, 2002

I have a number of files which are named using a consistent naming convention, mmdd.file (where mm is the month number, dd is the day number, and "file" is a constant).

What is the best way to approach the following:

I will explain it by using this example:

0723.file contains the following tab delimited columns:

row1_col1_in_0723.file row1_col2_in_0723.file
row2_col1_in_0723.file row2_col2_in_0723.file
:

0724.file contains the following tab delimited columns:
row1_col1_in_0724.file row1_col2_in_0724.file
row2_col1_in_0724.file row2_col2_in_0724.file
:

In my output file, I want to have the following:
row1_col2_in_0723.file row1_col2_in_0724.file
row2_col2_in_0723.file row2_col2_in_0724.file
:
Thank you very much

CaKiwi · Jul 26, 2002

Try this script

Code:

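{
  # Print column 2 of the current line of the file named on the command line
  printf "%s ", $2
  # Replace $0 with the next line of standard input (the second file)
  getline < "-"
  print $2
}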

Put it in a file, join.awk say, and enter

awk -f join.awk 0723.file < 0724.file > new.file

CaKiwi
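To illustrate how the script above pairs the two inputs, here is a minimal sketch run against made-up sample data (the file contents and the scratch directory are assumptions, not from the original post). It relies on the awk implementation treating "-" as standard input in getline, as gawk and mawk do; other awks may behave differently.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small tab-delimited sample files (hypothetical contents).
printf 'a1\tb1\na2\tb2\n' > 0723.file
printf 'c1\td1\nc2\td2\n' > 0724.file

# Main input is 0723.file; standard input is 0724.file.
# getline < "-" replaces $0 (and therefore $2) with the next stdin line.
pair=$(awk '{ printf "%s ", $2; getline < "-"; print $2 }' 0723.file < 0724.file)
printf '%s\n' "$pair"
```

Each output line holds column 2 from the first file followed by column 2 from the matching line of the second file.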

CaKiwi · Jul 26, 2002

Upon further review, I see that you want to do this for many files, not just two. The best way I can think of is to use awk (or cut) to create a set of files containing just the second column, and then use paste to combine them side by side.

CaKiwi
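The cut-then-paste idea might look roughly like this. It is only a sketch: the sample data, the scratch directory and the .col2 suffix are assumptions chosen for illustration.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small tab-delimited sample files (hypothetical contents).
printf 'a1\tb1\na2\tb2\n' > 0723.file
printf 'c1\td1\nc2\td2\n' > 0724.file

# Step 1: keep only the second tab-delimited column of each input file.
for f in *.file
do
  cut -f2 "$f" > "$f.col2"
done

# Step 2: glue the single-column files together side by side (tab-separated).
result=$(paste *.col2)
printf '%s\n' "$result"
```

Because the shell expands *.col2 in sorted order, the mmdd naming convention keeps the output columns in date order within a single year.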

babcia01 · Jul 26, 2002

Thank you very much for your quick response.
Before I asked my original question, I developed something like this, which is working:

for i in *.file
do
  sed 1d "$i" | awk '{print $2}' > "$i".new
done
paste *.new

I also needed to delete the first line of each file. I thought this was not very efficient or "nice" looking, and it creates unnecessary files.
I thought that maybe there was some other way with getline/nextfile.
I am glad that I reached the same solution you suggested, although I spent too much time before I found the useful "paste" command.
Thank you once again for your suggestions.
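For what it is worth, a getline-based approach without temporary files could be sketched as below. This is an untested-in-production illustration, not anything posted in the thread: the sample data, the scratch directory and the variable names are assumptions, and it keeps every input file open at once, so it may hit the open-file limit with a large number of files.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample files with a header line that should be dropped (hypothetical contents).
printf 'h1\th2\na1\tb1\na2\tb2\n' > 0723.file
printf 'h1\th2\nc1\td1\nc2\td2\n' > 0724.file

streamed=$(awk '
BEGIN {
  # With only a BEGIN rule, awk never reads the files itself;
  # ARGV[1] .. ARGV[ARGC-1] are used purely as file names for getline.
  for (line = 1; ; line++) {
    out = ""
    got = 0
    for (i = 1; i < ARGC; i++) {
      sep = (i > 1) ? "\t" : ""
      if ((getline rec < ARGV[i]) > 0) {
        got++
        split(rec, f, "\t")       # f[2] is the second column
        out = out sep f[2]
      } else {
        out = out sep             # this file is exhausted: empty cell
      }
    }
    if (got == 0) break           # every file is exhausted
    if (line > 1) print out       # skip the first line of each file
  }
}' *.file)
printf '%s\n' "$streamed"
```

Unlike the sed/paste loop, this reads one line from each file per output row, so nothing is written to disk and only one row is held in memory at a time.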

dchoulette · Jul 29, 2002

Just read all the files in awk and append field 2 of each line to an array element whose index is the line number within the current file (FNR, the record number in the current file). To get rid of the first line of each file, simply do not print the array element whose index is 1. The substr call is used to remove the " " character added at the start of each line by the concatenation (string[FNR] " " $2).

Code:

awk '
{
  string[FNR] = string[FNR] " " $2;
  if (maxFNR < FNR) maxFNR = FNR;
}
END {
  for (i = 2; i <= maxFNR; i++)
    print substr(string[i], 2);
}' *.file

babcia01 · Jul 29, 2002

Thanks a lot for the 100% correct and working solution.
What a great site!
