Merge two files (newbie)

Status
Not open for further replies.

BrianWilson

Programmer
Feb 23, 2003
Hi all,

Sorry for what may be a newbie question, but I've read what I can on awk and nawk and can't fathom this out. Any help would be greatly appreciated.

I have two files. "columns.cfg" is a config file and "data.txt" is the data.

columns.cfg (one single line)
A,B,C,D,E,...etc.

data.txt
1,2,3,4,5
11,12,13,14,15...etc

What I need to do is have a third output file of the following format:
A,NR,B,1,C,2,D,3,E,4
A,NR,B,11,C,12,D,13,E,14....etc.

In essence, the first config column is paired with the row number of the data file, and every other column in columns.cfg is paired with the "position less one" element in data.txt (config column 2 with data field 1, and so on).

Hope this is clear. Please can you assist?
 
Something like this ?
awk -F, '
BEGIN{getline<"columns.cfg";nf=NF;for(i=0;i<nf;++i)c[i]=$(i+1)}
{printf "%s,%d",c[0],NR
for(i=1;i<nf;++i)printf ",%s,%s",c[i],$i
printf "\n"
}' data.txt

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
PHV,

Wow! Thanks for the quick reply. It's much appreciated. I shall get my head around what is going on so I can understand it, and then see if it's OK in production.

Hold tight....there could be a star on the way!...
 
PHV,
I've had a try with this and get varying messages. Hope you can still assist with this issue.

I'm using variable substitution for the filename so have:

//// START CODE SNIPPET

COLUMNS=$Root/conf/columns.cfg

if [ -s $COLUMNS ]
then
awk -F, '
BEGIN {getline<$COLUMNS;nf=NF;for(i=0;i<nf;++i) c[i]=$i}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
}' $DATADIR/data.txt > $DATADIR/data_out.txt
else
echo `date` : $COLUMNS does not exist | tee -a $LOG
fi

//// END CODE SNIPPET

I get the following output on execution:

+ awk -F,
BEGIN {getline<$COLUMNS;nf=NF;for(i=0;i<nf;++i) c[i]=$i}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
} /opt/data/data.txt
+ 1> /opt/data/data_out.txt
awk: syntax error near line 2
awk: illegal statement near line 2

So, tried using nawk (replacing the printf's for prints) but then get:

>>nawk: illegal field $()
>> source line number 2

I've looked over your code and can't see anything wrong? Any suggestions??
 
Try this...

awk '
BEGIN {
OFS = FS = ",";
getline < ENVIRON["COLUMNS"];
split($0, c);
}
{
for(i=1; i<=NF; i++) {
$i = (i==1 ? c[1] OFS NR OFS : "") c[i+1] OFS $i;
}
print $0;
}
' data.txt
 
P.S. You might want to use a different name for the environment variable since $COLUMNS is generally used by the shell to set the width of the terminal window.
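A point that goes with the ENVIRON suggestion above: awk only sees variables that are actually in its environment, so the shell variable has to be exported, not just assigned. The sketch below is a minimal illustration, assuming a POSIX awk (nawk on Solaris); CFG_LOCAL and CFG_EXPORTED are made-up names, not anything from the original script.

```shell
#!/bin/sh
# CFG_LOCAL and CFG_EXPORTED are hypothetical names used only for this demo.
unset CFG_LOCAL CFG_EXPORTED
CFG_LOCAL=/tmp/columns.cfg          # plain shell variable, not exported
CFG_EXPORTED=/tmp/columns.cfg       # same value, but exported below
export CFG_EXPORTED

# awk only sees what is in its environment, so the first lookup is empty.
seen=$(awk 'BEGIN { printf "[%s][%s]", ENVIRON["CFG_LOCAL"], ENVIRON["CFG_EXPORTED"] }')
printf '%s\n' "$seen"
```

The first pair of brackets comes out empty and the second holds the path, which is why a getline from an unexported name has nothing to open.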
 
Ygor,

I tried your code suggestion (and changed the var name to CFG) but now I get:

>>>>
+ awk
BEGIN {
OFS=FS=",";
getline < ENVIRON["CFG"];
split($0,c);
}
{
for(i=1;i<NF;i++){
$i=(i==1 ? c[1] OFS NR OFS : "") c[i+1] OFS $i;
}
print $0;
} /opt/data/data_20040625.txt
+ 1> /opt/data/data_export_20040625.txt

awk: syntax error near line 4
awk: illegal statement near line 4
awk: syntax error near line 9
awk: illegal statement near line 9

<<<<<<<<<<

I am using the Korn shell? Would that cause any issues that you would be aware of?

Many, many thanks for your time...
Brian
 
I agree with Ygor about COLUMNS.
Have you tried this ?
Columns=$Root/conf/columns.cfg
if [ -s $Columns ]; then
awk -F, "
BEGIN {getline<$Columns"';nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
}' $DATADIR/data.txt > $DATADIR/data_out.txt
else
echo `date` : $Columns does not exist | tee -a $LOG
fi

Hope This Helps, PH.
 
PHV,

Done as you have suggested regarding variable name.

Output is now
>>>>>>>
+ awk -F,
BEGIN {getline</opt/test/conf/columns.cfg;nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
} /opt/test/data/data_out.txt
+ 1> /opt/test/data/data_out.txt
awk: syntax error near line 2
awk: illegal statement near line 2

<<<<<<

Line 2 I presume is the BEGIN... line?? In which case is there a nice easy debugging routine I could use to test getline (i.e. input and output to file) - all other code on that line looks fine to me.

Thanks again chaps for your valuable time.
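On the question above about a simple way to test getline by itself: one option is to run only the read inside a BEGIN block and print the return code, the field count and the line. This is a rough sketch, assuming nawk or another POSIX awk (it relies on -v); check_getline and the scratch file are hypothetical names for illustration.

```shell
#!/bin/sh
# Hypothetical scratch copy of the config file.
cfg=$(mktemp)
printf 'HEADER,first,second,third,fourth\n' > "$cfg"

check_getline() {
    # Argument: file to read. Prints the first line and its field count,
    # or a message when the file cannot be opened.
    awk -F, -v f="$1" 'BEGIN {
        rc = (getline < f)
        if (rc > 0) printf "rc=%d NF=%d line=%s\n", rc, NF, $0
        else        printf "rc=%d cannot read %s\n", rc, f
    }'
}

ok=$(check_getline "$cfg")
bad=$(check_getline "$cfg.missing")
printf '%s\n%s\n' "$ok" "$bad"
rm -f "$cfg"
```

A return code of 1 means a line was read, 0 means end of file and -1 means the file could not be opened, so a syntax error here points at the awk version or the quoting and not at the file.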
 
My bad.
Columns=$Root/conf/columns.cfg
if [ -s $Columns ]; then
awk -F, '
BEGIN {getline<"'$Columns'";nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
}' $DATADIR/data.txt > $DATADIR/data_out.txt
else
echo `date` : $Columns does not exist | tee -a $LOG
fi

Hope This Helps, PH.
 
PHV,

Sorry do NOT want this to be a burden to you! Your help is very much appreciated.

Sadly, I still get
syntax error at line 2 and
illegal statement near line 2.

I've tried the following combos:

BEGIN {getline<"`$Columns`";nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}

BEGIN {getline<"'$Columns'";nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}

In fact, the output now shows that this does not resolve the environment variable, but instead displays "$Columns".

I have tried hard coding the cfg filename to the getline statement, but that returns the same error as well.
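The quoting variants tried in this thread differ only in what text the shell hands to awk, and that can be inspected without running awk at all. The sketch below builds three program fragments as shell strings and prints them. The path is the one shown in the trace output; p1, p2 and p3 are illustrative names.

```shell
#!/bin/sh
Columns=/opt/test/conf/columns.cfg   # path as shown in the trace output

# 1. Inside single quotes the shell leaves $Columns untouched.
p1='getline < $Columns'
# 2. Close the single quotes, let the shell expand, reopen them:
#    awk receives the path as a quoted string literal.
p2='getline < "'$Columns'"'
# 3. Same expansion but without the awk string quotes:
#    awk receives a bare path.
p3='getline < '$Columns

printf '%s\n' "$p1" "$p2" "$p3"
```

In the first case awk sees its own, empty variable named Columns, so `$Columns` means `$0` and not a file name. In the third case awk tries to parse the bare path as an expression, which appears to be what produces the syntax error in the trace. Only the second form gives awk a string it can open.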
 
how about this:

Code:
Columns=$Root/conf/columns.cfg
if [ -s $Columns ]; then
  awk -F, -v Columns="${Columns}" '
    BEGIN {getline< Columns ; .....

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
vgersh99

Hi! Thanks for your input on this matter. I have tried your suggestion, however, instead of an illegal statement message, it tells me it's bailing out. Here's the output:

>>>>>>>>

+ [ -s /opt/test/conf/glexport.cfg ]
+ awk -F, -v Cols=/opt/test/conf/glexport.cfg
BEGIN {getline<Cols;nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
{ printf "%s,%s",c[0],$fDATE-NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
} /opt/test/data/data.txt
+ 1> /opt/test/data/data_out.txt
awk: syntax error near line 1
awk: bailing out near line 1

<<<<<<

Again I have tried hardcoding the path, but no joy...
 
if on Solaris, use nawk instead of awk.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
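The nawk advice above can be folded into the script so the same file runs on Solaris and elsewhere. This is a minimal sketch, assuming `command -v` is available in the shell in use; AWK and probe are illustrative names. On Solaris, /usr/xpg4/bin/awk is another POSIX-style awk that could be tried.

```shell
#!/bin/sh
# Prefer nawk where it exists (as on Solaris), otherwise fall back to awk.
if command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Probe one feature the thread relies on: -v assignments before BEGIN runs.
probe=$("$AWK" -v v=ok 'BEGIN { print v }')
printf '%s: %s\n' "$AWK" "$probe"
```

If the probe does not print "ok", the chosen awk is probably the old one that produced the "bailing out" message, and the rest of the script is unlikely to work with it.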
 
Columns=$Root/conf/columns.cfg
if [ -s $Columns ]; then
awk -F, ' # the program text opens with a single quote, not a double quote
BEGIN {getline<"'$Columns'";nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
{ printf "%s,%d",c[0],NR
for(i=1;i<nf;++i) printf ",%s,%s",c[i],$i
printf "\n"
}' $DATADIR/data.txt > $DATADIR/data_out.txt
else
echo `date` : $Columns does not exist | tee -a $LOG
fi

Hope This Helps, PH.
 
vgersh & PHV,

Yep, Solaris it is (we do have some awk scripts on the box, but strangely nothing as seemingly complicated as this!)

I am still getting minor little errors, but I should be able to take it from here.

Thanks once again for your time and assistance. A star to you both.

Regards,
Brian
 
For the benefit of members, please don't forget to post your working, error-free solution (and thanks for the pinky).
 
PHV,

Sure thing...not working exactly as I require it yet, but the code here should be enough for other codies to work with.

Code:
#!/bin/ksh
# NOTE: THIS CODE IS CONDENSED & SENSITIVE INFORMATION REMOVED/RENAMED.
#------------initialise variables----------
set -x
fDATE=`date +%Y%m%d`
Root=/opt/test
DATADIR=$Root/data
LOGDIR=$Root/log
LOG=$LOGDIR/export.$fDATE.out
Columns=$Root/conf/columns.cfg

echo `date` : format begins | tee $LOG

if [ -s $Columns ];
then
        nawk -F, -v Cols="${Columns}" '
        BEGIN {getline<Cols;nf=NF;for(i=0;i<nf;++i) c[i]=$(i+1)}
                { print c[0]","NR
                for(i=1;i<nf;++i) print c[i]","$i
                print "\n"
        }' $DATADIR/date_$fDATE.txt > $DATADIR/data_out_$fDATE.txt
else
        echo  --------------------------------| tee -a $LOG
        echo `date` : $Columns does not exist | tee -a $LOG
        echo  --------------------------------| tee -a $LOG
fi

echo `date` : format ends | tee -a $LOG

File Contents:
columns.cfg
HEADER,first,second,third,fourth

data.txt
Cat,Dog,Cow,Sheep
Apple,Banana,Peach,Grape
Blue,Yellow,Red,Green

OUTPUT
HEADER,1
first,Cat
second,Dog
third,Cow
fourth,Sheep

HEADER,2
first,Apple
second,Banana
third,Peach
fourth,Grape

HEADER,3
first,Blue
second,Yellow
third,Red
fourth,Green
 
All,

An update to this issue I had. I have amalgamated the posts from PHV, vgersh99 and Ygor.

To recap:
Two files, one a configuration file, and one data:

File Contents:
columns.cfg
HEADER,first,second,third,fourth

data.txt
Cat,Dog,Cow,Sheep
Apple,Banana,Peach,Grape
Blue,Yellow,Red,Green

The idea is to amalgamate the two files, but with a one-column offset showing the row number.


Code:
#!/bin/ksh
# NOTE: THIS CODE IS CONDENSED & SENSITIVE INFORMATION REMOVED/RENAMED.
#------------initialise variables----------
set -x
fDATE=`date +%Y%m%d`
Root=/opt/test
DATADIR=$Root/data
LOGDIR=$Root/log
LOG=$LOGDIR/export.$fDATE.out
Columns=$Root/conf/columns.cfg
echo `date` : formatGLExport begins | tee $LOG

if [ -s $Columns ];
then
        nawk -v Cols="${Columns}" '
        BEGIN {
                OFS=FS=",";
                getline< Cols;
                split($0,c);
                }
                {
                for(i=1;i<=NF;++i) {
                $i=(i==1 ? c[1] OFS NR OFS : "") c[i+1] OFS $i ;
                }
                print $0;
        }' $DATADIR/date_$fDATE.txt > $DATADIR/data_out_$fDATE.txt

else
        echo  --------------------------------| tee -a $LOG
        echo `date` : $Columns does not exist | tee -a $LOG
        echo  --------------------------------| tee -a $LOG
fi

OUTPUT:
HEADER,1,first,Cat,second,Dog,third,Cow,fourth,Sheep
HEADER,2,first,Apple,second,Banana,third,Peach,fourth,Grape
HEADER,3,first,Blue,second,Yellow,third,Red,fourth,Green

Thanks to all mentioned once again.
Regards
BW
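For anyone who wants to try the approach without the surrounding logging script, here is a self-contained sketch that uses the sample file contents from this thread. It is a variation, not the exact code posted above: it builds each output line in a string instead of assigning to the fields, and it assumes a POSIX awk (use nawk on Solaris). The scratch directory and the names merged and line are illustrative.

```shell
#!/bin/sh
# Scratch files holding the sample contents quoted in the thread.
tmp=$(mktemp -d)
printf 'HEADER,first,second,third,fourth\n' > "$tmp/columns.cfg"
printf 'Cat,Dog,Cow,Sheep\nApple,Banana,Peach,Grape\nBlue,Yellow,Red,Green\n' > "$tmp/data.txt"

merged=$(awk -v Cols="$tmp/columns.cfg" '
BEGIN {
    OFS = FS = ","
    # Load the label line; stop early if the config file cannot be read.
    if ((getline < Cols) <= 0) exit 1
    split($0, c)
}
{
    # Start with the first label and the row number, then append the pairs.
    line = c[1] OFS NR
    for (i = 1; i <= NF; i++)
        line = line OFS c[i + 1] OFS $i
    print line
}' "$tmp/data.txt")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Building a separate string avoids relying on how awk rebuilds $0 when a field is assigned a value that itself contains the separator.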
 