fields in array

Status
Not open for further replies.

Guest_imported

New member
Jan 1, 1970
0
Hi,

Given the following file:

a,b,c,d
a,r,f,g
j,l,m,p
z,h,q,l
(,f,%,l
s,b,f,l

I need to write a script that takes all the elements in the different fields and puts them horizontally. Each element can occur only once. This is what the output file should eventually look like:

(a,j,z,(,s) # these are all the elements of $1
(b,r,l,h,f) # all the different elements of $2
(c,f,m,q,%) # all the unique elements of $3
(d,g,p) # all the different of $4

So, to start with:
- look at the element in the first field;
- if element not in array, put element in array,
- else continue
- when AWK reaches the end of the file, all the different
elements should be printed horizontally, separated by
commas

I've added a "(" and a "%" to the example to indicate that the elements in the fields are not always letters: they could be punctuation marks, letters, words, numbers, etc. So basically: /.*/ :)

Can someone help me with this?
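
The steps listed above can be sketched for a single column as follows. This is only a minimal sketch under a few assumptions: the helper name unique_field and the variables col, seen, line and sep are made up for illustration, the separator is assumed to be a comma, and the parentheses around the output line are left out.

```shell
# Hypothetical helper: print the distinct values of one comma-separated
# column, in first-seen order, on a single line.
# Usage: unique_field COLUMN < file
unique_field() {
    awk -F, -v col="$1" '
        {
            v = $col
            # seen[] remembers every value already collected for this column
            if (!(v in seen)) {
                seen[v] = 1
                line = line sep v
                sep = ","
            }
        }
        END { print line }
    '
}

# Example with the sample file from the post, column 1:
printf '%s\n' 'a,b,c,d' 'a,r,f,g' 'j,l,m,p' 'z,h,q,l' '(,f,%,l' 's,b,f,l' |
    unique_field 1
# prints: a,j,z,(,s
```

Because the values are used as array subscripts, punctuation marks, words and numbers are all handled the same way. Calling the helper once per column number would give one output line per field.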

 
BEGIN {
#------------------------------------------
# Set the field separator to "," or,
# alternatively, use the -F option.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i < NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf ("%s%s", arr[i,j],FS);
row[arr[i,j]]++;
i++
}
}
print
# print arr[i,j]
j++
}
}
 
a bit better version:

#------------------------------------------------
BEGIN {
#------------------------------------------
# Set the field separator to "," or,
# alternatively, use the -F option.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i < NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf("%s%s", (i != 1 ) ? FS : "", arr[i,j]);
row[arr[i,j]]++;
i++
}
}
print
# print arr[i,j]
j++
}
}
 
Hi vgersh,

I've tried both your proposals on the following testfile:

a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

the output should be:

a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,r,g

But with your two scripts, the output becomes:

a,=,/,climb,a,2,z,d
b,f,[,%,a,2,z,d
c,PUNC,r,a,2,z,d
d,g,a,2,z,d

The last line of the file (a,2,z,d) is always added to the end of each line in the output file. The second problem, which is more important, is that the element "r" is missing from the last line of the output file. Can you find a solution for these problems?

Thanks!
 
Hi,
Input:
a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

Output:
a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,g


Given your initial statement "Each element can occur only once", this is exactly the output desired from the original post.

Please try this version of the script and let us know.

vlad

BEGIN {
#------------------------------------------
# Set the field separator to "," or,
# alternatively, use the -F option.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i <= NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf("%s%s", (i != 1 ) ? FS : "", arr[i,j]);
row[arr[i,j]]++;
i++
}
}
print
# print arr[i,j]
j++
}
}

 
Hi vgersh,

Yeah, there has been some misunderstanding. With the following input:

a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

I should definitely get the following output:

a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,r,g

An element may in fact occur more than once, but NOT in the same FIELD. In the first field of the input file, the element 'a' appears twice, so one of them should be deleted. The fourth field contains 'd' three times, so two of them should be deleted. But the element 'r' should remain, since it occurs only once when looking at the fourth field of each line.

I thought I had made this clear by the example above. Sorry for the confusion. Does this mean that for every field a different array should be created? Or is there a much simpler solution 'cause in the original files, each line contains 15 fields!

Thanks.



 
Your definition of the "field" is somewhat misleading. By "field" you actually mean a "transposed row", i.e. "a column" of the input that becomes "a row" on the output side.

AWK's definition of a "field" is quite different.

Here's a slightly modified version of the script.

BEGIN {
    #------------------------------------------
    # Set the field separator to "," or,
    # alternatively, use the -F option.
    #------------------------------------------
    FS = ","
}
{
    #------------------------------------------------
    # Store the field data in a two dimensional array
    # dimensioned by line # and field #. Also, save
    # the max # of fields for use in output.
    #------------------------------------------------
    i = 1
    while (i <= NF) {
        arr[NR,i] = $i
        i++
    }
    if (NF > max_nf)
        max_nf = NF
}

END {
    #---------------------------------------------
    # Print out the two dimensional array but
    # change the access so that the original lines
    # become the fields and the fields become the
    # lines. The row array is cleared for every
    # column, so duplicates are dropped per column.
    #---------------------------------------------
    j = 1
    while (j <= max_nf)
    {
        i = 1
        split("", row, SUBSEP)
        while (i <= NR)
        {
            if (!(arr[i,j] in row)) {
                printf("%s%s", (i != 1) ? FS : "", arr[i,j])
                row[arr[i,j]]++
            }
            i++
        }
        # End the output line; a bare print would emit $0, which
        # in END may still hold the last input line.
        print ""
        j++
    }
}
 
I hope this does the job for you.

awk -F"," '{
    for( ii = 1; ii <= NF; ii++) {
        arr2[ NR, ii] = $ii
    }
    # remember the widest line instead of hard-coding 4 columns
    if( NF > max_nf) max_nf = NF
}

END {
    for( jj = 1; jj <= max_nf; jj++){
        # ln holds the column values found so far, wrapped in commas
        ln = "," arr2[ 1, jj] ","
        for( ii = 2; ii <= NR; ii++){
            if( index( ln, "," arr2[ ii, jj] ",") == 0){
                ln = ln arr2[ ii, jj] ","
            }
        }
        print "(" substr( ln, 2, length( ln) - 2) ")"
    }
}' inputfile


Regards Gregor Gregor.Weertman@mailcity.com
 