fields in array

Guest_imported · Mar 18, 2002

Hi,

Given the following file:

a,b,c,d
a,r,f,g
j,l,m,p
z,h,q,l
(,f,%,l
s,b,f,l

I need to write a script that collects all the elements of each field and prints them horizontally. Each element can occur only once. This is what the output file should eventually look like:

(a,j,z,(,s) # these are all the elements of $1
(b,r,l,h,f) # all the different elements of $2
(c,f,m,q,%) # all the unique elements of $3
(d,g,p) # all the different of $4

So, to start with:
- look at the element in the first field;
- if element not in array, put element in array,
- else continue
- when AWK reaches the end of the file, all the different
elements should be printed horizontally, separated by
commas

I've added a "(" and a "%" to the example to indicate that the elements in the fields are not always letters: they could be punctuation marks, letters, words, numbers, etc. So basically: /.*/

Can someone help me with this?

vgersh99 · Mar 18, 2002

BEGIN {
#------------------------------------------
# Change the field separator to "|" if needed, or
# use the -F option instead.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i < NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf ("%s%s", arr[i,j],FS);
row[arr[i,j]]++;
i++
}
}
print
# print arr[i,j]
j++
}
}

vgersh99 · Mar 18, 2002

A slightly better version:

#------------------------------------------------
BEGIN {
#------------------------------------------
# Change the field separator to "|" if needed, or
# use the -F option instead.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i < NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf("%s%s", (i != 1 ) ? FS : "", arr[i,j]);
row[arr[i,j]]++;
i++
}
}
print
# print arr[i,j]
j++
}
}

Guest_imported · Mar 21, 2002

Hi vgersh,

I've tried both your proposals on the following testfile:

a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

the output should be:

a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,r,g

But with your two scripts, the output becomes:

a,=,/,climb,a,2,z,d
b,f,[,%,a,2,z,d
c,PUNC,r,a,2,z,d
d,g,a,2,z,d

The last line of the file (a,2,z,d) is always appended to each line of the output file. The second problem, which is more important, is that the element "r" is missing from the last line of the output file. Can you find a solution for these problems?

Thanks!
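
Editorial note on the first problem: a likely cause is the bare print in the END block. With no arguments, print outputs $0, and in many AWK implementations $0 still holds the last input record inside END, so that record is appended after the printf output. A minimal sketch of the usual workaround, print "", is shown here; the function name join_first_column and the two-line sample input are made up for illustration.

```sh
# Hypothetical helper: join the values of column 1 on a single line.
join_first_column() {
    awk -F, '
    { vals[NR] = $1 }
    END {
        for (i = 1; i <= NR; i++)
            printf("%s%s", (i > 1) ? FS : "", vals[i])
        # print "" emits only the newline. A bare "print" here would
        # output $0, which many awks still set to the last input line.
        print ""
    }'
}

printf '%s\n' 'a,b' 'c,d' | join_first_column
```

With this input the function should print a,c on one line, regardless of what the AWK in use keeps in $0 at END.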

vgersh99 · Mar 21, 2002

Hi,
Input:
a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

Output:
a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,g

Given your initial statement "Each element can occur only once", this is exactly the output requested in the original post.

Please try this version of the script and let us know.

vlad

BEGIN {
#------------------------------------------
# Change the field separator to "|" if needed, or
# use the -F option instead.
#------------------------------------------
FS = ","
}
{
#------------------------------------------------
# Store the field data in a two dimensional array
# dimensioned by line # and field #. Also, save
# the max # of fields for use in output.
#------------------------------------------------
i = 1
while (i <= NF ) {
arr[NR,i] = $i;
i++
}
if (NF > max_nf)
max_nf = NF
}

END {
#---------------------------------------------
# Print out the two dimensional array but
# change the access so that the original lines
# become the fields and the fields become the
# lines.
#---------------------------------------------
j = 1
split("", row, SUBSEP);
while (j <= max_nf)
{
i = 1
while (i <= NR)
{
if ( arr[i,j] in row ) { i++}
else {
printf("%s%s", (i != 1 ) ? FS : "", arr[i,j]);
row[arr[i,j]]++;
i++
}
}
print ""   # end the output line; a bare print would emit $0
# print arr[i,j]
j++
}
}

Guest_imported · Mar 21, 2002

Hi vgersh,

Yeah, there has been some misunderstanding. With the following input:

a,b,c,d
=,f,c,r
/,[,PUNC,g
climb,%,r,d
a,2,z,d

I should definitely get the following output:

a,=,/,climb
b,f,[,%,2
c,PUNC,r,z
d,r,g

An element may in fact occur more than once, but NOT in the same FIELD. In the first field of the input file, the element 'a' occurs twice, so one of them should be deleted. The fourth field contains 'd' three times, so two of them should be deleted. But the element 'r' should remain, since it occurs only once when looking at the fourth field of each line.

I thought I had made this clear by the example above. Sorry for the confusion. Does this mean that a different array has to be created for every field? Or is there a simpler solution? In the original files, each line contains 15 fields!

Thanks.

vgersh99 · Mar 22, 2002

Your definition of "field" is somewhat misleading. By "field" you actually mean a "transposed row", i.e. a column of the input that becomes a row of the output.

AWK's definition of a "field" is quite different: a field is one separator-delimited piece of a single input line ($1 through $NF).
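
To make the distinction concrete, here is a small sketch of what AWK itself reports for each line. The function name show_fields is made up for illustration; the two sample lines are taken from the test file above.

```sh
# Hypothetical helper: report how AWK splits each input line into fields.
show_fields() {
    awk -F, '{ print NR ": NF=" NF ", first=" $1 ", last=" $NF }'
}

printf '%s\n' 'a,b,c,d' 'climb,%,r,d' | show_fields
```

Each line is split on its own, so $4 on line 1 and $4 on line 2 are unrelated as far as AWK is concerned. Treating them as one "column" is something the script has to do itself.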

Here's a slightly modified version of a script.

BEGIN {
    #------------------------------------------
    # Change the field separator to "|" if needed, or
    # use the -F option instead.
    #------------------------------------------
    FS = ","
}
{
    #------------------------------------------------
    # Store the field data in a two dimensional array
    # dimensioned by line # and field #. Also, save
    # the max # of fields for use in output.
    #------------------------------------------------
    i = 1
    while (i <= NF) {
        arr[NR,i] = $i
        i++
    }
    if (NF > max_nf)
        max_nf = NF
}

END {
    #---------------------------------------------
    # Print out the two dimensional array but
    # change the access so that the original lines
    # become the fields and the fields become the
    # lines. The "row" array of values already seen
    # is emptied before each column is processed.
    #---------------------------------------------
    j = 1
    while (j <= max_nf)
    {
        i = 1
        split("", row, SUBSEP)   # clear the array for this column
        while (i <= NR)
        {
            if (!(arr[i,j] in row)) {
                printf("%s%s", (i != 1) ? FS : "", arr[i,j])
                row[arr[i,j]]++
            }
            i++
        }
        print ""   # end the output line; a bare print would emit $0
        j++
    }
}
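
On the question of whether every field needs its own array: one alternative, sketched here as an illustration and not taken from the script above, is a single array keyed by (column number, value). That avoids storing the whole file and works for any number of columns. The names unique_per_column, seen and out are made up for this sketch.

```sh
# Hypothetical helper: reads comma-separated lines on stdin and prints,
# for each column, its distinct values in order of first appearance.
unique_per_column() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            # The key is (column, value), so a value may show up in
            # several columns but only once within each column.
            if (!((i, $i) in seen)) {
                seen[i, $i] = 1
                if (i in out)
                    out[i] = out[i] FS $i
                else
                    out[i] = $i
            }
        }
        if (NF > max_nf)
            max_nf = NF
    }
    END {
        for (i = 1; i <= max_nf; i++)
            print out[i]
    }'
}

printf '%s\n' 'a,b,c,d' '=,f,c,r' '/,[,PUNC,g' 'climb,%,r,d' 'a,2,z,d' |
    unique_per_column
```

For the five-line test file this should give the four lines requested above, ending with d,r,g. The trade-off is that the output lines are built up as strings while reading, so the per-column order is fixed to first appearance.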

gregor weertman · Mar 22, 2002

I hope this does the job for you.

awk -F"," '{
    split( $0, arr, ",")

    for( ii = 1; ii <= NF; ii++) {
        arr2[ NR, ii] = arr[ ii]
    }
}

END {
    # 4 is the number of columns in the sample input.
    for( jj = 1; jj <= 4; jj++){
        ln = "," arr2[ 1, jj] ","
        for( ii = 2; ii <= NR; ii++){
            if( index( ln, "," arr2[ ii, jj] ",") == 0){
                ln = ln arr2[ ii, jj] ","
            }
        }
        print "(" substr( ln, 2, length( ln) - 2) ")"
    }
}' inputfile

Regards Gregor Gregor.Weertman@mailcity.com

Guest_imported · Mar 24, 2002

Thanks, it works!

zlangu
