Multiple line record problem 1

learningawk · Jan 7, 2003

I am trying to break down an address file in which each group holds 3 addresses side by side, and the groups are separated by 2 blank lines (newline characters).
I want to print each separate 4-line address as tab- or comma-delimited output (one complete address per record).

Here's a dummy input file with a column counter:

1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA XXXXXX EEEEEEEEEEEEEEEEE xxxxxx IIIIIIIIIIII xxxxx
BBBBBBBBBBBBBBBBBBBBBB FFFFFFFFFFFFFFFFFFFFF JJJJJJJJJJJJJ
CCCCCCCCCCCCCCCCCCCCCCCCCC GGGGGGGGGGGGGGGGGGGGGGGGGG KKKKKKKKKKKKKKKK
(000) 0000-0000 CORPORATION (111) 1111-1111 SOLE PROPRIETORSHIP (222) 2222-2222 SOLE PROPRIETORSHIP

lllllllllllllllllllllllllllll XXXXXX ooooooooooooooooo xxxxxx ssssssssssss xxxxx
mmmmmmmmmmmmmmmmmmmmmm ppppppppppppppppppppp tttttttttttttttt
nnnnnnnnnnnnnnnnnnnnnnnnnn qqqqqqqqqqqqqqqqqqqqqqqqqq uuuuuuuuuuuuuu
(333-333-3333 CORPORATION (444) 4444-4444 SOLE PROPRIETORSHIP (555)- 5555-5555 SOLE PROPRIETORSHIP

This format will make more sense if you cut and paste it into a text editor, so the lines do not wrap as they do here.

Here's my preliminary code to break this down:

BEGIN {RS==""
OFS="\t"
}
{
row1col1=substr($0,1,33);
#row1col2=substr($0,45,33);
#row1col3=substr($0,89,33);
getline
row2col1=substr($0,1,33);
#row2col2=substr($0,45,33);
#row2col3=substr($0,89,33);
getline
row3col1=substr($0,1,33);
#row3col2=substr($0,45,33);
#row3col3=substr($0,89,33);
getline
row4col1=substr($0,1,33);
#row4col2=substr($0,45,33);
#row4col3=substr($0,89,33);
}

{
#gsub(/ /, "",row1col1); #to remove extra spaces in fields
#gsub(/ /, "",row2col1); #to remove extra spaces in fields
#gsub(/ /, "",row3col1); #to remove extra spaces in fields
#gsub(/ /, "",row4col1); #to remove extra spaces in fields
print row1col1,row2col1,row3col1,row4col1
#printf ("%s%s%s%s",row1col1,row2col1,row3col1,row4col1)
}

I have been testing on just the first column of addresses, but it gets buggy: the output inserts a newline before the first phone number.

QUESTION: How do you handle a record separator of 2 blank lines?

I am open to different approaches in breaking this down.
Thanks so much for any assistance.

CaKiwi · Jan 8, 2003

I would use the same approach as you have. Remove the RS="" and add getline twice to skip past the blank lines.

BEGIN {#RS=""
OFS="\t"
}
{
row1col1=substr($0,1,33);
row1col2=substr($0,45,33);
row1col3=substr($0,89,33);
getline
row2col1=substr($0,1,33);
row2col2=substr($0,45,33);
row2col3=substr($0,89,33);
getline
row3col1=substr($0,1,33);
row3col2=substr($0,45,33);
row3col3=substr($0,89,33);
getline
row4col1=substr($0,1,33);
row4col2=substr($0,45,33);
row4col3=substr($0,89,33);

#gsub(/ /, "",row1col1); #to remove extra
#gsub(/ /, "",row2col1); #to remove extra
#gsub(/ /, "",row3col1); #to remove extra
#gsub(/ /, "",row4col1); #to remove extra
print row1col1,row2col1,row3col1,row4col1
print row1col2,row2col2,row3col2,row4col2
print row1col3,row2col3,row3col3,row4col3
getline # skip the first blank line between groups
getline # skip the second blank line
}

CaKiwi
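
Since the original post was open to other approaches, here is a rough sketch of a getline-free variant using awk's paragraph mode. With RS="" awk treats any run of blank lines as one record separator, so the 2 blank lines need no special handling, and with FS="\n" each line of a group becomes a field ($1 to $4). The column starts (1, 45, 89) and the width of 33 are taken from the substr calls above. The function names make_sample and split_addresses and the sample addresses are made up for illustration, and it assumes the separator lines are truly empty (no stray spaces).

```shell
#!/bin/sh
# Hypothetical sample data: 3 address columns starting at positions 1, 45
# and 89, 4 lines per group, groups separated by 2 blank lines.
make_sample() {
    printf '%-44s%-44s%s\n' 'Ann A' 'Bob B' 'Cy C'
    printf '%-44s%-44s%s\n' '1 First St' '2 Second St' '3 Third St'
    printf '%-44s%-44s%s\n' 'Town A' 'Town B' 'Town C'
    printf '%-44s%-44s%s\n' '(000) 000-0000' '(111) 111-1111' '(222) 222-2222'
    printf '\n\n'
    printf '%-44s%-44s%s\n' 'Dee D' 'Eve E' 'Flo F'
    printf '%-44s%-44s%s\n' '4 Fourth St' '5 Fifth St' '6 Sixth St'
    printf '%-44s%-44s%s\n' 'Town D' 'Town E' 'Town F'
    printf '%-44s%-44s%s\n' '(333) 333-3333' '(444) 444-4444' '(555) 555-5555'
}

# Reads the fixed-width layout on stdin and prints one tab-delimited
# address per line.
split_addresses() {
    awk '
    # Paragraph mode: records are blocks separated by blank lines,
    # and every line of a block is one field.
    BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
    {
        for (c = 0; c < 3; c++) {            # three address columns
            start = 1 + c * 44               # 1, 45, 89
            out = ""
            for (r = 1; r <= 4; r++) {       # four lines per address
                cell = substr($r, start, 33)
                sub(/ +$/, "", cell)         # trim trailing padding only
                out = out (r > 1 ? OFS : "") cell
            }
            print out
        }
    }'
}

result=$(make_sample | split_addresses)
printf '%s\n' "$result"
```

Trimming with sub(/ +$/, "", cell) removes only the trailing padding. The commented-out gsub(/ /, "", ...) calls above would also delete the spaces inside a field such as SOLE PROPRIETORSHIP.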

learningawk · Jan 8, 2003

Thanks CaKiwi for the help on this problem.
