learningawk
Technical User
I am trying to break down an address file that consists of 3 addresses per group, with each group of 3 separated by 2 blank lines (newline characters).
I want to print each separate 4-line address as tab- or comma-delimited output (one complete address per record).
Here's a dummy input file with a column counter:
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA XXXXXX EEEEEEEEEEEEEEEEE xxxxxx IIIIIIIIIIII xxxxx
BBBBBBBBBBBBBBBBBBBBBB FFFFFFFFFFFFFFFFFFFFF JJJJJJJJJJJJJ
CCCCCCCCCCCCCCCCCCCCCCCCCC GGGGGGGGGGGGGGGGGGGGGGGGGG KKKKKKKKKKKKKKKK
(000) 0000-0000 CORPORATION (111) 1111-1111 SOLE PROPRIETORSHIP (222) 2222-2222 SOLE PROPRIETORSHIP
lllllllllllllllllllllllllllll XXXXXX ooooooooooooooooo xxxxxx ssssssssssss xxxxx
mmmmmmmmmmmmmmmmmmmmmm ppppppppppppppppppppp tttttttttttttttt
nnnnnnnnnnnnnnnnnnnnnnnnnn qqqqqqqqqqqqqqqqqqqqqqqqqq uuuuuuuuuuuuuu
(333-333-3333 CORPORATION (444) 4444-4444 SOLE PROPRIETORSHIP (555)- 5555-5555 SOLE PROPRIETORSHIP
This format will make more sense if you cut and paste it into a text editor, so the lines do not wrap as they do here.
Here's my preliminary code to break this down:
BEGIN {RS==""
OFS="\t"
}
{
row1col1=substr($0,1,33);
#row1col2=substr($0,45,33);
#row1col3=substr($0,89,33);
getline
row2col1=substr($0,1,33);
#row2col2=substr($0,45,33);
#row2col3=substr($0,89,33);
getline
row3col1=substr($0,1,33);
#row3col2=substr($0,45,33);
#row3col3=substr($0,89,33);
getline
row4col1=substr($0,1,33);
#row4col2=substr($0,45,33);
#row4col3=substr($0,89,33);
}
{
#gsub(/ /, "",row1col1); #to remove extra spaces in fields
#gsub(/ /, "",row2col1); #to remove extra spaces in fields
#gsub(/ /, "",row3col1); #to remove extra spaces in fields
#gsub(/ /, "",row4col1); #to remove extra spaces in fields
print row1col1,row2col1,row3col1,row4col1
#printf ("%s%s%s%s",row1col1,row2col1,row3col1,row4col1)
}
I have been testing on just the first column of addresses, but it gets buggy: the output inserts a newline before the first phone number.
QUESTION: How do you handle a record separator of 2 blank lines?
I am open to different approaches in breaking this down.
Thanks so much for any assistance.
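One possible approach, sketched here under assumptions rather than as a definitive fix: awk's paragraph mode (RS set to the empty string) treats any run of empty lines as a single record separator, so 2 blank lines need no special handling. With FS set to a newline, each line of a group becomes one field, and substr() can slice the three side-by-side addresses out of each line. The column starts (1, 45, 89) and the width (33) are taken from the substr() calls above and are assumed to match the real layout. The wrapper name split_addresses is hypothetical, and the separator lines are assumed to be truly empty (no stray spaces).

```shell
# split_addresses: hypothetical helper; pass the address file(s) as arguments.
split_addresses() {
    awk '
    BEGIN {
        RS  = ""      # paragraph mode: one or more empty lines end a record
        FS  = "\n"    # each line of a group becomes one field
        OFS = "\t"    # tab-delimited output
        # Assumed layout, copied from the substr() calls in the post.
        start[1] = 1; start[2] = 45; start[3] = 89
        width = 33
    }
    # Strip leading and trailing blanks from one column slice.
    function trim(s) {
        gsub(/^[ \t]+|[ \t]+$/, "", s)
        return s
    }
    {
        # One output record per address column; up to 4 rows per address.
        for (c = 1; c <= 3; c++) {
            out = ""
            for (r = 1; r <= NF && r <= 4; r++) {
                field = trim(substr($r, start[c], width))
                out = (r == 1) ? field : out OFS field
            }
            print out
        }
    }' "$@"
}
```

It could be called as `split_addresses addresses.txt > addresses.tsv` (both file names are placeholders). Note that `RS==""` in the BEGIN block above is a comparison, not an assignment, so RS stays at its default of a single newline; with `RS=""` the whole 4-line group arrives as one record and the getline calls are no longer needed.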