Dear experts, I have a probl

leekb · Dec 11, 2002

Dear experts,

I have a UNIX text-processing problem that I hope to get some help with here.

input file
---
3                  <--+
1.2                   |
apples orange         |  This section belongs to (3)
1.7                   |
pear banana         --+
4                  <--+
1.3                   |
earth mars            |
1.7                   |  This is another section: (4)
jupiter saturn        |
1.13                  |
pluto moon          --+
---

The requirement is to format the input into the following comma-delimited file.
---
3,1.2,apples,orange
3,1.7,pear,banana
4,1.3,earth,mars
4,1.7,jupiter,saturn
4,1.13,pluto,moon
---

I would appreciate any recommendation of the right tool for this kind of job. Is awk the right choice? A code snippet would be even more appreciated. Thanks in advance.

leekb@bigfoot.com

dickiebird · Dec 11, 2002

AWK is the answer, and I think you'll find thread271-425234 will show you the way.

Dickie Bird (:-)))

CaKiwi · Dec 12, 2002

Here's my attempt

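# A line holding only an integer starts a new section; remember it.
/^[0-9]+$/ {
    a = $0
    next
}
# Any other line starting with a digit (e.g. 1.2) starts a new record.
/^[0-9]/ {
    if (started) print ""    # finish the previous output line
    started = 1
    printf "%s,%s", a, $0
    next
}
# Remaining lines hold the names: append each field, comma-separated.
{
    for (i = 1; i <= NF; i++) printf ",%s", $i
}
END { print "" }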
CaKiwi

toolkit · Dec 12, 2002

Not as nice, but here's one way to do this in Perl:

Code:

#!/usr/bin/perl -w

while(<>) {
  chomp;
  if( /^\d+$/ ) {
    $field1 = $_;
  } elsif( /^\d+\.\d+$/ ) {
    $field2 = $_;
  } else {
    ($value{$field1}{$field2} = $_) =~ s/\s+/,/g;
  }
}

foreach $f1 (sort { $a <=> $b } keys %value) {
  foreach $f2 (sort { $a <=> $b } keys %{$value{$f1}}) {
    print "$f1,$f2,$value{$f1}{$f2}\n";
  }
}

Cheers, Neil

leekb · Dec 12, 2002

Thanks CaKiwi,

Your solution worked! But there is a problem, and it is my fault. Sorry about that. The real input contains numbers rather than names of fruits and planets, and looks like the following.

input file
---
3
1.2
8478849 87267683
1.7
837983 239849329
4
1.3
1234324 3242433
1.7
2332423 234332333
1.13
34534544 5349874
---

Would you mind adapting the code you demonstrated above to suit this new input? Thanks very much.

And my appreciation to toolkit as well. I am looking at awk as the first preference, perl as the next.

CaKiwi · Dec 12, 2002

How do you differentiate between the different record types? For example, does the record that starts a new section always contain only a single integer?

CaKiwi

leekb · Dec 12, 2002

Hi CaKiwi,

Let me elaborate.

3                  -> Section start (integer, always between 3 and 20)
1.2                --+
8478849 87267683   --+ this is one record
1.7                --+
837983 239849329   --+ this is the next record of the same section

Each record block contains:

1.2                -> a number with at most 2 decimal places,
8478849 87267683   -> always followed by a pair of integers.

So in essence,

the following input file of
---------------------------
3
1.2
8478849 87267683
1.7
837983 239849329
4
1.3
1234324 3242433
1.7
2332423 234332333
1.13
34534544 5349874

would convert to

3,1.2,8478849,87267683
3,1.7,837983,239849329
4,1.3,1234324,3242433
4,1.7,2332423,234332333
4,1.13,34534544,5349874

thanks in advance.

CaKiwi · Dec 13, 2002

This may do what you want.

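# A line holding only an integer starts a new section; remember it.
/^[0-9]+$/ {
    a = $0
    next
}
# A decimal such as 1.2 starts a new record within the current section.
/^[0-9]+\.[0-9]+$/ {
    if (flg) print ""        # finish the previous output line
    printf "%s,%s", a, $0
    next
}
# Any other line is data: append each field, comma-separated.
{
    for (i = 1; i <= NF; i++) printf ",%s", $i
    flg = 1
}
END { print "" }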
CaKiwi
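
A variant of the script above, offered as a sketch rather than a replacement: it remembers the section and the decimal key, then prints each record as one complete line once the pair of integers arrives, so no flag or trailing newline is needed. The function name `format_records` is made up for illustration, and the patterns assume the layout leekb described (an integer section line, a decimal key line, then a line with two integers).

```shell
# Hypothetical helper; reads the input from stdin or from file arguments.
format_records() {
  awk '
    # A line holding only an integer starts a new section.
    /^[0-9]+$/ { section = $0; next }
    # A decimal such as 1.2 or 1.13 is the key of the next record.
    /^[0-9]+\.[0-9]+$/ { key = $0; next }
    # A two-field line completes the record, so print it as one line.
    NF == 2 { print section "," key "," $1 "," $2 }
  ' "$@"
}
```

Called as `format_records input.txt` (the file name is a placeholder), this should produce the five comma-delimited lines from leekb's example.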

leekb · Dec 13, 2002

Thanks very much CaKiwi. That was just what I wanted.

cheers.
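
For anyone reusing this on less tidy input, the record layout described earlier in the thread (a section integer between 3 and 20, a number with at most two decimal places, then a pair of integers) can be checked before formatting. The following is only a sketch under those assumptions; the function name `classify_lines` and the labels it prints are invented for illustration.

```shell
# Hypothetical helper: labels every input line according to the layout
# described in the thread, so unexpected lines stand out.
classify_lines() {
  awk '
    # Section start: a lone integer between 3 and 20.
    /^[0-9]+$/ && $0 + 0 >= 3 && $0 + 0 <= 20 { print "section: " $0; next }
    # Record key: a number with one or two decimal places.
    /^[0-9]+\.[0-9][0-9]?$/ { print "key: " $0; next }
    # Record data: exactly two integers.
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print "data: " $0; next }
    # Anything else does not fit the description.
    { print "unknown: " $0 }
  ' "$@"
}
```

Piping the output through `grep unknown` is one quick way to see whether a file contains lines the formatting script would mishandle.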
