Status
Not open for further replies.

leekb

Technical User
Oct 9, 2002
37
SG
Dear experts,

I have a UNIX text-processing problem that I hope to get some help with here.

input file
---
3               <--+
1.2                |
apples orange      |  This section belongs to (3)
1.7                |
pear banana      --+
4               <--+
1.3                |
earth mars         |
1.7                |  This is another section: (4)
jupiter saturn     |
1.13               |
pluto moon       --+
---


The requirement is to format the input into the following comma-delimited file:
---
3,1.2,apples,orange
3,1.7,pear,banana
4,1.3,earth,mars
4,1.7,jupiter,saturn
4,1.13,pluto,moon
---

I would appreciate any recommendation of the right tool for this kind of job. Is awk the right choice? A code snippet would be even more appreciated. Thanks in advance.

leekb@bigfoot.com
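The job is a small state machine: remember the most recent section number and record key, and emit one CSV line whenever a data line arrives. A minimal awk sketch of that idea, wrapped in a shell function (`sections_to_csv` is a hypothetical name), assuming a line holding only an integer always starts a section and a line holding only a decimal number is always a record key:

```shell
#!/bin/sh
# sections_to_csv: hypothetical helper. Reads the sectioned layout from the
# files given as arguments (or stdin) and prints one CSV line per data line.
# Assumes: lone integer = section start, lone decimal number = record key,
# any other non-blank line = record data.
sections_to_csv() {
  awk '
    BEGIN { OFS = "," }
    NF == 1 && $1 ~ /^[0-9]+$/         { section = $1; next }  # e.g. 3
    NF == 1 && $1 ~ /^[0-9]+\.[0-9]+$/ { key = $1; next }      # e.g. 1.13
    NF > 0 {
      $1 = $1   # force awk to rebuild $0 with OFS between the fields
      print section, key, $0
    }
  ' "$@"
}
```

For example, `sections_to_csv input.txt > output.csv`. Because the data line is rebuilt field by field, the same sketch does not care whether the data are words or integers.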
 
AWK is the answer, and I think you'll find that thread271-425234 will show you the way.
Dickie Bird (:)-)))
 
Here's my attempt

/^[0-9]+$/ {               # section number: remember it
  a = $0
  next
}
/^[0-9]/ {                 # record key: start a new output line
  if (flg) print ""
  printf "%s,%s", a, $0
  next
}
{                          # data line: append each field
  for (i = 1; i <= NF; i++) printf ",%s", $i
  flg = 1
}
END { if (flg) print "" }
CaKiwi
 
Not as nice, but here's one way to do this in Perl:
#!/usr/bin/perl
use strict;
use warnings;

my ($field1, $field2, %value);

while (<>) {
  chomp;
  if ( /^\d+$/ ) {              # section number, e.g. 3
    $field1 = $_;
  } elsif ( /^\d+\.\d+$/ ) {    # record key, e.g. 1.13
    $field2 = $_;
  } else {                      # data line: turn whitespace runs into commas
    ($value{$field1}{$field2} = $_) =~ s/\s+/,/g;
  }
}

foreach my $f1 (sort { $a <=> $b } keys %value) {
  foreach my $f2 (sort { $a <=> $b } keys %{ $value{$f1} }) {
    print "$f1,$f2,$value{$f1}{$f2}\n";
  }
}
Cheers, Neil :)
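In the Perl version, `sort { $a <=> $b }` compares the record keys as decimal numbers, so in section 4 it prints 1.13 before 1.3 and 1.7 (since 1.13 < 1.3 < 1.7), whereas the requested output keeps the input order 1.3, 1.7, 1.13. The awk script above streams records in input order and needs no sort. If the input might list sections out of order and grouping by section is still wanted, sorting the CSV on the section field alone with a stable sort keeps each section's records in their original order. A sketch, assuming GNU or BSD `sort` (`-s` is an extension, not POSIX); `sort_by_section` is a hypothetical name:

```shell
#!/bin/sh
# sort_by_section: hypothetical helper. Sorts comma-delimited records
# numerically on field 1 only; -s (stable) keeps records that share a
# section number in the order they arrived.
sort_by_section() {
  sort -s -t, -k1,1n "$@"
}
```

Pipe the comma-delimited output into `sort_by_section` to group it by section.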
 
Thanks CaKiwi,

Your solution worked! But there is a problem, and it is my fault; sorry about that. The real input actually looks like the following, with numbers instead of the names of fruits and planets.

input file
---
3
1.2
8478849 87267683
1.7
837983 239849329
4
1.3
1234324 3242433
1.7
2332423 234332333
1.13
34534544 5349874
---

Would you mind adapting the code you demonstrated above to suit this new input? Thanks very much.

And my appreciation to toolkit as well. I am looking at awk as the first preference and perl as the next.
 
How do you differentiate between the different record types? For example, does the record that starts a new section always contain only a single integer?
CaKiwi
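This question is the crux: every line type has to be recognizable on its own. A sketch of one way to tell them apart in awk, assuming the rules leekb gives in the reply that follows (a lone integer starts a section, a lone decimal number is the record key, and a data line is a pair of integers), checks the field count together with a regex on each field; `classify_lines` is a hypothetical name:

```shell
#!/bin/sh
# classify_lines: hypothetical helper. Prefixes each input line with the
# record type it appears to be, judged by field count plus a regex:
#   section = one field, an integer (e.g. 3)
#   key     = one field, a decimal number (e.g. 1.13)
#   data    = two fields, both integers
#   other   = anything else (blank lines, stray text)
classify_lines() {
  awk '
    NF == 1 && $1 ~ /^[0-9]+$/                    { print "section", $0; next }
    NF == 1 && $1 ~ /^[0-9]+\.[0-9]+$/            { print "key", $0; next }
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print "data", $0; next }
    { print "other", $0 }
  ' "$@"
}
```

Lines labeled `other` are the ones a converter would have to decide how to treat.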
 
Hi CaKiwi,

Let me elaborate.

3                  -> Section start (always an integer between 3 and 20)
1.2                --+
8478849 87267683   --+ this is one record
1.7                --+
837983 239849329   --+ this is the next record of the same section


Each record block contains:

1.2                -> a number with at most 2 decimal places,
8478849 87267683   -> always followed by a pair of integers.


So in essence,

the following input file of
---------------------------
3
1.2
8478849 87267683
1.7
837983 239849329
4
1.3
1234324 3242433
1.7
2332423 234332333
1.13
34534544 5349874

would convert to

3,1.2,8478849,87267683
3,1.7,837983,239849329
4,1.3,1234324,3242433
4,1.7,2332423,234332333
4,1.13,34534544,5349874

Thanks in advance.
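Since the rules listed above are fairly strict (section numbers from 3 to 20, keys with at most 2 decimal places, each key followed by one pair of integers), it may be worth checking the input against them before converting. A minimal awk sketch in a shell function; `check_layout` is a hypothetical name, and it prints one `line N: reason` message per problem and nothing for clean input:

```shell
#!/bin/sh
# check_layout: hypothetical helper. Checks input against the rules stated
# in this thread and prints "line N: reason" for each violation found.
check_layout() {
  awk '
    NF == 1 && $1 ~ /^[0-9]+$/ {                       # section start
      if (pending) print "line " key_nr ": key has no data"
      if ($1 + 0 < 3 || $1 + 0 > 20) print "line " NR ": section outside 3-20"
      pending = 0; next
    }
    NF == 1 && $1 ~ /^[0-9]+\.[0-9][0-9]?$/ {          # key: at most 2 decimals
      if (pending) print "line " key_nr ": key has no data"
      pending = 1; key_nr = NR; next
    }
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {    # pair of integers
      if (!pending) print "line " NR ": data without a key"
      pending = 0; next
    }
    { print "line " NR ": unrecognized line" }
    END { if (pending) print "line " key_nr ": key has no data" }
  ' "$@"
}
```

For example, `check_layout input.txt` on the sample above prints nothing, while a key such as 1.123 is reported as an unrecognized line.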
 
This may do what you want.

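/^[0-9]+$/ {               # section number (a lone integer): remember it
  a = $0
  next
}
/^[0-9]+\.[0-9]/ {         # record key such as 1.13: start a new output line
  if (flg) print ""
  printf "%s,%s", a, $0
  next
}
{                          # data line: append each field
  for (i = 1; i <= NF; i++) printf ",%s", $i
  flg = 1
}
END { if (flg) print "" }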
CaKiwi
 
Thanks very much, CaKiwi. That was just what I wanted.

Cheers.
 