Help me manipulate text files...

jtb1492 · Mar 24, 2005

So I have data (in a file) like this:

A1
F 1
G 2
H 3

A2
F 2
G 5
H 9

B1
F 6
G 3
H 7

B2
4
0
1

And I want to switch it up so that it's like this:

A1 A2 A3 A4
F 4 6 3 5
G etc.
H

What's the best/most efficient way to do this? I figure I could loop over the input file once for row F, once for row G, etc. I also figure I could read the file just once, store all the data in a hash or arrays, and then print my variables back to the output file.

I'm concerned about time/memory because this script will work on LARGE files. It is possible that the input file could be HUGE, like 1 GB, so in that case I think I have no choice but to loop over the file several times?

Thanks guys!

mikevh · Mar 24, 2005

jtb1492, what this suggests to me is a hash of hashes, with F, G, H, as the top-level keys, and A1, A2, etc. as the "subkeys". Here's one way to approach this:

Code:

#!perl
use strict;
use warnings;

my %h;
my @subkeys;
{
    # Set record separator to 1 or more blank lines, inside this block only.
    local $/ = "";

    while (<DATA>) {
        my ($subkey, @rest) = split "\n";
        push @subkeys, $subkey;
        for (@rest) {
            my ($key, $val) = split;
            $h{$key}->{$subkey} = $val;
        }
    }
}

my $first = 1;
@subkeys = sort @subkeys;
for my $k (sort keys %h) {
    if ($first) {
         # Print column headers
         print "\t", join("\t", @subkeys), "\n";
         $first = 0;
    }
    # Print key and data
    print "$k\t";
    print join("\t", @{$h{$k}}{@subkeys}), "\n";
}        

__DATA__
A1 
F 1 
G 2 
H 3 

A2 
F 2 
G 5 
H 9 

B1 
F 6 
G 3 
H 7

Output:

Code:

        A1      A2      B1 
F       1       2       6
G       2       5       3
H       3       9       7

Here's a picture of what the %h hash looks like in Data::Dumper. I recommend you become familiar with this module, as it's great for printing out your data structures for debugging purposes:

Code:

%h = (
       'F' => {
                'A1 ' => '1',
                'A2 ' => '2',
                'B1 ' => '6',
              },
       'G' => {
                'A1 ' => '2',
                'A2 ' => '5',
                'B1 ' => '3',
              },
       'H' => {
                'A1 ' => '3',
                'A2 ' => '9',
                'B1 ' => '7',
              },
     );
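
For anyone following along, a dump like the one above can be produced with something along these lines. It's only a minimal sketch: the `dump_hash` helper name and the sample values are made up for illustration, and `$Data::Dumper::Sortkeys` is set just so the keys come out in a stable order.

```perl
#!perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump looks the same from run to run.
$Data::Dumper::Sortkeys = 1;

# Hypothetical helper (not from the thread): return the dump as a string,
# labelled "%h" instead of the default "$VAR1".
sub dump_hash {
    my ($href) = @_;
    return Data::Dumper->Dump([$href], ['*h']);
}

# A small hash of hashes with the same shape as %h (sample values only).
my %h = (
    F => { A1 => 1, A2 => 2, B1 => 6 },
    G => { A1 => 2, A2 => 5, B1 => 3 },
);

print dump_hash(\%h);
```

Passing the name as `'*h'` asks Data::Dumper to print the structure as a hash assignment rather than as a reference.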

HTH

mikevh · Mar 24, 2005

P.S.

It is possible that the input file could be HUGE, like 1 GB, so in that case I think I have no choice but to loop over the file several times?

That's an excellent argument for not reading the file several times.

jtb1492 · Mar 24, 2005

Wow, now that's a thorough explanation. Thanks for your time. I'm worried though that this hash may get too large for my computer memory. Should I be worried?

mikevh · Mar 24, 2005

Dunno. Depends how much memory you've got. But I wouldn't worry. It's only a computer program.

A solution using only arrays would take less memory (and in a language without hashes, you'd have no choice), but why don't you see how you get on with the hash idea first? It's a bit more "natural", to my mind anyway, i.e., it fits the problem better than an array-only solution.
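
As a rough illustration of the arrays-only idea, something like the following might work. It's a sketch under assumptions, not code from this thread: the `transpose_blocks` helper and the in-memory sample are hypothetical, row labels are found by a linear search (reasonable only while there are a handful of them), and the whole table is still held in memory.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): transpose blank-line-separated
# blocks using plain arrays. Returns refs to the column headers, the row
# labels, and a 2-D array of values indexed [row][column].
sub transpose_blocks {
    my ($fh) = @_;
    my (@headers, @labels, @rows);

    # Paragraph mode: read one blank-line-separated block at a time.
    local $/ = "";

    while (my $block = <$fh>) {
        my ($header, @lines) = split /\n/, $block;
        $header =~ s/\s+$//;              # drop trailing whitespace
        push @headers, $header;
        my $col = $#headers;              # column index for this block

        for my $line (@lines) {
            my ($label, $val) = split ' ', $line;
            next unless defined $val;     # skip lines without a "label value" pair

            # Linear search for the row this label belongs to.
            my ($row) = grep { $labels[$_] eq $label } 0 .. $#labels;
            if (!defined $row) {
                push @labels, $label;
                $row = $#labels;
            }
            $rows[$row][$col] = $val;
        }
    }
    return (\@headers, \@labels, \@rows);
}

# Usage with a small in-memory sample (assumed data, same shape as the post).
my $sample = "A1\nF 1\nG 2\n\nA2\nF 2\nG 5\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($headers, $labels, $rows) = transpose_blocks($in);
close $in;

print "\t", join("\t", @$headers), "\n";
for my $r (0 .. $#$labels) {
    # Cells with no value in a block print as empty strings.
    my @cells = map { defined $_ ? $_ : '' } @{ $rows->[$r] }[0 .. $#$headers];
    print join("\t", $labels->[$r], @cells), "\n";
}
```

Storing each value at an explicit `[row][column]` position keeps the columns aligned even when a block is missing one of the labels.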

mikevh · Mar 25, 2005

Here's a version that uses a hash of arrays instead of a hash of hashes. This should be a lot less memory-intensive as it doesn't store a key for each value, only for each "row." (Code's a little simpler, too.)

Code:

#!perl
use strict;
use warnings;

my %h;
my %hdrs;
{
    # Set record separator to 1 or more blank lines, inside this block only.
    local $/ = "";

    my $i = 0;
    while (<DATA>) {
        my ($header, @rest) = split "\n";
        $hdrs{$header} = $i++;
        for (@rest) {
            my ($key, $val) = split;
            push @{$h{$key}}, $val;
        }
    }
}

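# Print headers in file order (the order the values were pushed), not alphabetical order.
print "\t", join("\t", sort { $hdrs{$a} <=> $hdrs{$b} } keys %hdrs), "\n";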
for my $k (sort keys %h) {
    print "$k\t";
    print join("\t", @{$h{$k}}), "\n";
}

__DATA__
A1 
F 1 
G 2 
H 3 

A2 
F 2 
G 5 
H 9 

B1 
F 6 
G 3 
H 7

Output is the same as the earlier hash-of-hashes version. Here's what the %h hash looks like now, using Data::Dumper.

Code:
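
The dump itself is missing from this copy of the thread. Reconstructed from the script and sample data above (an inference, not the original output), it would look roughly like this, with each row's values held in file order A1, A2, B1:

```perl
# Reconstruction, assuming the sample data above; layout is approximate.
%h = (
       'F' => [ '1', '2', '6' ],
       'G' => [ '2', '5', '3' ],
       'H' => [ '3', '9', '7' ],
     );
```

The column labels are no longer stored next to each value; they live only in %hdrs.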

HTH
