Help me manipulate text files...

Status
Not open for further replies.

jtb1492

Technical User
Mar 17, 2005
25
US
So I have data (in a file) like this:

A1
F 1
G 2
H 3

A2
F 2
G 5
H 9

B1
F 6
G 3
H 7

B2
4
0
1

And I want to switch it up so that it's like this:


A1 A2 A3 A4
F 4 6 3 5
G etc.
H

What's the best/most efficient way to do this? I figure I could loop over the input file once for row F, once for row G, etc. I also figure I could read the file just once, store all the data in a hash or arrays, and then print my variables back to the output file.

I'm concerned about time/memory because this script will work on LARGE files. It is possible that the input file could be HUGE, like 1 GB, so in that case I think I have no choice but to loop over the file several times?

Thanks guys!
 
jtb1492, what this suggests to me is a hash of hashes, with F, G, H, as the top-level keys, and A1, A2, etc. as the "subkeys". Here's one way to approach this:
Code:
#!perl
use strict;
use warnings;

my %h;
my @subkeys;
{
    # Set record separator to 1 or more blank lines, inside this block only.
    local $/ = "";

    while (<DATA>) {
        my ($subkey, @rest) = split "\n";
        push @subkeys, $subkey;
        for (@rest) {
            my ($key, $val) = split;
            $h{$key}->{$subkey} = $val;
        }
    }
}

my $first = 1;
@subkeys = sort @subkeys;
for my $k (sort keys %h) {
    if ($first) {
         # Print column headers
         print "\t", join("\t", @subkeys), "\n";
         $first = 0;
    }
    # Print key and data
    print "$k\t";
    print join("\t", @{$h{$k}}{@subkeys}), "\n";
}        

__DATA__
A1 
F 1 
G 2 
H 3 

A2 
F 2 
G 5 
H 9 

B1 
F 6 
G 3 
H 7
Output:
Code:
        A1      A2      B1 
F       1       2       6
G       2       5       3
H       3       9       7
Here's a picture of what the %h hash looks like in Data::Dumper. I recommend you become familiar with this module, as it's great for printing out your data structures for debugging purposes:
Code:
%h = (
       'F' => {
                'A1 ' => '1',
                'A2 ' => '2',
                'B1 ' => '6'
              },
       'G' => {
                'A1 ' => '2',
                'A2 ' => '5',
                'B1 ' => '3'
              },
       'H' => {
                'A1 ' => '3',
                'A2 ' => '9',
                'B1 ' => '7'
              }
     );
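For anyone who hasn't used Data::Dumper before, here is a minimal sketch of how a dump in that `%h = ( ... )` form can be produced. The small hand-built hash is illustrative only, not the thread's data; `Dump` with a `*h` name and `$Data::Dumper::Sortkeys` are standard parts of the module.

```perl
#!perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump looks the same from run to run.
$Data::Dumper::Sortkeys = 1;

# A small hand-built hash of hashes, shaped like %h (illustrative values).
my %h = (
    F => { A1 => 1, A2 => 2 },
    G => { A1 => 2, A2 => 5 },
);

# Passing the name '*h' together with a hash reference makes Dumper label
# the output "%h = ( ... );" instead of the default "$VAR1 = { ... };".
my $dump = Data::Dumper->Dump([ \%h ], ['*h']);
print $dump;
```

The simpler `print Dumper(\%h);` also works; it just labels the structure `$VAR1`.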
HTH



 
P.S.
It is possible that the input file could be HUGE, like 1 GB, so in that case I think I have no choice but to loop over the file several times?
That's an excellent argument for not reading the file several times.

 
Wow, now that's a thorough explanation. Thanks for your time. I'm worried though that this hash may get too large for my computer's memory. Should I be worried?
 
Dunno. Depends how much memory you've got. But I wouldn't worry. It's only a computer program. :)

A solution using only arrays would take less memory (and in a language without hashes, you'd have no choice), but why don't you see how you get on with the hash idea first? It's a bit more "natural", to my mind anyway, i.e., it fits the problem better than an array-only solution.
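As a rough idea of what an arrays-only version might look like, here is a minimal sketch. It assumes every block lists the same row labels in the same order, so rows can be matched by position instead of by key. The sub name `transpose_blocks` and the in-memory sample are made up for illustration.

```perl
#!perl
use strict;
use warnings;

# Transpose blank-line-separated blocks using arrays only.
# Assumption: every block lists the same row labels in the same order.
sub transpose_blocks {
    my ($fh) = @_;
    my (@headers, @labels, @rows);

    local $/ = "";    # paragraph mode: one block per read, inside this sub only
    while (my $block = <$fh>) {
        my ($header, @lines) = grep { /\S/ } split /\n/, $block;
        $header =~ s/^\s+|\s+$//g;    # trim stray whitespace around the header
        push @headers, $header;

        for my $i (0 .. $#lines) {
            my ($label, $val) = split ' ', $lines[$i];
            $labels[$i] = $label unless defined $labels[$i];
            push @{ $rows[$i] }, $val;    # row $i collects one value per block
        }
    }

    # Header line first, then one line per row label.
    my @out = (join("\t", '', @headers));
    push @out, join("\t", $labels[$_], @{ $rows[$_] }) for 0 .. $#rows;
    return @out;
}

# Sample input held in memory (stands in for a real file).
my $sample = <<'END';
A1
F 1
G 2
H 3

A2
F 2
G 5
H 9

B1
F 6
G 3
H 7
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for transpose_blocks($in);
```

On the sample this prints a header line with A1, A2 and B1, followed by the F, G and H rows. The trade-off is that a block with a missing or reordered row would silently misalign the table, which is what the hash keys protect against.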

 
Here's a version that uses a hash of arrays instead of a hash of hashes. This should be a lot less memory-intensive as it doesn't store a key for each value, only for each "row." (Code's a little simpler, too.)
Code:
#!perl
use strict;
use warnings;

my %h;
my %hdrs;
{
    # Set record separator to 1 or more blank lines, inside this block only.
    local $/ = "";

    my $i = 0;
    while (<DATA>) {
        my ($header, @rest) = split "\n";
        $hdrs{$header} = $i++;
        for (@rest) {
            my ($key, $val) = split;
            push @{$h{$key}}, $val;
        }
    }
}

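# Print headers in the order the blocks were read, so they line up with the pushed values.
print "\t", join("\t", sort { $hdrs{$a} <=> $hdrs{$b} } keys %hdrs), "\n";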
for my $k (sort keys %h) {
    print "$k\t";
    print join("\t", @{$h{$k}}), "\n";
}

__DATA__
A1 
F 1 
G 2 
H 3 

A2 
F 2 
G 5 
H 9 

B1 
F 6 
G 3 
H 7
Output is the same as the earlier hash-of-hashes version. Here's what the %h hash looks like now, using Data::Dumper.
Code:
%h = (
       'F' => [
                '1',
                '2',
                '6'
              ],
       'G' => [
                '2',
                '5',
                '3'
              ],
       'H' => [
                '3',
                '9',
                '7'
              ]
     );
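One caveat with pushing values onto the arrays: if a block is missing a row, every later value for that row shifts one column to the left. A possible variation, sketched below, stores each value at its column index instead. The sub name `transpose_aligned` and the sample data (where block A1 has no G row) are made up for illustration.

```perl
#!perl
use strict;
use warnings;

# Hash-of-arrays transpose that stores each value at its column index,
# so a row missing from one block leaves a gap instead of shifting left.
sub transpose_aligned {
    my ($fh) = @_;
    my (@headers, %rows);

    local $/ = "";    # paragraph mode: one blank-line-separated block per read
    while (my $block = <$fh>) {
        my ($header, @lines) = grep { /\S/ } split /\n/, $block;
        $header =~ s/^\s+|\s+$//g;    # trim stray whitespace around the header
        push @headers, $header;
        my $col = $#headers;          # index of the column being filled

        for my $line (@lines) {
            my ($key, $val) = split ' ', $line;
            $rows{$key}[$col] = $val;
        }
    }

    my @out = (join("\t", '', @headers));
    for my $key (sort keys %rows) {
        # Pad to the full width and show missing cells as empty strings.
        my @cells = map { defined $rows{$key}[$_] ? $rows{$key}[$_] : '' }
                    0 .. $#headers;
        push @out, join("\t", $key, @cells);
    }
    return @out;
}

# Sample input held in memory; block A1 deliberately has no G row.
my $gappy = <<'END';
A1
F 1
H 3

A2
F 2
G 5
H 9
END

open my $gappy_fh, '<', \$gappy or die "Cannot open in-memory file: $!";
print "$_\n" for transpose_aligned($gappy_fh);
```

Here the G row comes out with an empty cell under A1 and 5 under A2. To run it against a real file, open that file for reading and pass the handle in place of the in-memory one.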
HTH

 