Hashes of Arrays - Relationship 1

Extension · Aug 31, 2005

Hi,

I trying to build hashes of arrays in order to populate a recursive relationship structure based on two flat files. (See expected results at the bottom)
The first flat file contains the "Parents" data and the second flat file contains the "Childs" data. (See below for data)

I'm currently using multiple arrays to populate this structure, so as you may already know this takes alot of time to generate.

So will an hash of arrays be a good solution for this. Any guidance or help would be appreciated.

DATA

Code:

__GROUPDATA__
ID|ABBR|ALTERNATE|ADDRESS|CITY|PARENT_ID|ACTIVE
12|ITT|IIT Alternate|390 Long View|Lisbon|36|1
14|ABG|ABG Alternate|22009 Routa Technica|Citoplo|12|1
16|FFQ|FFQ Alternate|230 River Road|River City|12|1
18|LPP|LPP Alternate|90293 Tera Nosta Mud Road|Mud City|36|1
20|GKQ|GKQ Alternate|232 Portee Road|Potish|16|1
22|TIO|TIO Alternate|454 Looish|Sustone|18|1
24|TRP|TRP Alternate|333 Nesbitt Road|Grey City|18|1
26|REE|REE Alternate|923 Hunty Avenue|Hungstead|18|1
28|TLP|TLP Alternate|730 Lumberg|Houstonish|22|1
30|TAA|TAA Alternate|40222 Laponte|Loomis|22|1
32|TRR|TRR Alternate|2290 Rescto|Ohia|26|1
36|QWE|QWE Alternate|2090 Bolton Road|Redmond| |1


___SUBGROUPDATA__
ID|NAME|JOB|PARENT_ID|ACTIVE
120|Terry Turbo|Stapler Expect|14|1
122|Bobby George|Stapler Expect|20|1
124|Paul Danan|Stapler Expect|20|1
126|Barry Castagnola|Stapler Expect|22|1
128|Jordan Long|Stapler Expect|22|1
130|Norris Windross|Stapler Expect|28|1
132|Adam Saint|Stapler Expect|28|1
134|Adam Peacoc|Stapler Expect|26|1
136|Myfanwy Waring|Stapler Expect|26|1
138|Alexander Newland|Stapler Expect|32|1

Expected output
In bold is GROUP DATA (Parents)
In non-bold is SUB GROUP DATA (Childs)

Code:

[b]- [36] QWE / QWE Alternate (2090 Bolton Road, Redmond)[/b]

	[b]- [12] ITT / IIT Alternate (390 Long View, Lisbon)[/b]
		[b]- [14] ABG / ABG Alternate (22009 Routa Technica, Citoplo)[/b]
		  + [120] Terry Turbo (Stapler Expect)
		[b]- [16] FFQ / FFQ Alternate (230 River Road, River City)[/b]
			[b]- [20] GKQ /GKQ Alternate (232 Portee Road, Potish)
			  + [122] Bobby George (Stapler Expect)
			  + [124] Paul Danan (Stapler Expect)
		
	[b]- [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road, Mud City)[/b]
		[b]- [22] TIO /TIO Alternate (454 Looish, Sustone)[/b]
		  + [126] Barry Castagnola (Stapler Expect)
		  + [128] Jordan Long (Stapler Expect)
			[b]- [28] TLP / TLP Alternate (730 Lumberg, Houstonish)[/b]
			  + [130] Norris Windross (Stapler Expect)
			  + [132] Adam Saint (Stapler Expect)
			[b]- [30] TAA / TAA Alternate (40222 Laponte, Loomis)[/b]
		[b]- [24] TRP / TRP Alternate (333 Nesbitt Road, Grey City)[/b]
		[b]- [26] REE / REE Alternate (923 Hunty Avenue, Hungstead)[/b]
		  + [134] Adam Peacoc (Stapler Expect)
		  + [136] Myfanwy Waring (Stapler Expect)
			[b]- [32] TRR / TRR Alternate (2290 Rescto, Ohia)[/b]
			  + [138] Alexander Newland (Stapler Expect)

Thank you in advance

KevinADC · Aug 31, 2005

should be OK. Are you asking how to do it or just if it's a good idea? Are these realy big files (millions of lines) or just a few thousand lines per each file?

Extension · Aug 31, 2005

Well I was able to do it using multiple arrays, but it's really slow and not efficient as I need to open the files several times. So I thought the hashes of arrays would be more suitable for this. But now I realize that the hashes of arrays is a whole new world to me. So basically I don't know how to do it ! I guess even for an advanced programmer this is a big challenge ?

The files have about 100 to 200 lines each.

Thank you in advance

KevinADC · Aug 31, 2005

hmm...it's proving to be a bit harder than I thought it would. I will have to try later when I have more time if nobody else posts a good solution.

Extension · Aug 31, 2005

Kevin:

It's definitely a bigger challenge than I thought.
Your help is always appreciated.

KevinADC · Sep 1, 2005

this is what I came up with. It's very close to what your desired output is, but lists "childs" and "parents" in numeric order where the relationship is equal and the indentationing isn't quite right. It's not a hash of arrays, but a hash of hashes.

Code:

#!perl -w
use strict;
#use Data::Dump qw(dump);

my %groups = ();

open(PARENT,"GROUPDATA.txt") or die "$!";
pull_data(1,$_) while (<PARENT>) ;
close(PARENT);

open(CHILD,"SUBGROUPDATA.txt") or die "$!";
pull_data(2,$_) while (<CHILD>);
close(CHILD);

#print dump(%groups);
my @ids = sort {$b <=> $a} keys %groups;
my $id = shift @ids;
my $pad = 4;
print "$groups{$id}{ID}\n\n";
for (sort {$a <=> $b} keys %{ $groups{$id}{SUBGROUPS} }) {
   print ' ' x $pad ,  "$groups{$id}{SUBGROUPS}{$_}\n";
   exists $groups{$_}{SUBGROUPS} ? output($_,$pad+=4) : next;
   print "\n";
   $pad-=4;
}

sub output {
   my $id = shift;
   my $pad = shift;
   for (sort {$a <=> $b} keys %{ $groups{$id}{SUBGROUPS} }) {
      print ' ' x $pad ,  "$groups{$id}{SUBGROUPS}{$_}\n";
      exists $groups{$_}{SUBGROUPS} ? output($_,$pad+=4) : next;
   }
}

sub pull_data {
   my $i = shift;
   chomp(local $_ = shift);
   local @_ = split(/\|/);
   if ($i == 1) {
      $groups{$_[5]}{SUBGROUPS}{$_[0]} = "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])";
      $groups{$_[0]}{ID} = "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])";
   }
   else {
      $groups{$_[3]}{SUBGROUPS}{$_[0]} = "+ [$_[0]] $_[1] ($_[2])";
   }
}

output:

Code:

- [36] QWE / QWE Alternate (2090 Bolton Road , Redmond)

    - [12] ITT / IIT Alternate (390 Long View , Lisbon)
        - [14] ABG / ABG Alternate (22009 Routa Technica , Citoplo)
            + [120] Terry Turbo (Stapler Expect)
            - [16] FFQ / FFQ Alternate (230 River Road , River City)
                - [20] GKQ / GKQ Alternate (232 Portee Road , Potish)
                    + [122] Bobby George (Stapler Expect)
                    + [124] Paul Danan (Stapler Expect)

    - [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road , Mud City)
        - [22] TIO / TIO Alternate (454 Looish , Sustone)
            - [28] TLP / TLP Alternate (730 Lumberg , Houstonish)
                + [130] Norris Windross (Stapler Expect)
                + [132] Adam Saint (Stapler Expect)
                - [30] TAA / TAA Alternate (40222 Laponte , Loomis)
                + [126] Barry Castagnola (Stapler Expect)
                + [128] Jordan Long (Stapler Expect)
            - [24] TRP / TRP Alternate (333 Nesbitt Road , Grey City)
            - [26] REE / REE Alternate (923 Hunty Avenue , Hungstead)
                - [32] TRR / TRR Alternate (2290 Rescto , Ohia)
                    + [138] Alexander Newland (Stapler Expect)
                    + [134] Adam Peacoc (Stapler Expect)
                    + [136] Myfanwy Waring (Stapler Expect)

Extension · Sep 1, 2005

Thank you very much for your help Kevin.

Even if you depend on the numeric sorting (by ID) to generate the list, is there any way to sort it by Name ($_[1]) ?

Again, thank you very much. I check your code more into details.

*

rharsh · Sep 1, 2005

Here's another shot at it using a linked list. This also uses a hash of hashes approach. Although it is not nearly as clean as the code from KevinADC.

Code:

my %hash;
{
    my @data;
    open IN, "< group.txt" or die;
    $_ = <IN>;  # Skip Header Line
    map {push @data,$_} <IN>;
    open IN, "< subgroup.txt" or die;
    $_ = <IN>;  # Skip Header Line
    map {push @data, $_} <IN>;
    close IN;
    chomp @data;
    
    foreach (@data) {
        my @temp = split /\|/, $_;
        my $id = $temp[0];
        my $pid = $temp[-2];
        if (exists($hash{$id})) { die "Duplicate ID!\n"; }
        $hash{$id} = {'par_id' => $pid, 'has_parent' => 0, 'child_ref' => undef,
                      'type' => scalar @temp <= 5 ? '1':'2', 'data' => $_};
    }
}
# Create Linked List
{
    my $data_changed = 1;
    while ($data_changed){
        $data_changed = 0;
        foreach my $id (keys %hash) {
            my $pid = $hash{$id}{par_id};
            if (exists $hash{$pid}) {
                $hash{$id}{has_parent} = 1;
                $hash{$pid}{child_ref}{$id} = $hash{$id};
            }
        }
    }
}

# Cleanup records
foreach (keys %hash) {
    if ($hash{$_}{has_parent}) {
        delete $hash{$_};
    }
}

print_records(0, 0, \%hash);

sub print_records {
    my ($depth, $group_depth, $hashref) = @_;
    my $next_group_depth = $group_depth;
    
    foreach (sort {$hashref->{$a}{type} <=> $hashref->{$b}{type} || $a <=> $b} keys %{$hashref}) {
        my @temp = split(/\|/, $hashref->{$_}->{data});
        if (scalar @temp > 5) {
            print ' ' x (4 * $group_depth) if ($group_depth > 0);
            printf "- [%s] %s / %s (%s, %s)\n", @temp[0..4];
            if (($next_group_depth - $group_depth) == 0) { $next_group_depth++; }
        } else {
            print ' ' x ((4 * ($group_depth-1)) + 2) if ($group_depth > 0);
            printf "+ [%s] %s (%s)\n", @temp[0..2];
        }
        
        # Based on double spacing in the example
        print "\n" if ($depth == 0);
        
        if (defined(${$hashref}{$_}->{child_ref})) {
            print_records($depth+1, $next_group_depth, $hashref->{$_}->{child_ref});
        }
        
        # Based on double spacing in the example
        print "\n" if ($depth == 1);
    }
}

Output:

Code:

- [36] QWE / QWE Alternate (2090 Bolton Road, Redmond)

    - [12] ITT / IIT Alternate (390 Long View, Lisbon)
        - [14] ABG / ABG Alternate (22009 Routa Technica, Citoplo)
          + [120] Terry Turbo (Stapler Expect)
        - [16] FFQ / FFQ Alternate (230 River Road, River City)
            - [20] GKQ / GKQ Alternate (232 Portee Road, Potish)
              + [122] Bobby George (Stapler Expect)
              + [124] Paul Danan (Stapler Expect)

    - [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road, Mud City)
        - [22] TIO / TIO Alternate (454 Looish, Sustone)
          + [126] Barry Castagnola (Stapler Expect)
          + [128] Jordan Long (Stapler Expect)
            - [28] TLP / TLP Alternate (730 Lumberg, Houstonish)
              + [130] Norris Windross (Stapler Expect)
              + [132] Adam Saint (Stapler Expect)
            - [30] TAA / TAA Alternate (40222 Laponte, Loomis)
        - [24] TRP / TRP Alternate (333 Nesbitt Road, Grey City)
        - [26] REE / REE Alternate (923 Hunty Avenue, Hungstead)
          + [134] Adam Peacoc (Stapler Expect)
          + [136] Myfanwy Waring (Stapler Expect)
            - [32] TRR / TRR Alternate (2290 Rescto, Ohia)
              + [138] Alexander Newland (Stapler Expect)

Extension · Sep 1, 2005

Thank you rharsh. Your code is working. I have to agree that Kevin's code is clean.

I've just realized that Kevin's code is not generating to proper structure.

Here's an example.

Kevin's output (Part of it):

Code:

 [b] - [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road , Mud City)[/b]
    [b] - [22] TIO / TIO Alternate (454 Looish , Sustone)
            - [28] TLP / TLP Alternate (730 Lumberg , Houstonish)[/b]
                + [130] Norris Windross (Stapler Expect)
                + [132] Adam Saint (Stapler Expect)
                - [30] TAA / TAA Alternate (40222 Laponte , Loomis)[/b]
             [COLOR=red]   + [126] Barry Castagnola (Stapler Expect) [/color] [i] 
             [COLOR=red]   + [128] Jordan Long (Stapler Expect) [/color]
			 				 126 & 128 are supposed to be under ID 22 not ID 30[/i]
          [b]  - [24] TRP / TRP Alternate (333 Nesbitt Road , Grey City)[/b]
          [b]  - [26] REE / REE Alternate (923 Hunty Avenue , Hungstead)[/b]
             [b]  - [32] TRR / TRR Alternate (2290 Rescto , Ohia)[/b]
                  [COLOR=red]  + [138] Alexander Newland (Stapler Expect) [/color]
                  [COLOR=red]  + [134] Adam Peacoc (Stapler Expect) [/color] [i] 
				               134 & 136 are supposed to be under ID 26 not ID 32[/i]
                  [COLOR=red]  + [136] Myfanwy Waring (Stapler Expect) [/color]

What it should be (Same part):

Code:

[b]- [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road, Mud City) [/b]
		[b]- [22] TIO /TIO Alternate (454 Looish, Sustone)[/b]
		  + [126] Barry Castagnola (Stapler Expect)
		  + [128] Jordan Long (Stapler Expect)
			[b]- [28] TLP / TLP Alternate (730 Lumberg, Houstonish)[/b]
			  + [130] Norris Windross (Stapler Expect)
			  + [132] Adam Saint (Stapler Expect)
			[b]- [30] TAA / TAA Alternate (40222 Laponte, Loomis)[/b]
		[b]- [24] TRP / TRP Alternate (333 Nesbitt Road, Grey City)[/b]
		[b]- [26] REE / REE Alternate (923 Hunty Avenue, Hungstead)[/b]
		  + [134] Adam Peacoc (Stapler Expect)
		  + [136] Myfanwy Waring (Stapler Expect)
			[b]- [32] TRR / TRR Alternate (2290 Rescto, Ohia)[/b]
			  + [138] Alexander Newland (Stapler Expect)

KevinADC · Sep 1, 2005

OK, I think I got the problems sorted out. I didn't want to add more code but I did add a little bit more. I think I could go back to just the output() sub routine by passing in more variables, maybe I'll try that later. I think this code is still fairly clean and concise.

Code:

#!perl -w
use strict;
#use Data::Dump qw(dump);

my %groups = ();

open(PARENT,"GROUPDATA.txt") or die "$!";
pull_data(1,$_) while (<PARENT>) ;
close(PARENT);

open(CHILD,"SUBGROUPDATA.txt") or die "$!";
pull_data(2,$_) while (<CHILD>);
close(CHILD);

#print dump(%groups);

my @ids = sort {$b <=> $a} keys %groups;
my $id = shift @ids;
my $pad = 4;

print "$groups{$id}{ID}\n\n";
for my $i (0 .. $#{$groups{$id}{SUBGROUPS}} ) {
   print ' ' x 4, "$groups{$id}{SUBGROUPS}[$i]->{VALUE}\n";
   my $y = $groups{$id}{SUBGROUPS}[$i]->{ID};
   outputkids($y,$pad) if (exists $groups{$y}{KIDS});
   output($y,$pad)     if (exists $groups{$y}{SUBGROUPS});
   print "\n";
}

sub output {
   my $id = shift;
   my $pad = shift;
   $pad+=4;
   for my $i (0 .. $#{$groups{$id}{SUBGROUPS}}) {
      print ' ' x $pad, "$groups{$id}{SUBGROUPS}[$i]->{VALUE}\n";
      my $y = $groups{$id}{SUBGROUPS}[$i]->{ID};
      outputkids($y,$pad) if (exists $groups{$y}{KIDS});
      output($y,$pad)     if (exists $groups{$y}{SUBGROUPS});
   }
}

sub outputkids {
   my $id = shift;
   my $pad = shift;
   $pad+=2;
   for my $i (0 .. $#{$groups{$id}{KIDS}}) {
      print ' ' x $pad, "$groups{$id}{KIDS}[$i]->{VALUE}\n";
      my $y = $groups{$id}{KIDS}[$i]->{ID};
      exists $groups{$y}{KIDS} ? outputkids($y,$pad) : next;
   }
}

sub pull_data {
   my $i = shift;
   chomp(local $_ = shift);
   local @_ = split(/\|/);
   if ($i == 1) {
      push @{$groups{$_[5]}{SUBGROUPS}}, {ID => $_[0],VALUE => "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])"};
      $groups{$_[0]}{ID} = "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])";
   }
   else {
      push @{$groups{$_[3]}{KIDS}}, {ID => $_[0],VALUE => "+ [$_[0]] $_[1] ($_[2])"};
   }
}

output from script:

Code:

- [36] QWE / QWE Alternate (2090 Bolton Road , Redmond)

    - [12] ITT / IIT Alternate (390 Long View , Lisbon)
        - [14] ABG / ABG Alternate (22009 Routa Technica , Citoplo)
          + [120] Terry Turbo (Stapler Expect)
        - [16] FFQ / FFQ Alternate (230 River Road , River City)
            - [20] GKQ / GKQ Alternate (232 Portee Road , Potish)
              + [122] Bobby George (Stapler Expect)
              + [124] Paul Danan (Stapler Expect)

    - [18] LPP / LPP Alternate (90293 Tera Nosta Mud Road , Mud City)
        - [22] TIO / TIO Alternate (454 Looish , Sustone)
          + [126] Barry Castagnola (Stapler Expect)
          + [128] Jordan Long (Stapler Expect)
            - [28] TLP / TLP Alternate (730 Lumberg , Houstonish)
              + [130] Norris Windross (Stapler Expect)
              + [132] Adam Saint (Stapler Expect)
            - [30] TAA / TAA Alternate (40222 Laponte , Loomis)
        - [24] TRP / TRP Alternate (333 Nesbitt Road , Grey City)
        - [26] REE / REE Alternate (923 Hunty Avenue , Hungstead)
          + [134] Adam Peacoc (Stapler Expect)
          + [136] Myfanwy Waring (Stapler Expect)
            - [32] TRR / TRR Alternate (2290 Rescto , Ohia)
              + [138] Alexander Newland (Stapler Expect)

note there is no longer any sorting of hashes (except the initial one to get started) since this now uses indexed arrays (SUBGROUPS and KIDS) which are already pre-sorted because the files are sorted. If the files are not sorted then sorting will have to be done if numeric order is important in the final output. Basically it's a hash of hashes / hashes of arrays.

KevinADC · Sep 1, 2005

ahh, I was looking at my error log and noticed the above code was returning an error in the log in the sort function because of a blank field in one of the files so I changed it to get rid of the error:

my @ids = sort {$b cmp $a || $b <=> $a} keys %groups;

Extension · Sep 2, 2005

Kevin.

Your latest code is working properly. I will need to figure out why it's not working with my current data. (Group file has about 100 lines and the Sub Group file has about 200 lines) The fields are in the same order as in the example. (data in my first post)

Thank you very much for your help. Your code is as clean as always. Easy to understand.

*

KevinADC · Sep 2, 2005

Code:

open(PARENT,"GROUPDATA.txt") or die "$!";
my $skip_it = <PARENT>; <-- skips the first line 
pull_data(1,$_) while (<PARENT>) ;
close(PARENT);

open(CHILD,"SUBGROUPDATA.txt") or die "$!";
$skip_it = <CHILD>; <-- skips the first line
pull_data(2,$_) while (<CHILD>);
close(CHILD);

post the file or files you are having trouble with and someone can hopefully help.

Extension · Sep 2, 2005

Kevin. Thank you for the heads up. If you crab the data below, you will see that it does not generate anything.

Code:

GROUPDATA.txt

ID|ABBR|ALTERNATE|ADDRESS|CITY|PARENT_ID|ACTIVE
119|TEST 119|TEST 119|PPR119A|PPR119B||1 --> [COLOR=red]MASTER PARENT[/color]
16|TEST 16|TEST 16|PPR16A|PPR16B|122|1
11|TEST 11|TEST 11|PPR11A|PPR11B|122|1
118|TEST 118|TEST 118|PPR118A|PPR118B|11|1
13|TEST 13|TEST 13|PPR13A|PPR13B|122|1
122|TEST 122|TEST 122|PPR122A|PPR122B|119|1
17|TEST 17|TEST 17|PPR17A|PPR17B|122|1
15|TEST 15|TEST 15|PPR15A|PPR15B|122|1
123|TEST 123|TEST 123|PPR123A|PPR123B|14|1
113|TEST 113|TEST 113|PPR113A|PPR113B|14|1
114|TEST 114|TEST 114|PPR114A|PPR114B|14|1
14|TEST 14|TEST 14|PPR14A|PPR14B|121|1
115|TEST 115|TEST 115|PPR115A|PPR115B|14|1
121|TEST 121|TEST 121|PPR121A|PPR121B|119|1
126|TEST 126|TEST 126|PPR126A|PPR126B|119|1
139|TEST 139|TEST 139|PPR139A|PPR139B|126|1
147|TEST 147|TEST 147|PPR147A|PPR147B|139|1

Code:

SUBGROUDATA.txt

ID|NAME|JOB|PARENT_ID|ACTIVE
220|TT|TT-A|113|1
222|BG|BG-A|114|1
224|PD|PD-A|147|1
226|BC|BC-A|17|1
228|JL|JL-A|17|1
230|NW|NW-A|16|1
232|AS|AS-A|16|1
234|AP|AP-A|15|1
236|MW|MW-A|118|1
238|ANlAN-A|118|1

Code:

[b]- [119] TEST 119 / TEST 119 (PPR119A, PPR119B)[/b]

[b]	- [121] TEST 121 / TEST 121 (PPR121A, PPR121B) [/b]
[b]		- [14] TEST 14 / TEST 14 (PPR14A, PPR14B|121
[b]			- [113] TEST 113 / TEST 113 (PPR113A, PPR113B)[/b]
				+ [220] TT (TT-A)
[b]			- [114] TEST 114 / TEST 114 (PPR114A, PPR114B)[/b]
				+ [222] BG (BG-A)
[b]			- [115] TEST 115 / TEST 115 (PPR115A, PPR115B)[/b]
[b]			- [123] TEST 123 / TEST 123 (PPR123A, PPR123B)[/b]
		
[b]	- [122] TEST 122 / TEST 122 (PPR122A, PPR122B)[/b]
[b]		- [11] TEST 11 / TEST 11 (PPR11A, PPR11B)[/b]
[b]			- [118] TEST 118 / TEST 118 (PPR118A, PPR118B)[/b]
				+ [236] MW (MW-A)
				+ [238] AN (AN-A)
[b]		- [13] TEST 13 / TEST 13 (PPR13A, PPR13B)[/b]
[b]		- [15] TEST 15 / TEST 15 (PPR15A, PPR15B)[/b]
			+ [234] AP (AP-A)
[b]		- [16] TEST 16 / TEST 16 (PPR16A, PPR16B)[/b]
			+ [230] NW (NW-A)
			+ [232] AS (AS-A)
[b]		- [17] TEST 17 / TEST 17 (PPR17A, PPR17B)[/b]
			+ [226] BC (BC-A)
			+ [228] JL (JL-A)


[b]	- [126] TEST 126 / TEST 126 (PPR126A, PPR126B)[/b]
[b]		- [139] TEST 139 / TEST 139 (PPR139A, PPR139B)[/b]
[b]			- [147] TEST 147 / TEST 147 (PPR147A, PPR147B)[/b]
				+ [224] PD (PD-A)

Thank you again Kevin

KevinADC · Sep 2, 2005

Yes, that is different than the previous file, which I made assumptions about. I assumed the record with the highest ID number was the first parent as it is in the first file you posted:

36|QWE|QWE Alternate|2090 Bolton Road|Redmond| |1

but I now assume it's the parent with an empty 5th column. I'll see what I can do.

Extension · Sep 2, 2005

BTW. The output in my last post is what supposed to be generated. The code is not outputting anything. Just thought I would clarify.

Thanks

KevinADC · Sep 2, 2005

I understood. Also note a typo in subgroupdata.txt:

238|ANlAN-A|118|1

there is a lower-case "L" where there should be a pipe (|) between AN and AN. It tripped me up for a few minutes. Here is the modified code. Because the files are no longer in numeric order I am having to sort them, which will slow down the script a little bit. Notice a new variable:

my $magic_key;

it will hold the value of the first parent and get the ball rolling. I assume there will only ever be one "first" parent per file. I also cleaned up the code a liitle more, should be even easier to read:

Code:

#!perl -w
use strict;
#use Data::Dump qw(dump);

my %groups = ();
[b]my $magic_key;[/b]

open(PARENT,"GROUPDATA.txt") or die "$!";
my $skip_it = <PARENT>;
pull_data(1,$_) while (<PARENT>);
close(PARENT);

open(CHILD,"SUBGROUPDATA.txt") or die "$!";
$skip_it = <CHILD>;
pull_data(2,$_) while (<CHILD>);
close(CHILD);

#print dump(%groups);
{
   my $pad = 4;

   print "$groups{[b]$magic_key[/b]}{ID}\n\n";
   map {
      print ' ' x 4, "$_->{VALUE}\n";
      outputkids($_->{ID},$pad) if (exists $groups{$_->{ID}}{KIDS});
      output($_->{ID},$pad)     if (exists $groups{$_->{ID}}{SUBGROUPS});
      print "\n";
   } sort {$a->{ID} <=> $b->{ID}} @{$groups{[b]$magic_key[/b]}{SUBGROUPS}};
}


sub output {
   my $id = shift;
   my $pad = shift;
   $pad+=4;
   map {
      print ' ' x $pad, "$_->{VALUE}\n";
      outputkids($_->{ID},$pad) if (exists $groups{$_->{ID}}{KIDS});
      output($_->{ID},$pad)     if (exists $groups{$_->{ID}}{SUBGROUPS});
   } sort {$a->{ID} <=> $b->{ID}} @{$groups{$id}{SUBGROUPS}};
}

sub outputkids {
   my $id = shift;
   my $pad = shift;
   $pad+=2;
   map {
      print ' ' x $pad, "$_->{VALUE}\n";
      outputkids($_->{ID},$pad) if (exists $groups{$_->{ID}}{KIDS});
   }  sort {$a->{ID} <=> $b->{ID}} @{$groups{$id}{KIDS}};
}

sub pull_data {
   my $i = shift;
   chomp(local $_ = shift);
   local @_ = split(/\|/);
   if ($i == 1) {
      [b]$magic_key = $_[0] if ($_[5] eq '' or $_[5] eq ' ');[/b]
      push @{$groups{$_[5]}{SUBGROUPS}}, {ID => $_[0],VALUE => "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])"};
      $groups{$_[0]}{ID} = "- [$_[0]] $_[1] / $_[2] ($_[3] , $_[4])";
   }
   else {
      push @{$groups{$_[3]}{KIDS}}, {ID => $_[0],VALUE => "+ [$_[0]] $_[1] ($_[2])"};
   }
}

output:

Code:

- [119] TEST 119 / TEST 119 (PPR119A , PPR119B)

    - [121] TEST 121 / TEST 121 (PPR121A , PPR121B)
        - [14] TEST 14 / TEST 14 (PPR14A , PPR14B)
            - [113] TEST 113 / TEST 113 (PPR113A , PPR113B)
              + [220] TT (TT-A)
            - [114] TEST 114 / TEST 114 (PPR114A , PPR114B)
              + [222] BG (BG-A)
            - [115] TEST 115 / TEST 115 (PPR115A , PPR115B)
            - [123] TEST 123 / TEST 123 (PPR123A , PPR123B)

    - [122] TEST 122 / TEST 122 (PPR122A , PPR122B)
        - [11] TEST 11 / TEST 11 (PPR11A , PPR11B)
            - [118] TEST 118 / TEST 118 (PPR118A , PPR118B)
              + [236] MW (MW-A)
              + [238] AN (AN-A)
        - [13] TEST 13 / TEST 13 (PPR13A , PPR13B)
        - [15] TEST 15 / TEST 15 (PPR15A , PPR15B)
          + [234] AP (AP-A)
        - [16] TEST 16 / TEST 16 (PPR16A , PPR16B)
          + [230] NW (NW-A)
          + [232] AS (AS-A)
        - [17] TEST 17 / TEST 17 (PPR17A , PPR17B)
          + [226] BC (BC-A)
          + [228] JL (JL-A)

    - [126] TEST 126 / TEST 126 (PPR126A , PPR126B)
        - [139] TEST 139 / TEST 139 (PPR139A , PPR139B)
            - [147] TEST 147 / TEST 147 (PPR147A , PPR147B)
              + [224] PD (PD-A)

Are you getting used to references and complex data storage now? Does this seems better than the methods you were using before?

Extension · Sep 2, 2005

Kevin.

Sorry for the data issue. You're right about the ID number; it's not significant but rather a random key.

Your code works perfectly and is still incredibly clean.
I will need to take a look at it more into details tonight as I want to fully understand it. I was using multiple arrays to do something similar. At first it's was fairly fast to generate the list. But as I was adding more data, I realized how poorly it was written.

So again I must thank you for your precious help. I've realized that manipulating complex hashes (and generating them) is more complicated than I thought.

So now I will try to sort it by the second field. I will also cleanup my data as I found out it was part of the problem.

Thank you.

KevinADC · Sep 2, 2005

You're welcome. You might find it interesting to uncomment these lines if you haven't already:

#use Data:

ump qw(dump);
#print dump(%groups);

so you can see the actual data set, which is invaluable at times. It will give great insight into data storage and how to get what you want out of it. Of course a real database would be even better.

jonadab · Sep 4, 2005

> if you depend on the numeric sorting (by ID) to generate
> the list, is there any way to sort it by Name ($_[1])

Yes. Here is the part of Kevin's code that does the sorting for output purposes:
sort {$a->{ID} <=> $b->{ID}}

That occurs two places, in output and in outputkids. Both places you can change the test there so that instead of ID it looks at another key or some other criterion, and the result will be differently-sorted output.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Hashes of Arrays - Relationship 1

Programmer

Technical User

Programmer

Technical User

Programmer

Technical User

Programmer

Technical User

Programmer

Technical User

Technical User

Programmer

Technical User

Programmer

Technical User

Programmer

Technical User

Programmer

Technical User

Technical User

Similar threads

Log in

Part and Inventory Search

Sponsor