Data manipulation 8

Extension · Mar 3, 2005

Hi,

I trying to find a simple way to re-structure data and then create a new flat file.

You will see the logic with the data below.

From this format:

Code:

#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]

To this format:

Code:

A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR1R1,TRR2R1,TRR2R2,TRR2R3
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

Let me know if it's not clear enough.

Thank you in advance.

stevexff · Mar 3, 2005

Post what you've got so far. This is an advice forum, not a coding service...

Extension · Mar 3, 2005

Sorry stevexff,

Here's my current code. I'm getting an endless loop which I'm trying to resolve since yesterday.

Data: (Source File)

Code:

#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Code:

sub GenerateFile {
	
	$OutputFile = "\location\output";
	$SourceFile = "\location\source";
	
	open(OUT, ">$OutputFile") || die qq("Missing File");

	my $previous = "";
	
	open(FILE,$SourceFile) || die("Missing File");
	
 	while (my $Row = <FILE>) {
	
		# Determine ID number
		if ($Row =~ /^#(\d{6})/) {
		$ID = $1; 
		}
			
		elsif ($Row =~ /^([A-Z]\d[A-Z])\[([A-Z0-9-,]+)\]/) {
		my $Suffixes = $2;
		
			foreach my $Entry (split(/,/,$Suffixes)) {
				if ($Entry =~ /(\d[A-Z]\d)-(\d[A-Z]\d)/) {
				# Determine range.
				my ($Low,$High) = ($1,$2);
					if ($ID eq $previous) {
					print OUT ",$1$2";
					}
					
					else {
					print OUT $previous? "\n": "", $_;
    				}
	
					$previous = $ID;
					
				}
			}
      
		print OUT "\n"; 
		
		}
		
	}
  

}

Thank you in advance.

rharsh · Mar 3, 2005

Here's a rough script, it works with the data you specified. It should at least give you a start.

Code:

while (<DATA>) {
    if (/^#(\w+)$/) {
        my $output = "$1\|";
        $_ = <DATA>;
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        my @temp = ();
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                $suffix =~ /^(\w{2})(\d)-\1(\d)/;
                my ($text, $begin, $end) = ($1, $2, $3);
                
                foreach my $i ($begin..$end) {
                    push @temp, "${prefix}${text}${i}";
                }
            } else {
                push @temp, "$prefix$suffix";
            }
        }
        $output .= join(',', @temp);
        print $output, "\n";
    }
}

__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Extension · Mar 3, 2005

Thank you very much for you help rharsh. It's really appreciated. It's certainly a good start.

Again. Thank you

stevexff · Mar 3, 2005

Neat solution, rharsh, particularly the use of the

Code:

foreach my $i ([red]$begin..$end[/red])

to deal with the range of values. Gets a star from me too...

Extension · Mar 3, 2005

One last question rharsh.
If there is more than one line under an ID such as:

Code:

#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
[b]TRQ[2W2-2W4,2Y7][/b]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Will a foreach statement (foreach rows/lines) before the foreach suffix statement will resolve this issue?

Again, thank you very much.

ishnid · Mar 4, 2005

TIMTOWDTI and *very*, *very* ugly. Works though

Code:

#!/usr/bin/perl -w
use strict;

local $/ = undef;
local $_ = <DATA>;
s/\n(?!#|$)/,/g;
s/(\w{3})\[([^\]]+)\]/join (',', map "$1$_", split( ',', $2 ) )/ge;
s/(\w{5})(\d)-\w{2}(\d)/ join ',', map "$1$_", $2 .. $3 /ge;
s/(?<=#\w{6}),/\|/g;
print;


__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Extension · Mar 4, 2005

Thanks ishnid. Yes it's working. I'll have to figure out exactly how it's working first. Thank you for your help.

I'll wait for rharsh to reply, to see if his solution is easy to adapt in order to support more than one line under a specific ID.

Thanks again.

ishnid · Mar 4, 2005

With comments:

Code:

# enable `slurp' mode
local $/ = undef;

# read the entire data into the $_ variable
local $_ = <DATA>;

# replace all newlines with commas, except ones before the 
# `#' (start of a new record) and the one at the end
s/\n(?!#|$)/,/g;

# put the prefix before each one and remove the brackets
# i.e. turn AAA[1A1,1A2,3B2-3B5] into AAA1A1,AAA1A2,AAA3B2-3B5
s/(\w{3})\[([^\]]+)\]/join (',', map "$1$_", split( ',', $2 ) )/ge;

# expand out the ranges (i.e. AAA3B2-3B5)
s/(\w{5})(\d)-\w{2}(\d)/ join ',', map "$1$_", $2 .. $3 /ge;

# change the first comma in each record to a `|'
s/(?<=#\w{6}),/\|/g;

# this one's obvious :)
print;

rharsh · Mar 4, 2005

Thanks for the stars!

Extension, see if this works for you. I was tempted to rewrite this using map a couple of times, like most the other solutions, but this works well enough.

Code:

my $label;

while (<DATA>) {
    if (/^#(\w+)/) {
        $label = $1;
    } else {
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        my @temp = ();
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                push @temp, &distribute($prefix, $suffix);
            } else {
                push @temp, "$prefix$suffix";
            }
        }
        print "$label\|" . join(',', @temp), "\n";
    }
}

# ARGS: prefix, suffix range
sub distribute ($$) {
    my ($prefix, $suffix) = @_;
    my @temp;
    
    $suffix =~ /^(\w{2})(\d)-\1(\d)/;
    my ($text, $begin, $end) = ($1, $2, $3);   
    foreach my $i ($begin..$end) {
        push @temp, "${prefix}${text}${i}";
    }
    return @temp;
}

Extension · Mar 5, 2005

rharsh Thank you again for your help. I owe you more than one star for all your help.

Based on this data

Code:

__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

I'm getting the following:

Code:

[b]A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3
A59993|TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7[/b]
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

instead of

Code:

[b]A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7[/b]
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

Thank you again.

mikevh · Mar 5, 2005

Modifying rharsh's very good code a little (see bolded):

Code:

#!perl
use strict;
use warnings;

my $label;
[b]my %h;
my @labels;[/b]

while (<DATA>) {
    [b]chomp;[/b]
    if (/^#(\w+)/) {
        $label = $1;
        [b]push (@labels, $label) unless grep {$_ eq $label} @labels;[/b]
    } else {
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                [b]push @{$h{$label}}, distribute($prefix, $suffix);[/b]
            } else {
                [b]push @{$h{$label}}, "$prefix$suffix";[/b]
            }
        }
    }
}

[b]for (@labels) {
    print join("|", $_, join(",", @{$h{$_}})), "\n";
}[/b]

# ARGS: prefix, suffix range
sub distribute  {
    my ($prefix, $suffix) = @_;
    my @temp;
    
    $suffix =~ /^(\w{2})(\d)-\1(\d)/;
    my ($text, $begin, $end) = ($1, $2, $3);   
    foreach my $i ($begin..$end) {
        push @temp, "${prefix}${text}${i}";
    }
    return @temp;
}

__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

This produces the desired output you have posted above.
A star for rharsh, who did all the tricky parts.

rharsh · Mar 6, 2005

Thanks again for the stars.

Extension, I guessed at how you wanted to handle multiple lines for each 'label' - I should have asked!

Thanks for straightening it out mikevh. A star for you too.

Extension · Mar 6, 2005

Wow.

Thank you rharsh, mikevh and ishnid. Your help was really appreciated.

rharsh: I really liked the solution you came up with. Easier and less complicated for me.
ishnid: Thank you for the explanation.
mikevh: Thank you for your help.

You all get a big star.
Thanks

Extension · Mar 7, 2005

Now I'm trying to do the reverse process which is to go from

Code:

A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

TO

Code:

#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Right now I'm stuck with one of the biggest endless loop ever. My PC went out of memory in less than 10 secs of processing.

I'll post my code later on...

Arg...

KevinADC · Mar 9, 2005

this could probably be improved upon a bit but it seems to work OK, and its perl

Code:

my %HoH = ();
while(<DATA>){
   chomp;
   my $temp = $_;
   my ($label,$list) = split(/\|/);
   my @data = split(/,/,$list);
   for (@data) {
      /^(\w{3})(\w{2})(.)/;
      push @{$HoH{$label}{$1}{$2}},"$3"; 
   }
}
foreach my $keys (reverse sort keys %HoH){
   print "#$keys\n";
   foreach my $i (reverse sort keys %{$HoH{$keys}}) {
      print "$i\[";
      my $y = 0;
      foreach  my $n (sort keys %{$HoH{$keys}{$i}}) {
         my $num = keys %{$HoH{$keys}{$i}};
         $y++;
         print "$n$HoH{$keys}{$i}{$n}->[0]";
         print "\-$n$HoH{$keys}{$i}{$n}->[$#{$HoH{$keys}{$i}{$n}}]" if (scalar @{$HoH{$keys}{$i}{$n}}>1);
         print ',' if $y < $num;
      }
      print "]\n";
   }
}
__DATA__
A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

prints:

Code:

#A59993
TRR[1T1-1T3,1W9,1Y7,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

the part of the script that decides when tp print a comma between ranges of bracketed data is a bit klunky but it works.

KevinADC · Mar 9, 2005

these two lines should be traded position:

foreach my $n (sort keys %{$HoH{$keys}{$i}}) {
my $num = keys %{$HoH{$keys}{$i}};

that way $num only gets calculated once per loop:

my $num = keys %{$HoH{$keys}{$i}};
foreach my $n (sort keys %{$HoH{$keys}{$i}}) {

KevinADC · Mar 9, 2005

darn, just noticed I left a line in there from when I was testing the code:

my $temp = $_;

it can be removed.

Extension · Mar 10, 2005

KevinADC:

Thank you for your help. Your solution does work.

I'm trying to print to a file. I have the following at the beginning:

Code:

open(OUT, ">file.txt") || die qq("file.txt is missing");

But to adapt the foreach statement

Code:

foreach my $keys (reverse sort keys %HoH){
  [b] print OUT "#$keys\n"; [/b]
   foreach my $i (reverse sort keys %{$HoH{$keys}}) {
      print "$i\[";
      my $y = 0;
      foreach  my $n (sort keys %{$HoH{$keys}{$i}}) {
         my $num = keys %{$HoH{$keys}{$i}};
         $y++;
         print "$n$HoH{$keys}{$i}{$n}->[0]";
         print "\-$n$HoH{$keys}{$i}{$n}->[$#{$HoH{$keys}{$i}{$n}}]" if (scalar @{$HoH{$keys}{$i}{$n}}>1);
         print ',' if $y < $num;
      }
      print "]\n";
   }
}

I'm getting an endless loop....

Thanks... Really appreacited

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Data manipulation 8

Programmer

Programmer

Programmer

Technical User

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Technical User

Programmer

Programmer

Technical User

Programmer

Programmer

Technical User

Technical User

Technical User

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor