Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data manipulation 8

Status
Not open for further replies.

Extension

Programmer
Nov 3, 2004
311
CA
Hi,

I trying to find a simple way to re-structure data and then create a new flat file.

You will see the logic with the data below.

From this format:

Code:
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]

To this format:

Code:
A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR1R1,TRR2R1,TRR2R2,TRR2R3
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

Let me know if it's not clear enough.

Thank you in advance.
 
Post what you've got so far. This is an advice forum, not a coding service...
 
Sorry stevexff,

Here's my current code. I'm getting an endless loop which I'm trying to resolve since yesterday.

Data: (Source File)
Code:
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Code:
Code:
sub GenerateFile {
	
	$OutputFile = "\location\output";
	$SourceFile = "\location\source";
	
	open(OUT, ">$OutputFile") || die qq("Missing File");

	my $previous = "";
	
	open(FILE,$SourceFile) || die("Missing File");
	
 	while (my $Row = <FILE>) {
	
		# Determine ID number
		if ($Row =~ /^#(\d{6})/) {
		$ID = $1; 
		}
			
		elsif ($Row =~ /^([A-Z]\d[A-Z])\[([A-Z0-9-,]+)\]/) {
		my $Suffixes = $2;
		
			foreach my $Entry (split(/,/,$Suffixes)) {
				if ($Entry =~ /(\d[A-Z]\d)-(\d[A-Z]\d)/) {
				# Determine range.
				my ($Low,$High) = ($1,$2);
					if ($ID eq $previous) {
					print OUT ",$1$2";
					}
					
					else {
					print OUT $previous? "\n": "", $_;
    				}
	
					$previous = $ID;
					
				}
			}
      
		print OUT "\n"; 
		
		}
		
	}
  

}

Thank you in advance.
 
Here's a rough script, it works with the data you specified. It should at least give you a start.

Code:
while (<DATA>) {
    if (/^#(\w+)$/) {
        my $output = "$1\|";
        $_ = <DATA>;
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        my @temp = ();
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                $suffix =~ /^(\w{2})(\d)-\1(\d)/;
                my ($text, $begin, $end) = ($1, $2, $3);
                
                foreach my $i ($begin..$end) {
                    push @temp, "${prefix}${text}${i}";
                }
            } else {
                push @temp, "$prefix$suffix";
            }
        }
        $output .= join(',', @temp);
        print $output, "\n";
    }
}

__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
#A53700
TQQ[1A6,1C0-1C4,1P0]
 
Thank you very much for you help rharsh. It's really appreciated. It's certainly a good start.

Again. Thank you

 
Neat solution, rharsh, particularly the use of the
Code:
foreach my $i ([red]$begin..$end[/red])
to deal with the range of values. Gets a star from me too...
 
One last question rharsh.
If there is more than one line under an ID such as:

Code:
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
[b]TRQ[2W2-2W4,2Y7][/b]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Will a foreach statement (foreach rows/lines) before the foreach suffix statement will resolve this issue?

Again, thank you very much.
 
TIMTOWDTI and *very*, *very* ugly. Works though :)

Code:
#!/usr/bin/perl -w
use strict;

local $/ = undef;
local $_ = <DATA>;
s/\n(?!#|$)/,/g;
s/(\w{3})\[([^\]]+)\]/join (',', map "$1$_", split( ',', $2 ) )/ge;
s/(\w{5})(\d)-\w{2}(\d)/ join ',', map "$1$_", $2 .. $3 /ge;
s/(?<=#\w{6}),/\|/g;
print;


__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]
 
Thanks ishnid. Yes it's working. I'll have to figure out exactly how it's working first. Thank you for your help.

I'll wait for rharsh to reply, to see if his solution is easy to adapt in order to support more than one line under a specific ID.

Thanks again.
 
With comments:
Code:
# enable `slurp' mode
local $/ = undef;

# read the entire data into the $_ variable
local $_ = <DATA>;

# replace all newlines with commas, except ones before the 
# `#' (start of a new record) and the one at the end
s/\n(?!#|$)/,/g;

# put the prefix before each one and remove the brackets
# i.e. turn AAA[1A1,1A2,3B2-3B5] into AAA1A1,AAA1A2,AAA3B2-3B5
s/(\w{3})\[([^\]]+)\]/join (',', map "$1$_", split( ',', $2 ) )/ge;

# expand out the ranges (i.e. AAA3B2-3B5)
s/(\w{5})(\d)-\w{2}(\d)/ join ',', map "$1$_", $2 .. $3 /ge;

# change the first comma in each record to a `|'
s/(?<=#\w{6}),/\|/g;

# this one's obvious :)
print;
 
Thanks for the stars!

Extension, see if this works for you. I was tempted to rewrite this using map a couple of times, like most the other solutions, but this works well enough.

Code:
my $label;

while (<DATA>) {
    if (/^#(\w+)/) {
        $label = $1;
    } else {
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        my @temp = ();
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                push @temp, &distribute($prefix, $suffix);
            } else {
                push @temp, "$prefix$suffix";
            }
        }
        print "$label\|" . join(',', @temp), "\n";
    }
}

# ARGS: prefix, suffix range
sub distribute ($$) {
    my ($prefix, $suffix) = @_;
    my @temp;
    
    $suffix =~ /^(\w{2})(\d)-\1(\d)/;
    my ($text, $begin, $end) = ($1, $2, $3);   
    foreach my $i ($begin..$end) {
        push @temp, "${prefix}${text}${i}";
    }
    return @temp;
}
 
rharsh Thank you again for your help. I owe you more than one star for all your help.

Based on this data
Code:
__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

I'm getting the following:

Code:
[b]A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3
A59993|TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7[/b]
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

instead of

Code:
[b]A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7[/b]
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

Thank you again.

 
Modifying rharsh's very good code a little (see bolded):
Code:
#!perl
use strict;
use warnings;

my $label;
[b]my %h;
my @labels;[/b]

while (<DATA>) {
    [b]chomp;[/b]
    if (/^#(\w+)/) {
        $label = $1;
        [b]push (@labels, $label) unless grep {$_ eq $label} @labels;[/b]
    } else {
        m/(\w+)\[([^\]]+)\]/;
        my ($prefix, @suffixes) = ($1, split(",", $2));
        
        foreach my $suffix (@suffixes) {
            if ($suffix =~ /-/) {
                [b]push @{$h{$label}}, distribute($prefix, $suffix);[/b]
            } else {
                [b]push @{$h{$label}}, "$prefix$suffix";[/b]
            }
        }
    }
}

[b]for (@labels) {
    print join("|", $_, join(",", @{$h{$_}})), "\n";
}[/b]

# ARGS: prefix, suffix range
sub distribute  {
    my ($prefix, $suffix) = @_;
    my @temp;
    
    $suffix =~ /^(\w{2})(\d)-\1(\d)/;
    my ($text, $begin, $end) = ($1, $2, $3);   
    foreach my $i ($begin..$end) {
        push @temp, "${prefix}${text}${i}";
    }
    return @temp;
}

__DATA__
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]
This produces the desired output you have posted above.
A star for rharsh, who did all the tricky parts.

 
Thanks again for the stars.

Extension, I guessed at how you wanted to handle multiple lines for each 'label' - I should have asked!

Thanks for straightening it out mikevh. A star for you too.
 
Wow.

Thank you rharsh, mikevh and ishnid. Your help was really appreciated.

rharsh: I really liked the solution you came up with. Easier and less complicated for me.
ishnid: Thank you for the explanation.
mikevh: Thank you for your help.


You all get a big star.
Thanks
 
Now I'm trying to do the reverse process which is to go from

Code:
A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

TO

Code:
#A59993
TRR[1T1-1T3,1Y7,1W9,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

Right now I'm stuck with one of the biggest endless loop ever. My PC went out of memory in less than 10 secs of processing.

I'll post my code later on...

Arg...

 
this could probably be improved upon a bit but it seems to work OK, and its perl :)

Code:
my %HoH = ();
while(<DATA>){
   chomp;
   my $temp = $_;
   my ($label,$list) = split(/\|/);
   my @data = split(/,/,$list);
   for (@data) {
      /^(\w{3})(\w{2})(.)/;
      push @{$HoH{$label}{$1}{$2}},"$3"; 
   }
}
foreach my $keys (reverse sort keys %HoH){
   print "#$keys\n";
   foreach my $i (reverse sort keys %{$HoH{$keys}}) {
      print "$i\[";
      my $y = 0;
      foreach  my $n (sort keys %{$HoH{$keys}{$i}}) {
         my $num = keys %{$HoH{$keys}{$i}};
         $y++;
         print "$n$HoH{$keys}{$i}{$n}->[0]";
         print "\-$n$HoH{$keys}{$i}{$n}->[$#{$HoH{$keys}{$i}{$n}}]" if (scalar @{$HoH{$keys}{$i}{$n}}>1);
         print ',' if $y < $num;
      }
      print "]\n";
   }
}
__DATA__
A59993|TRR1T1,TRR1T2,TRR1T3,TRR1Y7,TRR1W9,TRR2R1,TRR2R2,TRR2R3,TRQ2W2,TRQ2W3,TRQ2W4,TRQ2Y7
A53700|TQQ1A6,TQQ1C0,TQQ1C1,TQQ1C2,TQQ1C3,TQQ1C4,TQQ1P0

prints:

Code:
#A59993
TRR[1T1-1T3,1W9,1Y7,2R1-2R3]
TRQ[2W2-2W4,2Y7]
#A53700
TQQ[1A6,1C0-1C4,1P0]

the part of the script that decides when tp print a comma between ranges of bracketed data is a bit klunky but it works.
 
these two lines should be traded position:

foreach my $n (sort keys %{$HoH{$keys}{$i}}) {
my $num = keys %{$HoH{$keys}{$i}};

that way $num only gets calculated once per loop:

my $num = keys %{$HoH{$keys}{$i}};
foreach my $n (sort keys %{$HoH{$keys}{$i}}) {
 
darn, just noticed I left a line in there from when I was testing the code:

my $temp = $_;

it can be removed.
 
KevinADC:

Thank you for your help. Your solution does work.

I'm trying to print to a file. I have the following at the beginning:
Code:
open(OUT, ">file.txt") || die qq("file.txt is missing");

But to adapt the foreach statement
Code:
foreach my $keys (reverse sort keys %HoH){
  [b] print OUT "#$keys\n"; [/b]
   foreach my $i (reverse sort keys %{$HoH{$keys}}) {
      print "$i\[";
      my $y = 0;
      foreach  my $n (sort keys %{$HoH{$keys}{$i}}) {
         my $num = keys %{$HoH{$keys}{$i}};
         $y++;
         print "$n$HoH{$keys}{$i}{$n}->[0]";
         print "\-$n$HoH{$keys}{$i}{$n}->[$#{$HoH{$keys}{$i}{$n}}]" if (scalar @{$HoH{$keys}{$i}{$n}}>1);
         print ',' if $y < $num;
      }
      print "]\n";
   }
}

I'm getting an endless loop....

Thanks... Really appreacited
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top