Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

for each unique value in a column find the max value..need perl script

Status
Not open for further replies.

jaiperl04

IS-IT--Management
May 31, 2011
9
DE
Hi,

Below is my Input file:

date mtime no size id day order
20100607 154538.354300 200 1 101510 14098703993
20100607 154539.420000 200 1 101511 14098703993
20100607 154538.398200 487 1 100888 14098703994
20100607 154610.720000 487 1 91588 14098703994
20100607 154538.401200 200 1 101502 14098703995
20100607 154539.420000 200 1 101500 14098703995
I need a perl code to get the foll o/p. For each unique order field, calculate the oldest mtime; and for each unique order field replace the day field by the oldest mtime. For example, the 1st 2 rows have the same order field=14098703993, for which the oldest mtime is 154538.354300 and hence we put this value into the day field. The output as below:

Output
date mtime no size id day order
20100607 154538.354300 200 1 154538.354300 14098703993
20100607 154539.420000 200 1 154538.354300 14098703993
20100607 154538.398200 487 1 154538.398200 14098703994
20100607 154610.720000 487 1 154538.398200 14098703994
20100607 154538.401200 200 1 154538.401200 14098703995
20100607 154539.420000 200 1 154538.401200 14098703995

Please help me guys with the code.. I have been trying for a long time now..!

Thanks,
 
What have you tried? Where are you running into problems?
 
I, like rharsh, don't normally help with no work has been shown. However, here's a quick and dirty script that dose the translation as you describe. You can add the file processing yourself.

Code:
use List::Util qw(max);

use strict;
use warnings;

# do your own file processing here
my $infh = \*DATA;
my $outfh = \*STDOUT;

my @buffer;

while (<$infh>) {
	chomp;
	my @data = split;
	if (!@buffer || $data[-1] eq $buffer[0][-1]) {
		push @buffer, \@data;
	}
	if ($data[-1] ne $buffer[0][-1] || eof($infh)) {
		my $max = max map {$_->[1]} @buffer;
		for (@buffer) {
			$_->[4] = $max;
			print $outfh join(' ', @$_), "\n";
		}
		@buffer = \@data;
	}
}

__DATA__
20100607 154538.354300 200 1 101510 14098703993
20100607 154539.420000 200 1 101511 14098703993
20100607 154538.398200 487 1 100888 14098703994
20100607 154610.720000 487 1 91588 14098703994
20100607 154538.401200 200 1 101502 14098703995
20100607 154539.420000 200 1 101500 14098703995

- Miller
 
Thanks Miller for the code.. Will try it now.Sorry rharsh & miller, I should have posted what I tried:
Thanks all for your reply

reformatted the date as below:

order mtime no size id day date

14098703993 154538.354300 200 1 101510

14098703993 154539.420000 200 1 101511

14098703994 154538.398200 487 1 100888

14098703994 154610.720000 487 1 91588

14098703995 154538.401200 200 1 101502

14098703995 154539.420000 200 1 101500

use List::Util qw(max min);
my %id_hash;
open (DATA, ".txt");
while (<DATA>) { chomp;
my ($order, $mtime, $size, $id, $date) = split /\t/;
push @{ $id_hash{$order}{$id}{mtime} }, $mtime;
push @{ $id_hash{$order}{$id}{size } }, $size;
push @{ $id_hash{$order}{$id}{date } }, $date; }
open (OUT, ">output.txt");
for my $order (keys %id_hash) {
for my $id (keys %{ $id_hash{$order} })
{ my $Low = min( @ { $id_hash{$order}{$id}{mtime} } );
print OUT "$order $Low \n"; } }

Now the problem is this does not give duplicate order entires! I think I am unable to do the below: I want the duplicate order values as it is. Unable to replace the oldest mtime into the date field.

So now I am doing the below stupid code which will be the loooongest code of my life...:

open (OUT, ">output.txt");
open (IN, "input1.txt");->original file1 with all data while($line=<IN>){ chomp($line); ($Date,$MTime,$inserdate,$inserttime,$Id,$Phase,$Size,$day,$order)= split(/ /,$line);
open (INL, "file2.txt");-->contains the sorted order values of order values($x) from file 1
while($linel=<INL>){ chomp($linel);
($x,$y,$z,$q,$t)= split(/ /,$linel);
if($x == $order)
{ print OUT $Date," ",$M_Time," ",$inserdate," ",$y," ",$Id," ",$Pha +se," ",$Size," ",$day," ",$order,"\n"; } } }
close(INL); close(IN); close(OUT);
print "DONE";


Hope it makes sense to you.. am not an expert in Perl.. just trial and error guys.. but now really need ur help...plzzz!
 
From looking at your, as you referred to it, "stupid code" - it would appear that the input file (input1.txt) is formatted differently than the sample data you posted. Can you post a few sample lines from input1.txt so we can see an example of what you're working with?
 
Hi rharsh,

The input1.txt has the below data:
$Date,$MTime,$inserdate,$inserttime,$Id,$Phase,$Size,$day,$order
20100607 093454.666500 20100607 93454 1 263 0 10158 14077252714
20100607 093454.970000 20100607 93454 1 263 0 10158 14077252714
20100607 093454.947500 20100607 93454 1 544 0 10158 14077252715
20100607 093515.850000 20100607 93454 1 544 0 10158 14077252715
20100607 093454.977500 20100607 93454 1 263 0 10158 14077252716
20100607 093500.250000 20100607 93454 1 263 0 10158 14077252716

from this input1.txt, for each unique $order value I calculated the oldest $Mtime and got the file2.txt as below:
$x(order),$y(oldest Mtime),$z(Date),$q(day),$t(Id)
14077252714 093454.666500 20100607 10158 1
14077252715 093454.947500 20100607 10158 1
14077252716 093454.977500 20100607 10158 1


and using the above 2 files: trying to compare the order values, if they are same, put the oldest mtime into the existing mtime field:
if($x == $order)
{ print OUT $Date," ",$M_Time," ",$inserdate," ",$y," ",$Id," ",$Phase," ",$Size," ",$day," ",$order,"\n"; } } }

This code took me almost 2 days as expected! Is there anyother way I could do all in one code..Please help... Thanks
 
Give this a shot. You'll want to uncomment the four lines near the top (and comment out the two below them) if you want to run it with your files. Once that's done, you can remove the __DATA__ line and everything below it as well.

Code:
use warnings;
use strict;

my (%data, $headers, $in_file, $out_file);
# open IN, "< input1.txt" or die "Cannot open input.txt for read";
# open OUT, "> output.txt" or die "Cannot open output.txt for write";
# $in_file = \*IN;
# $out_file = \*OUT;
$in_file = \*DATA;
$out_file = \*STDOUT;

$headers = <$in_file>;
while (<$in_file>) {
	next if m/^\s*$/;
	chomp(my @temp = split);
	push @{$data{$temp[-1]}{records}}, \@temp;
		
	if (defined $data{$temp[-1]}{min_mtime}) {
		$data{$temp[-1]}{min_mtime} = $temp[1] if $data{$temp[-1]}->{min_mtime} > $temp[1];
	} else {
		$data{$temp[-1]}{min_mtime} = $temp[1];
	}
}

print $out_file $headers;
foreach my $id (sort {$a <=> $b} keys %data) {
	foreach my $record (@{$data{$id}{records}}) {
		print $out_file join(' ', map {$_->[4] = $data{$id}->{min_mtime}; @{$_}} [@$record]), "\n";
	}
}

__DATA__
Date MTime inserdate inserttime Id Phase Size day order
20100607 093454.666500 20100607 93454 1 263 0 10158 14077252714
20100607 093454.970000 20100607 93454 1 263 0 10158 14077252714
20100607 093454.947500 20100607 93454 1 544 0 10158 14077252715
20100607 093515.850000 20100607 93454 1 544 0 10158 14077252715
20100607 093454.977500 20100607 93454 1 263 0 10158 14077252716
20100607 093500.250000 20100607 93454 1 263 0 10158 14077252716
 
Oops! the code above replaces the wrong field. Replace the printing code with this:
Code:
print $out_file $headers;
foreach my $id (sort {$a <=> $b} keys %data) {
	foreach my $record (@{$data{$id}{records}}) {
		print $out_file join(' ', map {$_->[[red]-2[/red]] = $data{$id}->{min_mtime}; @{$_}} [@$record]), "\n";
	}
}
 
rharsh said:
Code:
my (%data, $headers, $in_file, $out_file);
# open IN, "< input1.txt" or die "Cannot open input.txt for read";
# open OUT, "> output.txt" or die "Cannot open output.txt for write";
# $in_file = \*IN;
# $out_file = \*OUT;
$in_file = \*DATA;
$out_file = \*STDOUT;

Hey rharsh,

I don't want to get petty since you're just as much of an expert, but I gotta nitpick on your open statements just a little bit for the sake of others.

[ol]
[li]Use the 3 parameter form of open, instead of including the mode within the filename. Also, let read mode '<' be assumed since the extra typing only introduces the possibility that one could mistype it as write mode and overwrite the file.[/li]
[li]Use lexical filehandles within the open statement. You create lexicals after the fact like i did with my aliases for DATA and STDOUT, but it's better to use a lexical from the start.[/li]
[li]Don't forget to examine the error message in $! in your die statement.[/li]
[li]Always put the filename in a variable. This documents the code and also enables one to easily include the filename in the error message if they want[/li]
[li]Finally, your C coding practices are infecting your perl a little bit as you're not limiting the scope of your lexicals but instead declaring them all at the beginning. It's better to only declare lexicals when needed and rely on use strict; use warnings; to let you know if you mask an earlier declaration in the same scope.[/li]
[/ol]

Anyway all that nitpicking just changes the code to the following;

Code:
use warnings;
use strict;

# my $in_file = 'input1.txt';
# my $out_file = 'output.txt';
# open my $in_fh, $in_file or die "Can't open $in_file: $!";
# open my $out_fh, '>', $out_file or die "Can't open $out_file: $!";
my $in_fh = \*DATA;
my $out_fh = \*STDOUT;

my $headers = <$in_fh>;
my %data;

I also renamed the filehandles to $in_fh and $out_fh respectively, which would need to be reflected in the rest of the code. The names are better documented in my opinion.

All of this is probably just a distraction from the original poster's question, but just had to pipe in a little in the hope of encouraging better coding practices early on.

thanks,
- Miller
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top