perl array help

subtelo · Jul 10, 2008

Is there any easy code I can do the following task. Thanks.
From the file 1, collect 3 highest score of each id then output to the file 2.

File 1:
id score
hs_ref_chr1 1199
hs_ref_chr1 910
hs_ref_chr1 781
hs_ref_chr1 434
hs_ref_chr1 511
hs_ref_chr1 638
hs_ref_chr1 458
hs_ref_chr1 335
hs_ref_chr10 398
hs_ref_chr10 1106
hs_ref_chr16 272
hs_ref_chr16 1084
hs_ref_chr4 1162
hs_ref_chr4 1096
hs_ref_chr4 1090
hs_ref_chr4 1120
hs_ref_chr4 1701
hs_ref_chr4 1120
hs_ref_chr4 1152

File 2:
hs_ref_chr1 1199
hs_ref_chr1 910
hs_ref_chr1 781
hs_ref_chr10 1106
hs_ref_chr10 398
hs_ref_chr16 1084
hs_ref_chr16 272
hs_ref_chr4 1701
hs_ref_chr4 1162
hs_ref_chr4 1152

Thanks a lot.

KevinADC · Jul 10, 2008

basic concept:

Code:

[url=http://perldoc.perl.org/functions/use.html][black][b]use[/b][/black][/url] [green]strict[/green][red];[/red]
[black][b]use[/b][/black] [green]warnings[/green][red];[/red]
[url=http://perldoc.perl.org/functions/my.html][black][b]my[/b][/black][/url] [blue]%ID[/blue] = [red]([/red][red])[/red][red];[/red]
[black][b]my[/b][/black] [blue]@order[/blue] = [red]([/red][red])[/red][red];[/red]
[olive][b]while[/b][/olive] [red]([/red]<DATA>[red])[/red] [red]{[/red]
   [black][b]my[/b][/black] [red]([/red][blue]$id[/blue], [blue]$pts[/blue][red])[/red] = [url=http://perldoc.perl.org/functions/split.html][black][b]split[/b][/black][/url][red];[/red]
   [url=http://perldoc.perl.org/functions/push.html][black][b]push[/b][/black][/url] [blue]@order[/blue],[blue]$id[/blue] [olive][b]unless[/b][/olive] [url=http://perldoc.perl.org/functions/exists.html][black][b]exists[/b][/black][/url] [blue]$ID[/blue][red]{[/red][blue]$id[/blue][red]}[/red][red];[/red] 
   [black][b]push[/b][/black] [blue]@[/blue][red]{[/red][blue]$ID[/blue][red]{[/red][blue]$id[/blue][red]}[/red][red]}[/red],[blue]$pts[/blue][red];[/red]
[red]}[/red]
[olive][b]foreach[/b][/olive] [black][b]my[/b][/black] [blue]$id[/blue] [red]([/red][blue]@order[/blue][red])[/red] [red]{[/red]
   [url=http://perldoc.perl.org/functions/print.html][black][b]print[/b][/black][/url] [red]"[/red][purple][blue]$id[/blue] [blue]$_[/blue][purple][b]\n[/b][/purple][/purple][red]"[/red] [olive][b]for[/b][/olive] [red]([/red] [url=http://perldoc.perl.org/functions/sort.html][black][b]sort[/b][/black][/url] [red]{[/red][blue]$b[/blue] <=> [blue]$a[/blue][red]}[/red] [blue]@[/blue][red]{[/red] [blue]$ID[/blue][red]{[/red][blue]$id[/blue][red]}[/red] [red]}[/red] [red])[/red] [red][[/red][fuchsia]0[/fuchsia],[fuchsia]1[/fuchsia],[fuchsia]2[/fuchsia][red]][/red][red];[/red]
[red]}[/red]	
[teal]__DATA__[/teal]
[teal]hs_ref_chr1    1199[/teal]
[teal]hs_ref_chr1    910[/teal]
[teal]hs_ref_chr1    781[/teal]
[teal]hs_ref_chr1    434[/teal]
[teal]hs_ref_chr1    511[/teal]
[teal]hs_ref_chr1    638[/teal]
[teal]hs_ref_chr1    458[/teal]
[teal]hs_ref_chr1    335[/teal]
[teal]hs_ref_chr10    398[/teal]
[teal]hs_ref_chr10    1106[/teal]
[teal]hs_ref_chr16    272[/teal]
[teal]hs_ref_chr16    1084[/teal]
[teal]hs_ref_chr4    1162[/teal]
[teal]hs_ref_chr4    1096[/teal]
[teal]hs_ref_chr4    1090[/teal]
[teal]hs_ref_chr4    1120[/teal]
[teal]hs_ref_chr4    1701[/teal]
[teal]hs_ref_chr4    1120[/teal]
[teal]hs_ref_chr4    1152[/teal]

[tt]------------------------------------------------------------
Pragmas (perl 5.10.0) used :
[ul]
[li]strict - Perl pragma to restrict unsafe constructs[/li]
[li]warnings - Perl pragma to control optional warnings[/li]
[/ul]
[/tt]
I leave the file IO and the warnings the above code will generate up to you to resolve. Next time it would be nice to see some effort on your part to solve the problem before posting a question.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

stevexff · Jul 10, 2008

This looks like homework...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object:erlDesignPatterns)[/small]

subtelo · Jul 11, 2008

Thank you KevinADC for you perfect code.

Sorry I mislead you guys. Actually the file is much more complicated. Each line of the file contains 12 element. What I need is the 3 highest score(last element) of each id (first element), and output the original line.
It is very easy to do it in Excel spreadsheet (first sort column A, then sort column L). But I can not figure it out in Perl.

KevinADC's code is perfect for 2 elements in a line, but it will not work if there is more than 2 elements in a line and the output file need the original line. It looks like need a scale to carry the whole line, it also need to associate with "score" and "id". The problem is that both "id" and "score" are not unique in the file, you can not use them as key. So... any suggestion?

Steve, I do not think instructor will assign such tough homework. Otherwise nobody will take his/her class.

The original file (I only include 4 elements of each line here for example).

id percent length score
hs_ref_chr1 99.51 406 1199
hs_ref_chr1 100 223 910
hs_ref_chr1 100 152 781
hs_ref_chr1 98.77 326 434
hs_ref_chr1 99.55 222 511
hs_ref_chr1 100 339 638
hs_ref_chr1 100 587 458
hs_ref_chr1 99.68 620 335
hs_ref_chr10 100 588 398
hs_ref_chr10 99.71 679 1106
hs_ref_chr16 99.81 537 272
hs_ref_chr16 98.5 535 1084
hs_ref_chr4 98.91 552 1162
hs_ref_chr4 100 672 1096
hs_ref_chr4 99.58 473 1090
hs_ref_chr4 98.52 405 1120
hs_ref_chr4 99.7 676 1701
hs_ref_chr4 99.84 607 1120
hs_ref_chr4 99.7 676 1152

---------------------
And the output should be:

id percent length score
hs_ref_chr1 99.51 406 1199
hs_ref_chr1 100 223 910
hs_ref_chr1 100 152 781
hs_ref_chr10 99.71 679 1106
hs_ref_chr10 100 588 398
hs_ref_chr16 98.5 535 1084
hs_ref_chr16 99.81 537 272
hs_ref_chr4 99.7 676 1701
hs_ref_chr4 98.91 552 1162
hs_ref_chr4 99.7 676 1152

Hope I explain my problem clearly this time.

Thanks.

PinkeyNBrain · Jul 11, 2008

For a tough file like this, you can use a splice across the split then a join to get your val to sort on.

rharsh · Jul 11, 2008

I do agree with Kevin, it would be nice to see some effort on your part (some code, even if it doesn't work - not just an explanation of the problem.) But... give this a shot.

Code:

my $headers = <DATA>;
my (@order, %data);

while (<DATA>) {
	my @temp = split;
	push (@order, $temp[0]) unless $data{$temp[0]};
	push @{$data{$temp[0]}}, [$temp[0], $temp[3], $_];
}

print $headers;
foreach my $id (@order) {
	@{$data{$id}} = sort {$b->[0] cmp $a->[0] || $b->[1] <=> $a->[1]} @{$data{$id}};
	foreach my $i (0..2) {
		if (exists $data{$id}->[$i]) {
			print $data{$id}->[$i]->[2];
		}
	}
}

KevinADC · Jul 11, 2008

Writing code for data that changes on a whim is futile.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

subtelo · Jul 14, 2008

Since i can not solve this problem by perl only. So what I did is input the original file into Excel, first sort by "id" ascending then sort by "score" descending.
After output the file from Excel, then only output the first 3 lines of each id.

my $in = "sorted_in.txt";
my $out = "sorted_out.txt";
my (%ids, %lines);

open (IN, "$in") || die "can not open file $in to read.\n";
my $header = <IN>;

while(<IN>)
{
chomp;
my @aa = split (/\t/, $_);

$ids{$aa[0]} = $aa[0] unless $ids{$aa[0]};
if ($#{$lines{$aa[0]}} < 2)
{
push (@{$lines{$aa[0]}}, $_);
}
}
close IN;

open (OUT, ">$out") || die "can not open fiel $out to write.\n";

print OUT $header;
foreach my $id (sort keys %ids)
{
foreach my $i (0..$#{$lines{$id}})
{
print OUT "$lines{$id}[$i]\n";
}
}
close OUT;

subtelo · Jul 14, 2008

rharsh, thanks for the code. I will give it a try and post here what looks like.
Thanks a lot.

subtelo · Jul 14, 2008

rharsh, your code works perfectly. Sorting on several elements is what exactly I want. Thank you very much.

And thank all of you guys for your inputs.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

perl array help

subtelo

Programmer

KevinADC

Technical User

stevexff

Programmer

subtelo

Programmer

PinkeyNBrain

IS-IT--Management

rharsh

Technical User

KevinADC

Technical User

subtelo

Programmer

subtelo

Programmer

subtelo

Programmer

Similar threads

Part and Inventory Search

Sponsor