Sorting list problems

Status
Not open for further replies.

smokinace

Programmer
Jun 12, 2008
18
US
I am having problems sorting a fairly simple list. Here is the list:

Code:
>NM_001100917.1	>NM_001100917.1	1019	100	1019	1	1	
>NM_020388.3	>NM_001100917.1	0	0	0	.	.	
>NM_001001395.1	>NM_001100917.1	18	95	20	647	2404	
>NM_001048210.1	>NM_001100917.1	0	0	0	.	.	
>NM_033157.2	>NM_001100917.1	18	95	20	889	2961	
>NM_199229.1	>NM_001100917.1	18	100	18	903	1938	
>NM_001077637.1	>NM_001100917.1	0	0	0	.	.	
>NM_152449.2	>NM_001100917.1	0	0	0	.	.	
>NM_006916.1	>NM_001100917.1	18	100	18	903	1941	
>NM_033481.3	>NM_001100917.1	0	0	0	.	.	
>NM_033140.2	>NM_001100917.1	18	95	20	889	2619	
>NM_033138.2	>NM_001100917.1	18	95	20	889	3648	
>NM_007039.3	>NM_001100917.1	0	0	0	.	.	
>NM_033480.2	>NM_001100917.1	0	0	0	.	.	
>NM_004342.5	>NM_001100917.1	18	95	20	889	2883	
>NM_033139.2	>NM_001100917.1	18	95	20	889	2697	
>NM_013296.4	>NM_001100917.1	18	95	20	217	2875
It needs to be sorted by the fourth and fifth columns: an entry should be printed to a file only if column 5 is greater than or equal to 25 and column 4 is greater than or equal to 95. Here is what I have so far.

Code:
use warnings;

$infile = <STDIN>;
$outfile = <STDIN>;

open (IN, $infile);
open (OUT, ">>$outfile");

while(<IN>) {
	
	@comp = split (/\t/, $_);
	if ($comp[4] >= 25 && $comp[3] >= 95) {
		
		print OUT "@comp\n";
		
	}
}
I have no idea why it is not working. Any help would be great.
 
You should add some print statements to debug your code. Print out $comp[3] and $comp[4] before your if statement. When I tested your code I had to change \t to \s+ to get it to work properly.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
I realized what I did. I used STDIN instead of ARGV, which is what I meant to use. The script kept waiting for input that I never gave it. Sorry to waste your time with my error. Thanks for the help anyway!
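For anyone landing here later, this is a rough sketch of the same filter reading its file names from @ARGV instead of STDIN. It is not the original poster's final script. The sub name filter_rows and the file names in the comment are made up for illustration, and it assumes the data really is tab-delimited.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose 4th column is >= 95
# and whose 5th column is >= 25 (columns counted from 1).
sub filter_rows {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my @comp = split /\t/, $copy;
        next unless @comp >= 5;    # skip short or blank lines
        push @kept, $copy if $comp[4] >= 25 && $comp[3] >= 95;
    }
    return @kept;
}

# Run as a script only when both file names were given, e.g.
#   perl filter.pl input.txt output.txt
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    open my $in,  '<',  $infile  or die "Cannot read $infile: $!";
    open my $out, '>>', $outfile or die "Cannot append to $outfile: $!";
    print {$out} "$_\n" for filter_rows(<$in>);
    close $out or die "Cannot close $outfile: $!";
}
```

With the sample list above, only the first row has a fifth column of 25 or more, so only that row would be written. Checking the result of each open also makes a wrong file name show up right away.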
 
I have another problem now: making a list of the names in the list I have already filtered. I need to add the first element of each row to an array, but each entry should be added only once, with no repeats. Here is what I am thinking:

Code:
open (NEW, "$outfile");
open(UN, ">>$outfile" . "list.txt");

@list = ();
@match = ();
while (<NEW>) {
	
	@comp = split (/\t/, $_);
	@match = grep (/"$comp[0]"/, @list);
	if (@match = ()) {
		push (@list, $comp[0]);
	}
}

print UN "@list\n";
I don't think I am using grep correctly, because nothing prints to the file.
 
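Use a hash.

Code:
my %felement;
while (<NEW>) {
    my @comp = split (/\t/, $_);
    # Assigning to an existing key just overwrites it, so repeats collapse.
    $felement{$comp[0]} = 0;
}

# Print each distinct first-column value once, in sorted order.
for my $key (sort keys %felement) {
    print "$key\n";
}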

 
That worked great, thanks. Is it because a hash can't contain the same key twice, so the repeats simply aren't there when it is printed?
 
Correct.
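
A tiny illustration of that point, using three IDs from the sample list with one of them repeated on purpose:

```perl
use strict;
use warnings;

my %felement;
# The third value repeats the first; assigning to an existing key
# overwrites its value and does not create a second key.
$felement{$_} = 0 for qw(>NM_033157.2 >NM_199229.1 >NM_033157.2);

my @unique = sort keys %felement;
print "$_\n" for @unique;
```

Three assignments go in, but only two keys come out.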

 
Also, as a side note, you should learn about strict and start using it. It is a pain to begin with, but you'll be glad you did in the long run.

 
Why is strict so good? I started using it but stopped, because it refused to run a program just because I left out a quote that wasn't even needed. Is it just to help proofread? Couldn't I just turn it on during debugging?
 
Another thing to consider: you will lose the original order of the data by using a hash. That may not be a problem this time, but keep in mind that a hash has no guaranteed order the way an array does. Here is a way to maintain the order (basic concept):

Code:
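my @list;
my %seen;
while (<DATA>) {
    # Take the first whitespace-separated field of each row.
    my $comp = (split (/\s/))[0];
    # Skip names already seen; the post-increment records this one.
    next if $seen{$comp}++;
    push (@list, $comp);
}

print "$_\n" for @list;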
__DATA__
pig man fish
cat dog fish
fish cow bird
cat horse man
fish man pig
pig dog man
horse cow man
man fish pig

This is indeed incorrect usage:

Code:
@match = grep (/"$comp[0]"/, @list);

The double quotes are messing up the search pattern; they should be removed. But the pattern also checks for substrings, so it can return false matches. For example, it will match "tek" in "tek-tips", or any other substring that is part of the string.
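
A quick way to see both effects. This is only a sketch; the list contents are sample values chosen for the demonstration.

```perl
use strict;
use warnings;

my @list = ('tek-tips', 'NM_0010989', 'NM_0012546');

# With literal double quotes in the pattern, the quotes themselves
# would have to appear in the data, so nothing matches.
my @quoted = grep { /"tek"/ } @list;

# Without the quotes the pattern matches, but as a substring.
my @substring = grep { /tek/ } @list;

# A short prefix matches every ID that contains it.
my @prefix = grep { /NM_001/ } @list;

printf "quoted: %d, substring: %d, prefix: %d\n",
    scalar @quoted, scalar @substring, scalar @prefix;
```

The quoted pattern finds nothing, the bare pattern finds "tek-tips", and the prefix finds both IDs.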

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Kevin, that is a great suggestion. So it would match NM_0010989 when I really wanted NM_0012546, because it would cue off the NM_001? Is there a literal I can use for the hash like I can for strings?
 
Strict will keep you from making all kinds of simple mistakes. If it required the quote, I would have to say either it really was needed, or you had some other error and adding the quote was what strict suggested as the fix. Here's a bit about it on PerlMonks
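
One example of the kind of simple mistake meant here is a misspelled variable name. The sketch below compiles a small snippet from a string so the error can be caught and printed; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Single-quoted heredoc: the snippet is kept as plain text until eval runs it.
my $snippet = <<'END';
use strict;
my $total = 5;
$totl = $total + 1;   # typo: $totl was never declared
END

eval $snippet;
my $error = $@;
print "strict refused to compile it:\n$error" if $error;
```

Without strict, Perl would quietly create a new global named $totl and carry on, which is why leaving strict on all the time tends to pay off more than switching it on only while debugging.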

 
Kevin, that is a great suggestion. So it would match NM_0010989 when I really wanted NM_0012546, because it would cue off the NM_001? Is there a literal I can use for the hash like I can for strings?

If the search pattern was NM_001, it would return both of those strings. If you want exact matches from beginning to end, you must add the string anchors ^ and $:

/^pattern$/

or use 'eq' instead.
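
Here is a hedged sketch of the earlier grep loop with both options applied; the sample values are invented. One more thing worth noting about that loop: "if (@match = ())" is an assignment, not a comparison. A list assignment used as a condition evaluates to the number of elements on the right-hand side, which is zero here, so the push could never run.

```perl
use strict;
use warnings;

# Sample first-column values; the duplicates are deliberate.
my @first_column = ('NM_001', 'NM_0010989', 'NM_001', 'NM_0012546', 'NM_0010989');

my @list;
for my $name (@first_column) {
    # \Q...\E escapes regex metacharacters such as the "." in real IDs.
    my @anchored = grep { /^\Q$name\E$/ } @list;
    # eq compares whole strings, so no anchors or escaping are needed.
    my @by_eq = grep { $_ eq $name } @list;

    # Compare the match count with ==; "@match = ()" would assign instead.
    push @list, $name if @by_eq == 0 && @anchored == 0;
}

print "$_\n" for @list;
```

Either test on its own would do; both are shown only to compare them. NM_001 no longer hides the longer IDs, and each name is kept once in its original order.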



 