Sorting Data by Field 5

Status
Not open for further replies.

ljsmith91

I have done a bunch of reading on sorting data and I am still very confused about how to go about it. Maybe I can learn through an example of a real problem. I have the following file and need to sort the records based on a field. Here is sample data:

server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12 4.003 250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563 6752 boston
server5 01/21/2005 12:33 17.511 5782 dallas

I would like to sort and print these records so that the 4th field is in descending order and the data returns like this:

server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511 5782 dallas
server4 01/20/2005 08:31 14.563 6752 boston
server2 01/19/2005 22:12 4.003 250 los angeles

Can you help as I really do not understand the sort function? Quite a departure from Unix's sort.

LJS



 
Code:
open FH, "<", "data.txt" or die "Cannot open data.txt: $!";
my %hash;
while (<FH>) {
   my $field4=(split /\s+/,$_)[3]; # get your hash key (the 4th field)
   $hash{$field4}=$_;              # set the value for key(4)=whole line
}
close FH;
foreach $key (sort (keys(%hash))) {
   print "$key \t\t=>\t$hash{$key}";   # each stored line already ends in a newline
}

Using a hash is one way.

--Paul

cigless ...
 
PaulTEG,

I see you never use the sort function. I guess that is a good thing ?!? I will try it the hash way. Thanks for your example and help.

-LJS
 
But he did use sort:

foreach $key (sort (keys(%hash))) {

I don't think that sort is going to work for this situation, though, since you are sorting numbers. The default sort is in alphabetical (string) order and will not sort numbers correctly. Just change the line above to:

foreach $key (sort {$b <=> $a} keys %hash ) {

and the file will be sorted in the order you want.

 
It could be done like this:

Code:
#!perl

use warnings;
use strict;

#open FH, "<data.txt";
my @data = <DATA>;
#close FH;

my @index = ();
for (@data) {
   push @index, (split(/\s+/))[3];
}
my @sorted = @data[ sort { $index[$b] <=> $index[$a] } 0 .. $#index ];
print  @sorted;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server3 01/15/2005 23:42 52.105 32033 new york

prints:

Code:
server3 01/15/2005 23:42 52.105 32033 new york
server3 01/15/2005 23:42 52.105 32033 new york
server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511  5782 dallas
server5 01/21/2005 12:33 17.511  5782 dallas
server4 01/20/2005 08:31 14.563  6752 boston
server4 01/20/2005 08:31 14.563  6752 boston
server2 01/19/2005 22:12  4.003   250 los angeles
server2 01/19/2005 22:12  4.003   250 los angeles
 
Thanks KevinADC. It seems that sorting in Perl is a bit more painful than I ever expected it to be. I am not even sorting on multiple fields and I am already feeling the pain of understanding it all. I miss the ease of Unix shell sorting.

I will take advantage of your latest suggestions in code.

Thanks once again. -LJS
 
You're welcome. The problem is not so much the sort function as it is the data format. Text files of this nature need to be broken into discrete parts to sort them. I suspect the same must be true for shell scripting.
 
ljsmith91, if you're using Perl on *nix you can still use *nix shell sort. Example:
Code:
#!perl
use strict;
use warnings;

my $infile = "ljs3.txt";
# -k4,4 = sort on field 4 only; n = numeric; r = reverse (descending)
my @arr = qx(sort -k4,4nr $infile);
print for @arr;
Output:
Code:
server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511  5782 dallas
server4 01/20/2005 08:31 14.563  6752 boston
server2 01/19/2005 22:12  4.003   250 los angeles
However, I would recommend that you not depend on this and learn to sort in Perl, as Perl's sorting is cross-platform and won't keep you tied to *nix. It will seem a lot less painful with a little more practice.

Here's another way the sort could be done in Perl:
Code:
#!perl
use strict;
use warnings;

my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
Output same as before.

HTH




 
Using the hash method is OK as long as you can guarantee that none of your records have duplicate values in the sort field. If they do, any subsequent duplicate records overwrite the same hash entry. This makes a hash a convenient way to de-dupe some unsorted input records, but it doesn't really help you in this instance.
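
To make the overwrite concrete, here is a minimal sketch (not code from any poster in this thread), assuming the same whitespace-delimited layout as the sample data. It compares a plain hash, which loses a record when two lines share a field-4 value, with a hash of arrays, one common way to keep them all. The sub names and the third sample line (server6) are made up for illustration.

```perl
#!perl
use strict;
use warnings;

# Hypothetical sample: two records share the field-4 value 34.321.
my @lines = (
    "server1 01/20/2005 10:04 34.321 42133 denver\n",
    "server2 01/19/2005 22:12  4.003   250 los angeles\n",
    "server6 01/22/2005 09:15 34.321  1200 miami\n",
);

# Plain hash: a later record with the same key replaces the earlier one.
sub by_plain_hash {
    my %hash;
    $hash{ (split ' ', $_)[3] } = $_ for @_;
    return map { $hash{$_} } sort { $b <=> $a } keys %hash;
}

# Hash of arrays: each key holds a list of lines, so duplicates are kept.
sub by_hash_of_arrays {
    my %hash;
    push @{ $hash{ (split ' ', $_)[3] } }, $_ for @_;
    return map { @{ $hash{$_} } } sort { $b <=> $a } keys %hash;
}

my @kept = by_plain_hash(@lines);
print "plain hash kept ", scalar(@kept), " of ", scalar(@lines), " lines\n";
print by_hash_of_arrays(@lines);
```

The plain-hash version returns only two of the three lines here, while the hash-of-arrays version returns all three with the two 34.321 records next to each other.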
 
ljsmith91,

The thing to notice about the code from KevinADC and MikeVH is that they both used "<=>" in the sort. "sort" by itself is alphanumeric and does not sort numbers correctly. It sorts alphanumerically, comparing from the first character on the left to the last character on the right. The "<=>" operator is a comparison that does sort numbers correctly, based on their value rather than their character sequence.

Michael Libeson
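
The difference described above can be seen in a short sketch that uses the five field-4 values from the sample data. The variable names are only for illustration.

```perl
#!perl
use strict;
use warnings;

# Field-4 values from the sample data in this thread.
my @values = qw(34.321 4.003 52.105 14.563 17.511);

# Default sort: compares the values as strings, character by character.
my @as_strings = sort @values;

# Numeric sort, descending: <=> compares by value, and putting $b
# before $a reverses the order so the largest comes first.
my @as_numbers = sort { $b <=> $a } @values;

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```

In the string order, 4.003 lands after 34.321 because the character "4" sorts after "3". The numeric order gives 52.105 34.321 17.511 14.563 4.003, which is what the original question asked for.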
 
mikevh,

Thanks for the sample sort code. I have a few questions. In your 2nd sample:

Code:
 my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

1) Does @arr contain my unsorted data prior to this map?
2) Because this is an array instead of a hash, duplicate field values will not get lost, right?
3) What does <DATA> mean at the end?

I am just a rookie and I am seeking to fully understand. Thanks so much.

To Edgar623,
Code:
a = []
while ARGF.gets do a << $_ end
print a.sort_by{|x| x.split(/\s/)[3]}.reverse.join
Is this Perl code you are using?

Thanks all.

-LJS



 
mikevh,

Oh...<DATA> is my unsorted data and this code is all one command....light dawns on Marblehead. I get it. Now I just need to figure out exactly what it's doing, although I get the concept of it. Thanks again. -LJS
 
<DATA> is a special filehandle in Perl. Its contents are defined at the end of your script, after the __DATA__ line:
Code:
#!/usr/bin/perl -w
use strict;

print while (<DATA>); # print my data

__DATA__
Everything here
will be printed
above, as it's contained
in <DATA>

If you're opening a file, you'll have a different filehandle (e.g. PaulTEG's first post uses <FH>) - replace <DATA> with the filehandle you've opened to your file.
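
As a rough sketch of that swap, the map/sort/map chain posted earlier can read from a file you open yourself. The sub name sort_file_by_field4 is made up for this example, and data.txt is simply the file name used in the first reply.

```perl
#!perl
use strict;
use warnings;

# Sort the lines of a file by field 4, numerically, largest first.
# Takes the path to a whitespace-delimited file like the sample data.
sub sort_file_by_field4 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @sorted = map {$_->[1]}
        sort {$b->[0] <=> $a->[0]}
        map {[ (split)[3], $_ ]}
        <$fh>;                      # <$fh> takes the place of <DATA>
    close $fh;
    return @sorted;
}

# Only runs the demo if the file is actually there; use your own path.
print sort_file_by_field4('data.txt') if -e 'data.txt';
```

A lexical filehandle (my $fh) is used here instead of the bareword FH; both work, but the lexical form goes out of scope with the sub.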
 
ljsmith91, as KevinADC points out, the trick I'm using in my sort is called a "Schwartzian Transform," after its inventor, Perl guru Randal Schwartz. KevinADC's code is another version of this trick. If you type perldoc -q sort at a command prompt, you'll find the code we posted here, almost verbatim. Paul's hash method is a variation on the same idea, though as noted, its drawback is that it will not allow duplicate sort keys.
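
For anyone still working out what the one-statement version is doing, here is the same idea unrolled into three separate steps with named intermediate arrays. This is only an illustrative sketch: the array names are made up, and the data is three of the sample lines held in an array rather than read from <DATA>.

```perl
#!perl
use strict;
use warnings;

my @lines = (
    "server1 01/20/2005 10:04 34.321 42133 denver\n",
    "server2 01/19/2005 22:12  4.003   250 los angeles\n",
    "server3 01/15/2005 23:42 52.105 32033 new york\n",
);

# Step 1 (the LAST map in the chained version): pair each line with its
# sort key, giving a list of [key, line] array references.
my @pairs = map { [ (split ' ', $_)[3], $_ ] } @lines;

# Step 2: sort the pairs by key (element 0), numerically, largest first.
my @sorted_pairs = sort { $b->[0] <=> $a->[0] } @pairs;

# Step 3 (the FIRST map in the chained version): drop the keys and keep
# only the original lines (element 1).
my @sorted = map { $_->[1] } @sorted_pairs;

print @sorted;
```

The chained form reads bottom to top because each function takes the list produced by the one written below it. The point of pairing up front is that split runs once per line instead of once per comparison.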
 
Hello,
I am having a problem similar to ljsmith91's. I'm using Perl to write to a data file and I want to be able to sort the lines based upon a sort parameter the user defines. The field names are first name, last name, email, school, and so on. For now, a hash is fine for our purposes because we don't have any duplicates. However, as more people use the service, it becomes more likely that we're going to have more than one person with the same first name or last name, etc. Could anybody suggest an alternative to the hash, or describe a way in which we could adapt the hash to ignore duplicates?
Thanks
 
There are already two suggestions to your question posted:

one by mikevh

Code:
#!perl
use strict;
use warnings;

my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas

and one by myself:

Code:
#!perl

use warnings;
use strict;

#open FH, "<data.txt";
my @data = <DATA>;
#close FH;

my @index = ();
for (@data) {
   push @index, (split(/\s+/))[3];
}
my @sorted = @data[ sort { $index[$b] <=> $index[$a] } 0 .. $#index ];
print  @sorted;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server3 01/15/2005 23:42 52.105 32033 new york

How you specifically apply either of those sort routines to your data depends on the structure of your file, namely how the fields are delimited.
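
As one possible starting point, here is a sketch of the map/sort/map approach adapted to a field chosen by name. Everything about the file layout is an assumption, since the real format was not posted: the "|" delimiter, the field order in @FIELDS, the sample records, and the sub name sort_by_field are all hypothetical.

```perl
#!perl
use strict;
use warnings;

# Assumed layout (hypothetical): one record per line, fields separated
# by "|", in this order. Adjust both to match the real file.
my @FIELDS = qw(first last email school);
my %COLUMN = map { $FIELDS[$_] => $_ } 0 .. $#FIELDS;

# Return the records sorted A to Z (ignoring case) by the named field.
# Records with equal keys are all kept, since no record is used as a hash key.
sub sort_by_field {
    my ($field, @records) = @_;
    die "Unknown field '$field'\n" unless exists $COLUMN{$field};
    my $col = $COLUMN{$field};

    # Pair each record with its lowercased key, taken from a copy of the
    # line without the trailing newline.
    my @keyed = map {
        (my $text = $_) =~ s/\r?\n\z//;
        [ lc( (split /\|/, $text)[$col] ), $_ ];
    } @records;

    # cmp compares strings; <=> would be the choice for a numeric field.
    return map { $_->[1] } sort { $a->[0] cmp $b->[0] } @keyed;
}

my @records = map { "$_\n" } (
    'Zoe|Adams|zoe@example.com|North High',
    'adam|Young|adam@example.com|East High',
    'Zoe|Baker|zoeb@example.com|West High',
);
print sort_by_field('last', @records);
```

Two of the sample records share the first name Zoe, and both survive a sort on 'first', which is the case the plain hash could not handle.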


Start a new thread if you decide to continue with your query.
 