Sorting Data by Field 5

ljsmith91 · Feb 23, 2005

I have done a bunch of reading on sorting data and I am still very confused about how to go about it. Maybe through an example of a real problem, I can learn. I have the following file and need to sort the records based on a field. Here is sample data:

server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12 4.003 250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563 6752 boston
server5 01/21/2005 12:33 17.511 5782 dallas

I would like to sort and print these records so that the 4th field is in descending order and the data returns like this:

server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511 5782 dallas
server4 01/20/2005 08:31 14.563 6752 boston
server2 01/19/2005 22:12 4.003 250 los angeles

Can you help as I really do not understand the sort function? Quite a departure from Unix's sort.

LJS

PaulTEG · Feb 23, 2005

Code:

open(FH, '<', 'data.txt') or die "Cannot open data.txt: $!";
my %hash;
while (<FH>) {
   my $field4=(split /\s+/,$_)[3]; # get your hash key (the 4th field)
   $hash{$field4}=$_;              # set the value for that key = whole line
}
close FH;
foreach $key (sort (keys(%hash))) {
   print "$key \t\t=>\t$hash{$key}\n";
}

using a hash is one way

--Paul

ljsmith91 · Feb 23, 2005

PaulTEG,

I see you never use the sort function. I guess that is a good thing ?!? I will try it the hash way. Thanks for your example and help.

-LJS

KevinADC · Feb 23, 2005

ljsmith91 wrote: "I see you never use the sort function. I guess that is a good thing ?!? I will try it the hash way."

But he did use sort:

foreach $key (sort (keys(%hash))) {

However, I think that sort is not going to work for this situation, since you are sorting numbers: the default sort compares values as strings and will not order numbers correctly. Just change the line above to:

foreach $key (sort {$b <=> $a} keys %hash ) {

and the file will be sorted in the order you want.

KevinADC · Feb 23, 2005

it could be done like this:

Code:

#!perl

use warnings;
use strict;

#open FH, "<data.txt";
my @data = <DATA>;
#close FH;

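# Build a parallel array holding the 4th field of each record.
my @index = ();
for (@data) {
   push @index, (split(/\s+/))[3];
}
# Sort the record positions by their 4th-field value (numeric, descending),
# then use an array slice to pull the records out in that order.
my @sorted = @data[ sort { $index[$b] <=> $index[$a] } 0 .. $#index ];
print  @sorted;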

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server3 01/15/2005 23:42 52.105 32033 new york

prints:

Code:

server3 01/15/2005 23:42 52.105 32033 new york
server3 01/15/2005 23:42 52.105 32033 new york
server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511  5782 dallas
server5 01/21/2005 12:33 17.511  5782 dallas
server4 01/20/2005 08:31 14.563  6752 boston
server4 01/20/2005 08:31 14.563  6752 boston
server2 01/19/2005 22:12  4.003   250 los angeles
server2 01/19/2005 22:12  4.003   250 los angeles

ljsmith91 · Feb 23, 2005

Thanks KevinADC. It seems that sort under Perl is a bit more painful than I ever expected it to be. I am not even sorting on multiple fields and I am feeling the pain of understanding it all. I miss the ease of Unix shell sorting.

I will take advantage of your latest suggestions in code.

Thanks once again. -LJS

KevinADC · Feb 23, 2005

You're welcome. The problem is not so much the sort function as it is the data format. Text files of this nature need to be broken into discrete parts to sort them. I suspect the same must be true for shell scripting.
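
As an illustration of breaking a record into discrete parts, here is a minimal sketch that splits one of the sample lines into named fields. The variable names are made up for the example, and the limit of 6 passed to split is an assumption that the city is always the last field.

```perl
use strict;
use warnings;

my $line = "server2 01/19/2005 22:12 4.003 250 los angeles\n";
chomp $line;

# split ' ' splits on runs of whitespace and ignores leading whitespace.
# The limit of 6 keeps everything after the 5th field together, so a
# two-word city stays in one piece.
my ($server, $date, $time, $value, $count, $city) = split ' ', $line, 6;

print "sort key: $value\n";   # the 4th field
print "city:     $city\n";
```

Once the sort key is available as a separate value like this, any of the sorting approaches in this thread can use it.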

mikevh · Feb 23, 2005

ljsmith91, if you're using Perl on *nix you can still use *nix shell sort. Example:

Code:

#!perl
use strict;
use warnings;

my $infile = "ljs3.txt";
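# -k4,4nr sorts on the 4th field only, numerically, in reverse order.
# (The older "sort +3 -rn" form is obsolete and may be rejected by current sort implementations.)
my @arr = qx(sort -k4,4nr $infile);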
print for @arr;

Output:

Code:

server3 01/15/2005 23:42 52.105 32033 new york
server1 01/20/2005 10:04 34.321 42133 denver
server5 01/21/2005 12:33 17.511  5782 dallas
server4 01/20/2005 08:31 14.563  6752 boston
server2 01/19/2005 22:12  4.003   250 los angeles

However, I would recommend that you not depend on this and learn to sort in Perl, as Perl's sorting is cross-platform and won't keep you tied to *nix. It will seem a lot less painful with a little more practice.

Here's another way the sort could be done in Perl:

Code:

#!perl
use strict;
use warnings;

my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas

Output same as before.

HTH

KevinADC · Feb 23, 2005

Ahh! The infamous Schwartzian Transform!

stevexff · Feb 23, 2005

Using the hash method is OK as long as you can guarantee that none of your records have duplicate values in the sort field. If they do, any subsequent duplicate records overwrite the same hash entry. This makes a hash a convenient way to de-dupe some unsorted input records, but it doesn't really help you in this instance.
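
A minimal sketch of the overwrite problem described above, together with one possible workaround (a hash whose values are lists of lines). The names %by_key and %lines_for are only for illustration; this is not code from any of the earlier posts.

```perl
use strict;
use warnings;

my @records = (
    "server1 01/20/2005 10:04 34.321 42133 denver\n",
    "server3 01/15/2005 23:42 52.105 32033 new york\n",
    "server1 01/20/2005 10:04 34.321 42133 denver\n",   # same 4th field as the first record
);

# One line per key: a later record with the same key replaces the earlier one.
my %by_key;
$by_key{ (split)[3] } = $_ for @records;

# One list of lines per key (a hash of arrays): nothing is lost.
my %lines_for;
push @{ $lines_for{ (split)[3] } }, $_ for @records;

# Sort the keys numerically (descending) and flatten the lists back out.
my @sorted = map  { @{ $lines_for{$_} } }
             sort { $b <=> $a } keys %lines_for;

print scalar(keys %by_key), " records survive in the plain hash\n";
print scalar(@sorted), " records survive in the hash of arrays\n";
print @sorted;
```

The array-based approaches elsewhere in this thread avoid the issue entirely, since they never use the sort field as a hash key.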

mlibeson · Feb 24, 2005

ljsmith91,

The thing to notice about the code from KevinADC and mikevh is that they both used "<=>" in the sort. "sort" by itself compares strings and does not sort numbers correctly: it compares character by character, from the first character on the left to the last on the right. The "<=>" operator is a comparison that sorts numbers correctly, based on their value rather than their character sequence.
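
To make the difference concrete, here is a small sketch using the five values from the 4th field of the sample data. The array names are illustrative only.

```perl
use strict;
use warnings;

my @values = (34.321, 4.003, 52.105, 14.563, 17.511);

my @as_strings = sort @values;                 # default: string comparison (cmp)
my @ascending  = sort { $a <=> $b } @values;   # numeric, smallest first
my @descending = sort { $b <=> $a } @values;   # numeric, largest first

print "string:     @as_strings\n";   # 14.563 17.511 34.321 4.003 52.105
print "ascending:  @ascending\n";    # 4.003 14.563 17.511 34.321 52.105
print "descending: @descending\n";   # 52.105 34.321 17.511 14.563 4.003
```

In the string sort, 4.003 lands after 34.321 because the character "3" sorts before "4".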

Michael Libeson

ljsmith91 · Feb 24, 2005

mikevh,

Thanks for the sample sort code. I have a few questions. In your 2nd sample:

Code:

 my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

1) @arr contains my pre-sorted data prior to this map?
2) Because this is an array instead of a hash, duplicate field values will not get lost, right?
3) What does <DATA> mean at the end?

I am just a rookie and I am seeking to fully understand. Thanks so much.

To Edgar623,

Code:

a = []
while ARGF.gets do a << $_ end
print a.sort_by{|x| x.split(/\s/)[3]}.reverse.join

Is this PERL code you are using ?

Thanks all.

-LJS

ljsmith91 · Feb 24, 2005

mikevh,

Oh...<DATA> is my pre-sorted data and this code is all 1 command....light dawns on marblehead. I get it. Now I just need to figure out exactly what it's doing although I get the concept of it. Thanks again. -LJS

ishnid · Feb 24, 2005

<DATA> is a special filehandle in Perl. Its contents are whatever follows the __DATA__ line at the end of your script:

Code:

#!/usr/bin/perl -w
use strict;

print while (<DATA>); # print my data

__DATA__
Everything here
will be printed
above, as it's contained
in <DATA>

If you're opening a file, you'll have a different filehandle (e.g. PaulTEG's first post uses <FH>) - replace <DATA> with the filehandle you've opened to your file.
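
A rough sketch of that substitution follows. So that it runs on its own, it opens an in-memory filehandle on a string instead of a real file; with real data you would open the file itself (data.txt is the name used in PaulTEG's post). The variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# For a real file you would write something like:
#   open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
# Here an in-memory filehandle stands in so the sketch is self-contained.
my $text = "server1 01/20/2005 10:04 34.321 42133 denver\n"
         . "server3 01/15/2005 23:42 52.105 32033 new york\n"
         . "server2 01/19/2005 22:12 4.003 250 los angeles\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines = <$fh>;    # read exactly as with <DATA> or <FH>, just a different handle
close $fh;

# Numeric, descending sort on the 4th field of each line.
my @sorted = sort { (split ' ', $b)[3] <=> (split ' ', $a)[3] } @lines;
print @sorted;
```

This version re-splits both lines on every comparison, which is fine for small files; the map/sort/map form posted earlier avoids the repeated work.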

mikevh · Feb 24, 2005

ljsmith91, as KevinADC points out, the trick I'm using in my sort is called a "Schwartzian Transform," after its inventor, Perl guru Randal Schwartz. KevinADC's code is another version of this trick. If you type perldoc -q sort at a command prompt, you'll find the code we posted here, almost verbatim. Paul's hash method is a variation on the same idea, though as noted its drawback is that it will not allow duplicate sort keys.
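
For anyone trying to follow what the transform does, here is the same map/sort/map chain unrolled into three separate statements. The chain is read from the bottom up, so the steps appear here in execution order. The intermediate array names (@pairs, @sorted_pairs) are only for illustration.

```perl
use strict;
use warnings;

my @lines = (
    "server1 01/20/2005 10:04 34.321 42133 denver\n",
    "server2 01/19/2005 22:12  4.003   250 los angeles\n",
    "server3 01/15/2005 23:42 52.105 32033 new york\n",
);

# Step 1 (the bottom map): pair each line with its sort key -> [ key, line ]
my @pairs = map { [ (split)[3], $_ ] } @lines;

# Step 2 (the sort): order the pairs by key, numerically, largest first
my @sorted_pairs = sort { $b->[0] <=> $a->[0] } @pairs;

# Step 3 (the top map): throw the keys away and keep the original lines
my @sorted = map { $_->[1] } @sorted_pairs;

print @sorted;
```

The point of pairing key and line up front is that split runs once per line rather than once per comparison.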

ysmith · Mar 9, 2005

Hello,
I am having a problem similar to ljsmith91's. I'm using Perl to write to a data file and I want to be able to sort the lines based upon a sort parameter the user defines. The field names are first name, last name, email, school, and so on. For now, a hash is fine for our purposes because we don't have any duplicates. However, as more people use the service, it becomes more likely that more than one person will have the same first name or last name, etc. Could anybody suggest an alternative to the hash, or describe a way in which we could adapt the hash to handle duplicates?
Thanks

KevinADC · Mar 9, 2005

There are already two suggestions to your question posted:

one by mikevh

Code:

#!perl
use strict;
use warnings;

my @arr = map {$_->[1]}
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]}
    <DATA>;
    
print for @arr;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas

and one by myself:

Code:

#!perl

use warnings;
use strict;

#open FH, "<data.txt";
my @data = <DATA>;
#close FH;

my @index = ();
for (@data) {
   push @index, (split(/\s+/))[3];
}
my @sorted = @data[ sort { $index[$b] <=> $index[$a] } 0 .. $#index ];
print  @sorted;

__DATA__
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server1 01/20/2005 10:04 34.321 42133 denver
server2 01/19/2005 22:12  4.003   250 los angeles
server3 01/15/2005 23:42 52.105 32033 new york
server4 01/20/2005 08:31 14.563  6752 boston
server5 01/21/2005 12:33 17.511  5782 dallas
server3 01/15/2005 23:42 52.105 32033 new york

How you specifically apply either of those sort routines to your data depends on the structure of your file, namely how the fields are delimited.
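
As a rough sketch only: the file layout was never posted, so the "|" delimiter, the field order, and the sample records below are all assumptions made up for illustration. It shows the map/sort/map pattern adapted to a user-chosen field, using cmp instead of <=> because names are strings.

```perl
use strict;
use warnings;

# Assumptions for illustration only: fields are separated by "|" and
# appear in this order. Adjust both to match the real file.
my @field_names = qw(first last email school);
my %position;
@position{@field_names} = 0 .. $#field_names;

# Hypothetical sample records, including a duplicate first name.
my @records = (
    "Ann|Smith|ann\@example.com|North High\n",
    "Bob|Jones|bob\@example.com|East High\n",
    "Ann|Adams|ann.a\@example.com|West High\n",
);

my $sort_by = 'first';    # in practice this would come from the user
die "Unknown field: $sort_by\n" unless exists $position{$sort_by};
my $i = $position{$sort_by};

# Pair each record with its lowercased key, sort the pairs as strings,
# then keep only the original records. Duplicate keys are all retained.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { chomp(my $line = $_); [ lc((split /\|/, $line)[$i]), $_ ] }
             @records;

print @sorted;
```

Both records with the first name Ann survive the sort, which is the property the plain hash approach cannot offer.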

Start a new thread if you decide to continue with your query.
