using Schwartzian Transform to sort a data file by fields 2

Status
Not open for further replies.

ysmith

Programmer
Feb 17, 2005
Hello,
I'm still new to Perl and I'm trying to understand how to effectively implement the Schwartzian Transform on a data file. Here's what I'm trying to do:
Code:
my $test_file = "../data/data_file";
open TEST, "<$test_file";
my @test_sort = <TEST>; 
close TEST; 
 
my @arr = map {$_->[1]} 
    sort {$b->[0] <=> $a->[0]}
    map {[ (split)[3], $_ ]} 
    @test_sort; 
 
print for @arr;
which, as discussed on a previous thread, is an implementation of the Schwartzian Transform. The way I understand it, the file is opened and read into an array, then the array is sorted by the fourth field which is stored then printed. I've been tinkering with the values a bit, but it's still not sorting the way I'd hoped. Am I doing something wrong?
Thanks in advance
 
"The way I understand it, the file is opened and read into an array"
In the code you posted, yes: you're reading the file into @test_sort. But you don't need this intermediate array unless you also need the original, unsorted version of the file. You could just put the filehandle read, <TEST>, where you have @test_sort.
"it's still not sorting the way I'd hoped. Am I doing something wrong?"
Couldn't say unless I saw the file and how you want it sorted. Post a representative sample of the file and show us how you want it sorted.
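
Here is a minimal sketch of the "no intermediate array" suggestion. It is only an illustration: $data and $fh are hypothetical names, and the whitespace-separated rows with a numeric fourth field are made up so that the original sort block works unchanged. An in-memory filehandle stands in for ../data/data_file so the snippet runs on its own.

```perl
#!perl
use strict;
use warnings;

# Hypothetical stand-in for the real file: made-up rows held in a string
# and opened as an in-memory filehandle.
my $data = "a b c 3\nd e f 10\ng h i 7\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# In list context <$fh> returns every remaining line, so it can feed the
# innermost map directly; no @test_sort array is needed.
my @arr = map  { $_->[1] }
          sort { $b->[0] <=> $a->[0] }   # numeric, descending
          map  { [ (split)[3], $_ ] }    # key = fourth whitespace-separated field
          <$fh>;
close $fh;

print for @arr;
```

With a real file you would open it as in the original post and put <TEST> in the same position.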

 
mikevh, here's a sample of the unsorted file:
John, Doe, jdoe@rl.af.mil, Bethel College, Public Affairs, 56RF, 1984, History/German
Jane, Simmons, , Saint Mary's College, Medical Corps, 45MC, 1999, Nursing/Business
Joe, Schmoe, jschmo@nd.edu, University of Notre Dame, Medical Corps, 34MC, 2004, Pre-Med

I'd like it sorted by the fourth field (school), but when I run the sort function, it just spits out the same data that it read in, unsorted.
Let me know if you need more info
 
You're pretty close, but you need to make a couple of changes for this to work properly:

The sort function needs to use cmp (string comparison) instead of <=> (numeric comparison).

And the split function needs to be told to split the lines on the commas: split(/,/) instead of the default, which I believe is whitespace in the absence of a pattern. The default might still have worked by coincidence, but it makes sense to split on the commas since it's a comma-delimited file.

Code:
my $test_file = "../data/data_file";
open TEST, "<$test_file";
my @test_sort = <TEST>;
close TEST;
 
my @arr = map {$_->[1]}
    sort {$b->[0] cmp $a->[0]}        # cmp instead of <=>
    map {[ (split(/,/))[3], $_ ]}     # split on commas
    @test_sort;

print "$_\n" for @arr;

Note that this sorts in descending order:

$b->[0] cmp $a->[0]

to sort in ascending order swap $a and $b:

$a->[0] cmp $b->[0]

And as Mike said, you can read straight from the filehandle, <TEST>, in place of @test_sort to build the sorted array @arr.
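
To see why the original version handed the lines back in their input order, it can help to look at the sort key and the comparison separately. This is only a sketch using the first sample line from the thread; the variable names are made up for illustration.

```perl
#!perl
use strict;
use warnings;

my $line = 'John, Doe, jdoe@rl.af.mil, Bethel College, Public Affairs, 56RF, 1984, History/German';

# Splitting on whitespace (what split does with no pattern) cuts
# "Bethel College" in two, so the fourth piece is only its first word.
my $by_space = (split ' ', $line)[3];
# Splitting on commas keeps the whole field, leading blank included.
my $by_comma = (split /,/, $line)[3];

# <=> converts both operands to numbers. Strings with no leading digits
# become 0, so every pair of school names compares as equal.
my ($x, $y) = ('Bethel', 'University');
my $numeric = do { no warnings 'numeric'; $x <=> $y };
my $string  = $x cmp $y;

print "whitespace split: [$by_space]\n";   # [Bethel]
print "comma split:      [$by_comma]\n";   # [ Bethel College]
print "<=> gives $numeric, cmp gives $string\n";   # 0 and -1
```

With use warnings switched on and the numeric category left enabled, Perl reports that the argument isn't numeric in the <=> comparison, which is a quick way to catch this kind of mistake.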
 
The problem's in the split() and the comparison operator.

split() with no arguments splits on whitespace. Your fields are delimited by commas followed by whitespace, so we need to provide an argument to split() (a regular expression) to indicate this.

<=> is for comparing numbers. cmp is for comparing strings. The fourth field (school) is a string.

Like so:
Code:
#!perl
use strict;
use warnings;

my @arr = map {$_->[1]}
    sort {$b->[0] cmp $a->[0]}          # cmp compares strings
    map {[ (split /,\s+/)[3], $_ ]}     # split on a comma plus whitespace
    <DATA>;

print for @arr;

__DATA__
John, Doe, jdoe@rl.af.mil, Bethel College, Public Affairs, 56RF, 1984, History/German
Jane, Simmons,  , Saint Mary's College, Medical Corps, 45MC, 1999, Nursing/Business
Joe, Schmoe, jschmo@nd.edu, University of Notre Dame, Medical Corps, 34MC, 2004, Pre-Med
Output:
Code:
Joe, Schmoe, jschmo@nd.edu, University of Notre Dame, Medical Corps, 34MC, 2004, Pre-Med
Jane, Simmons,  , Saint Mary's College, Medical Corps, 45MC, 1999, Nursing/Business
John, Doe, jdoe@rl.af.mil, Bethel College, Public Affairs, 56RF, 1984, History/German
Note that the data is now sorted in descending order by school. To sort in ascending order, you'd say $a->[0] cmp $b->[0] instead of $b->[0] cmp $a->[0]. (The data was already in ascending order by school.)

HTH
 
But the whitespace will not affect the sorting of the lines, so splitting on just commas should be fine, no?
 
True, it doesn't affect the sorting. I guess it just makes me uncomfortable to think of the fields starting with whitespace. And if we were going to store the fields in an array, say, I really don't think we'd want the blanks there. It's just neater. But as you say, getting rid of the blanks is not necessary in this case.
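
For anyone curious what the two patterns actually leave in the fields, here is a small sketch using the second sample line from the thread. The array names @raw and @trim are made up for illustration.

```perl
#!perl
use strict;
use warnings;

my $line = q{Jane, Simmons, , Saint Mary's College, Medical Corps, 45MC, 1999, Nursing/Business};

my @raw  = split /,/,    $line;   # every field after the first keeps its leading blank
my @trim = split /,\s+/, $line;   # the blanks after each comma are consumed

print "split /,/    field 4: [$raw[3]]\n";    # [ Saint Mary's College]
print "split /,\\s+/ field 4: [$trim[3]]\n";  # [Saint Mary's College]
print "empty e-mail field:   [$trim[2]]\n";   # []
```

Since every key produced by split /,/ starts with the same single blank, the relative order under cmp comes out the same either way for this data; the difference only matters if the fields are stored or printed on their own.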

 
Thanks KevinADC and mikevh! It works perfectly!
 