sorting in perl 2

gammaman1 · Aug 1, 2005

hi,

I have a file that lists files and filepath such as this:
C:\foldera\file1
C:\folderb\file2
...
C:\folderc\file10
C:\folderd\file11

I'm trying to figure out a way to sort these so that they will be sorted in ascending order. I tried using perl's sort:
@sorted_array = sort { $a <=> $b } @unsorted_array

but it sorts it so that file10 is grouped with file1:
C:\foldera\file1
C:\foldera\file10
C:\foldera\file11
...
C:\foldera\file2

how would i sort this without renaming the files?

thanks,
gammaman
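
A small sketch of why file10 lands next to file1, using made-up sample paths in the same shape as the question. A string comparison (cmp, which is also what sort uses when no block is given) works character by character, so "file10" comes before "file2". The numeric operator <=> does not help on its own here, because a string that starts with "C:" has no leading number and is treated as 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, modelled on the paths in the question.
my @paths = ('C:\foldera\file2', 'C:\foldera\file10', 'C:\foldera\file1');

# cmp compares character by character, so "file10" sorts before "file2"
# because the character "1" is less than the character "2".
my @by_string = sort { $a cmp $b } @paths;

print "$_\n" for @by_string;
```

This prints file1, file10, file2 in that order, which is the grouping described above.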

TrojanWarBlade · Aug 2, 2005

Code:

#!/usr/bin/perl -w
use strict;
my @files = qw( C:\foldera\file1
		C:\foldera\file10
		C:\foldera\file6
		C:\foldera\file4
		C:\foldera\file5
		C:\foldera\file2
		C:\foldera\file7
		C:\foldera\file8
		C:\foldera\file9
		C:\foldera\file3
		C:\foldera\file11 );

my @sorted = sort { my ($x)=($a =~ /(\d+)$/);
                    my ($y)=($b =~ /(\d+)$/);
		    return $x <=> $y } @files;

print "@sorted\n";

Try that.

Trojan.

gammaman1 · Aug 4, 2005

thanks for the reply. I tried the code, and it worked if I had the files in the same directory. Is it possible to sort it with different files and different directories? For example, this is the outcome of the above code:
C:\foldera\file1
C:\foldera\file10
C:\foldera\file2
C:\folderbcd\file3
C:\foldera\file4

but I would like to sort it according to directory first, then filename:
C:\foldera\file1
C:\foldera\file2
C:\foldera\file4
C:\foldera\file10
C:\folderbcd\file3

is this possible?

thanks,
gammaman

TrojanWarBlade · Aug 4, 2005

Try this then:

Code:

#!/usr/bin/perl -w
use strict;
my @files = qw( C:\foldera\file1
		C:\foldera\file10
		C:\foldera\file6
		C:\foldera\file4
		C:\foldera\file5
		C:\folderb\file2
		C:\foldera\file7
		C:\foldera\file8
		C:\folderb\file9
		C:\folderb\file3
		C:\foldera\file11 );

my @sorted = sort { my ($s,$x)=($a =~ /\\([^\\]+)\\\D+(\d+)$/);
                    my ($t,$y)=($b =~ /\\([^\\]+)\\\D+(\d+)$/);
		    if($s eq $t) { return $x <=> $y; }
		    else { return $s cmp $t; } } @files;

print "@sorted\n";

Trojan.

Xaqte · Aug 4, 2005

Sorry to butt in! Trojan, I was in the midst of reading Recipe 4.15 of the cookbook when I found this thread. Is there any way you could break this down and explain it a little bit?

Thanks,

X

TrojanWarBlade · Aug 4, 2005

LOL
ok, matey, sorry. I should have been more careful not to write gibberish! ;-)
I hope the array allocation is obvious.
The "my @sorted" line is the one that does all the work, obviously, so the explanation will be purely about that.
The "@files" on the end is the array that we're sorting and the "sort" on the front is the sort function so now we're left with the meat between the first opening curly brace and the last closing curly brace.
Since it's a block, we can scope our variables with "my" to avoid them treading on the toes of anything else.
There are two regexes to set up $s, $x, $t and $y. The sort function provides two values to compare (to see if they need swapping), $a and $b. They are the focus of the regex processing.
Each regex is identical; one works on $a and the other on $b.
The "$a =~" binds the regex to "$a" (in the first case and, as you can see, the same applies to "$b" in the second).
The regex is overcomplicated by the fact that this is for windoze and windoze uses a backslash as the path separator. All backslashes in regexes need to be escaped with another backslash and that's why there are always two at a time.
So now then, we are matching a backslash "\\" followed by one or more non-backslash characters "[^\\]+" which we capture using parentheses "()" so that it will be assigned to the first scoped variable (in this case $s).
Next we match another backslash "\\" along with one or more non digit characters. When this is matched, it's all just thrown away because we don't need it. Finally we match one or more digits and capture that (to assign to $x), right up to the end (forced by "$" which matches the end of string). The "$" ensures that we match this sequence where we expect it (at the end of the string) and not earlier in a very long path.
Phew!
By this time, we have the filename number in $x and the directory name (folder name for windoze guys) in $s.
Once this is repeated for the other path ($b) we can compare file numbers and directory names.
The last step then is relatively simple. If we are in the same directory ($s eq $t) then we can simply return whether to sort based on file number only. We do this with "return $x <=> $y" which does a numeric test and flags which is the greater.
If the directories are different, we compare those with a text comparison (cmp) and again, return a flag for which is greater.
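
To make the walk-through above concrete, here is a minimal sketch that runs the same regex against a single sample path and prints the two captures. The variable names $dir and $num are only for illustration; they play the roles of $s and $x. Note that if a path does not end in digits, the match fails and both variables stay undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = 'C:\foldera\file10';   # sample path, same shape as the thread's data

# \\         a literal backslash
# ([^\\]+)   capture 1: the directory name (one or more non-backslash chars)
# \\         the backslash in front of the filename
# \D+        the non-digit part of the filename, matched but not captured
# (\d+)$     capture 2: the trailing digits, anchored to the end of the string
my ($dir, $num) = $path =~ /\\([^\\]+)\\\D+(\d+)$/;

print "dir=$dir num=$num\n";   # prints: dir=foldera num=10
```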

So ends the lesson for how my gibberish works! ;-)

I hope that explains it a little. It certainly killed my fingers typing all that explanation!

Trojan.

justice41 · Aug 4, 2005

This type of sorting is well suited to the Schwartzian transform. That way, the regex is run only once for each item in the array rather than once for each comparison (N vs log(N)).

Code:

my @sorted = map { $_->[0] }
            sort { $a->[1] cmp $b->[1]
                or $a->[2] <=> $b->[2] }
            map { [$_, /\\([^\\]+)\\\D+(\d+)$/] }
            @files;

If the file list is long, this can boost the efficiency.
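
A rough way to see the difference is to count how often the regex runs in each approach. The sketch below is only an illustration: the file list is made up (3 folders with 20 files each, shuffled), and the counters $plain_matches and $st_matches are hypothetical names added for the count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical data: 3 folders x 20 numbered files, in random order.
my @files;
for my $dir (qw(a b c)) {
    push @files, "C:\\folder$dir\\file$_" for 1 .. 20;
}
@files = shuffle @files;

my $re = qr/\\([^\\]+)\\\D+(\d+)$/;

# Plain sort: the regex runs twice for every comparison.
my $plain_matches = 0;
my @plain = sort {
    my ($s, $x) = $a =~ $re; $plain_matches++;
    my ($t, $y) = $b =~ $re; $plain_matches++;
    $s cmp $t or $x <=> $y;
} @files;

# Schwartzian transform: the regex runs once for every element.
my $st_matches = 0;
my @st = map  { $_->[0] }
         sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
         map  { $st_matches++; [ $_, /$re/ ] }
         @files;

print "plain sort: $plain_matches regex matches\n";
print "transform:  $st_matches regex matches\n";
print "same order: ", ("@plain" eq "@st" ? "yes" : "no"), "\n";
```

The transform always reports 60 matches here, one per path. The plain sort's count depends on how many comparisons the sort happens to make for that particular shuffle, but it cannot be lower than twice the number of comparisons needed to order 60 items.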

jaa

Xaqte · Aug 4, 2005

Code:

$Trojan{$gibberish} = "Genius"

Thanks for the lesson, Trojan! A star for you!

X

TrojanWarBlade · Aug 4, 2005

Nice trick Justice41,
I didn't stop to think about that.

Trojan.

TrojanWarBlade · Aug 4, 2005

I'm beginning to think that I'm getting too old for this game!
A few years ago I doubt I would have missed that, especially the short circuit operators, one of my favourites!
:-(

Trojan.

justice41 · Aug 4, 2005

I think the idiosyncratic tips-n-tricks of a programming language are the first to go. Especially if you switch between languages often.

BTW, the algorithmic efficiency of the sort in my previous post should have been Nlog(N) rather than N. (Oh, if only it were log(N) .. Sorting faster than you can count!)

jaa

TrojanWarBlade · Aug 4, 2005

How's about this for an old timer then?

Code:

my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, sprintf("%s%04d", /^(.*\D)(\d+)$/) ] } @files;

;-)

Trojan.
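
For anyone wondering what the sprintf key looks like, this sketch prints it for a few made-up paths. It also shows one assumption baked into "%04d": the width is a minimum, so numbers with more than four digits are not padded and the plain string comparison goes wrong again for them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample paths; the last two are there to show the four-digit limit.
my @paths = ('C:\foldera\file2', 'C:\foldera\file10',
             'C:\foldera\file9999', 'C:\foldera\file12345');

my %key_for;
for my $path (@paths) {
    # (.*\D) grabs everything up to the last non-digit,
    # (\d+) grabs the trailing number, which is zero-padded to 4 digits.
    $key_for{$path} = sprintf("%s%04d", $path =~ /^(.*\D)(\d+)$/);
    print "$path => $key_for{$path}\n";
}

# With five digits the key is left as it is, so as a string
# "file12345" still sorts before "file9999".
print "12345 sorts before 9999\n"
    if $key_for{'C:\foldera\file12345'} lt $key_for{'C:\foldera\file9999'};
```

If the numbers can grow beyond 9999, a wider format such as "%010d" would be one way around it, at the cost of longer keys.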

TrojanWarBlade · Aug 4, 2005

Maybe we should set up a school of excellence where we needlessly tune everything to the nth degree!

;-)

Trojan.

TrojanWarBlade · Aug 4, 2005

I'm not sure if this will be any faster.
The regex is a little simpler and there is only one comparison instead of two but the sprintf could kill it.
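
One way to settle it is to time both versions with the core Benchmark module. The following is a sketch under assumptions, not a measurement: the data set (5 folders of 40 files, shuffled), the iteration count and the sub names two_key and padded_key are all made up for the example, and the result will vary with the Perl version and the machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(shuffle);

# Hypothetical test data: 5 folders x 40 numbered files, shuffled.
my @files;
for my $dir (qw(a b c d e)) {
    push @files, "C:\\folder$dir\\file$_" for 1 .. 40;
}
@files = shuffle @files;

# Two-key transform: folder name first, then file number.
sub two_key {
    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
                 map  { [ $_, /\\([^\\]+)\\\D+(\d+)$/ ] } @files;
    return \@sorted;   # assigning to an array keeps the sort in list context
}

# Single string key, zero-padded with sprintf.
sub padded_key {
    my @sorted = map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { [ $_, sprintf("%s%04d", /^(.*\D)(\d+)$/) ] } @files;
    return \@sorted;
}

# Check that both versions agree before timing them.
my $same = "@{ two_key() }" eq "@{ padded_key() }";
print "same order: ", ($same ? "yes" : "no"), "\n";

# Run each version 2000 times and print a comparison table.
cmpthese(2000, { two_key => \&two_key, padded_key => \&padded_key });
```

The two versions only agree because every folder name in this data has the same length and no number exceeds four digits, so treat the comparison as valid for data of that shape only.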

Trojan.

justice41 · Aug 4, 2005

> Maybe we should setup a school of excellence where we needlessly tune everything to the nth degree!

What I find interesting is that people posting here rarely complain that their code runs too slowly or seems to take forever, only that it doesn't work. When I was weaned on Perl, almost every counterpost of code to comp.lang.perl.misc was benchmarked ( use Benchmark; ) even if it didn't need to be.

I think computers are fast enough currently that algorithmic efficiency isn't a concern anymore. Ah, well.

jaa

TrojanWarBlade · Aug 5, 2005

Justice41,
I agree with you. I think it's something that should be considered at all times. It might not be relevant to any one instance, but if you don't get into the habit then one day it'll bite you and hurt!

Trojan.

bluegroper · Aug 5, 2005

Fantastic.
I gotta clip and keep this thread in my "Tutorials" file.

-BG

fishiface · Aug 6, 2005

footnote; since about DOS 2, you can use forward-slashes instead of back-slashes for paths. Judicious use can keep regexes more manageable and code more portable.
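
As a minimal sketch of that idea, the path can be converted to forward slashes first, after which the regex needs no doubled backslashes. The sample path and the variable name $unix_style are assumptions for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = 'C:\foldera\file10';   # sample path in the thread's format

# Copy the path, then turn every backslash in the copy into a forward slash.
(my $unix_style = $path) =~ tr{\\}{/};

# Same match as in the earlier posts, now with "/" as the separator.
my ($dir, $num) = $unix_style =~ m{/([^/]+)/\D+(\d+)$};

print "$unix_style\n";            # prints: C:/foldera/file10
print "dir=$dir num=$num\n";      # prints: dir=foldera num=10
```

If the output has to keep the original backslashes, the converted copy can be used for the sort key only while the original string is kept as the value that gets returned.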

f

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maurice Wilkes
