Tek-Tips is the largest IT community on the Internet today!

sorting in perl 2

Status
Not open for further replies.

gammaman1

Programmer
Jun 30, 2005
23
US
hi,

I have a file that lists file paths such as this:
C:\foldera\file1
C:\folderb\file2
...
C:\folderc\file10
C:\folderd\file11

I'm trying to figure out a way to sort these so that they will be sorted in ascending order. I tried using Perl's sort:
@sorted_array = sort { $a <=> $b } @unsorted_array;

but it sorts it so that file10 is grouped with file1:
C:\foldera\file1
C:\foldera\file10
C:\foldera\file11
...
C:\foldera\file2

How would I sort this without renaming the files?

thanks,
gammaman
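
A quick illustration of why the numeric operator does not help here. This is only a sketch, and the two sample paths are made up to match the listing above. `<=>` converts each whole path to a number, and a string that starts with `C:` converts to 0, so every pair compares as equal. A plain string comparison with `cmp` goes character by character, which is what puts file10 ahead of file2.

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demo

# Hypothetical sample paths in the style of the listing above.
my $p10 = 'C:\foldera\file10';
my $p2  = 'C:\foldera\file2';

# Numeric comparison: both strings numify to 0, so they look equal.
my $numeric = $p10 <=> $p2;

# String comparison: '1' sorts before '2', so file10 comes first.
my $string = $p10 cmp $p2;

print "numeric: $numeric, string: $string\n";   # numeric: 0, string: -1
```

So neither operator alone gives the wanted order; the number has to be pulled out of the path and compared separately.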



 
Code:
#!/usr/bin/perl -w
use strict;
my @files = qw( C:\foldera\file1
		C:\foldera\file10
		C:\foldera\file6
		C:\foldera\file4
		C:\foldera\file5
		C:\foldera\file2
		C:\foldera\file7
		C:\foldera\file8
		C:\foldera\file9
		C:\foldera\file3
		C:\foldera\file11 );

# Compare on the trailing number of each path.
my @sorted = sort { my ($x) = ($a =~ /(\d+)$/);
                    my ($y) = ($b =~ /(\d+)$/);
                    return $x <=> $y } @files;

print "@sorted\n";

Try that.


Trojan.
 
thanks for the reply. I tried the code, and it worked if I had the files in the same directory. Is it possible to sort it with different files and different directories? For example, this is the outcome of the above code:
C:\foldera\file1
C:\foldera\file10
C:\foldera\file2
C:\folderbcd\file3
C:\foldera\file4

but I would like to sort it according to directory first, then filename:
C:\foldera\file1
C:\foldera\file2
C:\foldera\file4
C:\foldera\file10
C:\folderbcd\file3

is this possible?

thanks,
gammaman
 
Try this then:
Code:
#!/usr/bin/perl -w
use strict;
my @files = qw( C:\foldera\file1
		C:\foldera\file10
		C:\foldera\file6
		C:\foldera\file4
		C:\foldera\file5
		C:\folderb\file2
		C:\foldera\file7
		C:\foldera\file8
		C:\folderb\file9
		C:\folderb\file3
		C:\foldera\file11 );

# Compare on directory name first, then on the trailing file number.
my @sorted = sort { my ($s,$x) = ($a =~ /\\([^\\]+)\\\D+(\d+)$/);
                    my ($t,$y) = ($b =~ /\\([^\\]+)\\\D+(\d+)$/);
                    if ($s eq $t) { return $x <=> $y; }
                    else { return $s cmp $t; } } @files;

print "@sorted\n";


Trojan.
 
Sorry to butt in! Trojan, I'm in the midst of reading Recipe 4.15 of the cookbook when I found this thread. Is there any way you could break this down and explain this a little bit?

Thanks,

X
 
LOL
ok, matey, sorry. I should have been more careful not to write gibberish! ;-)
I hope the array allocation is obvious.
The "my @sorted" line is the one that does all the work, obviously, so the explanation will be purely about that.
The "@files" on the end is the array that we're sorting and the "sort" on the front is the sort function so now we're left with the meat between the first opening curly brace and the last closing curly brace.
Since it's a block, we can scope our variables with "my" to avoid them treading on the toes of anything else.
There are two regexes to set up $s, $x, $t and $y. The sort function provides two values to compare (to see if they need swapping), $a and $b. They are the focus of the regex processing.
The two regexes are identical; one works on $a and the other on $b.
The "$a =~" binds the regex to "$a" in the first case and, as you can see, the same applies to "$b" in the second.
The regex is overcomplicated by the fact that this is for windoze and windoze uses a backslash as the path separator. All literal backslashes in regexes need to be escaped with another backslash and that's why they come two at a time (the third backslash in "\\\D+" belongs to the "\D").
So now then, we are matching a backslash "\\" followed by one or more non-backslash characters "[^\\]+" which we capture using parentheses "()" so that it will be assigned to the first scoped variable (in this case $s).
Next we match another backslash "\\" along with one or more non-digit characters "\D+". When this is matched, it's all just thrown away because we don't need it. Finally we match one or more digits "\d+" and capture that (to assign to $x), right up to the end (forced by "$" which matches the end of string). The "$" ensures that we match this sequence where we expect it (at the end of the string) and not earlier in a very long path.
Phew!
By this time, we have the filename number in $x and the directory name (folder name for windoze guys) in $s.
Once this is repeated for the other path ($b) we can compare file numbers and directory names.
The last step then is relatively simple. If we are in the same directory ($s eq $t) then we can simply return whether to sort based on file number only. We do this with "return $x <=> $y" which does a numeric test and flags which is the greater.
If the directories are different, we compare those with a text comparison (cmp) and again, return a flag for which is greater.

So ends the lesson for how my gibberish works! ;-)

I hope that explains it a little. It certainly killed my fingers typing all that explanation!


Trojan.
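
For anyone who prefers to read the pattern in pieces, here is the same regex written with the /x modifier so each part can carry a comment. It is a minimal sketch: the names `$path_re`, `$path`, `$dir` and `$num` are made up for the illustration and are not in the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same pattern as in the sort block, spread out with /x so that
# each piece can carry a comment.
my $path_re = qr{
    \\          # a literal backslash (escaped)
    ( [^\\]+ )  # capture 1: the directory name, anything but backslashes
    \\          # the backslash in front of the file name
    \D+         # the non-digit part of the file name, not captured
    ( \d+ )     # capture 2: the trailing number
    $           # anchored at the end of the string
}x;

# Hypothetical sample path.
my $path = 'C:\folderb\file9';

# In list context the match returns the two captures.
my ($dir, $num) = ($path =~ $path_re);

print "dir=$dir num=$num\n";   # dir=folderb num=9
```

Inside the sort block, `$dir` and `$num` play the role of `$s` and `$x` (or `$t` and `$y` for the other path).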
 
This type of sorting is well suited for the Schwartzian transform. In this way, the regex is only run once for each item in the array rather than once for each comparison (N vs log(N))

Code:
my @sorted = map { $_->[0] }
            sort { $a->[1] cmp $b->[1]
                or $a->[2] <=> $b->[2] }
            map { [$_, /\\([^\\]+)\\\D+(\d+)$/] }
            @files;

If the file list is long, this can boost the efficiency.

jaa
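
The transform reads bottom to top, which can be hard to follow at first. Below is a sketch of the same three stages written as separate statements. The sample paths and the intermediate array names `@decorated` and `@ordered` are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data in the style of the thread.
my @files = ( 'C:\folderb\file3', 'C:\foldera\file10', 'C:\foldera\file2' );

# Step 1 (the bottom map): decorate each path as [ path, dir, number ].
my @decorated = map { [ $_, /\\([^\\]+)\\\D+(\d+)$/ ] } @files;

# Step 2 (the sort): compare the precomputed keys. "or" falls through to
# the numeric test only when cmp returns 0, i.e. the directories are equal.
my @ordered = sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] } @decorated;

# Step 3 (the top map): strip the keys again, keeping only the path.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
# C:\foldera\file2
# C:\foldera\file10
# C:\folderb\file3
```

The chained version above does exactly this without naming the intermediate lists.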
 
Code:
$Trojan{$gibberish} = "Genius"

Thanks for the lesson, Trojan! A star for you!

X
 
I'm beginning to think that I'm getting too old for this game!
A few years ago I doubt I would have missed that, especially the short circuit operators, one of my favourites!
:-(


Trojan.
 
I think the idiosyncratic tips-n-tricks of a programming language are the first to go. Especially if you switch between languages often.

BTW, the algorithmic efficiency of the sort in my previous post should have been Nlog(N) rather than N. (Oh, if only it were log(N) .. Sorting faster than you can count!)

jaa
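
One way to see the difference is to count how often the key extraction actually runs. This is a rough sketch under made-up assumptions: the 200 generated paths and the helper `keys_of` are hypothetical, and the exact count for the plain sort depends on the input order and on Perl's sort implementation, so no figures are quoted here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: 200 paths spread over four folders, in reverse order.
my @files = reverse
            map { sprintf('C:\folder%s\file%d', ('a'..'d')[$_ % 4], $_) } 1 .. 200;

my $calls = 0;

# Hypothetical helper: extract (dir, number) and count how often it runs.
sub keys_of {
    my ($path) = @_;
    $calls++;
    return $path =~ /\\([^\\]+)\\\D+(\d+)$/;
}

# Plain sort: the extraction runs twice for every comparison.
$calls = 0;
my @plain = sort {
    my ($s, $x) = keys_of($a);
    my ($t, $y) = keys_of($b);
    $s cmp $t or $x <=> $y;
} @files;
my $plain_calls = $calls;

# Schwartzian transform: the extraction runs once per element.
$calls = 0;
my @st = map  { $_->[0] }
         sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
         map  { [ $_, keys_of($_) ] } @files;
my $st_calls = $calls;

printf "plain sort: %d calls, transform: %d calls for %d paths\n",
    $plain_calls, $st_calls, scalar @files;
```

The transform always reports exactly one call per path. The sort still does on the order of N log(N) comparisons either way; the transform only makes each comparison cheaper.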
 
How's about this for an old timer then?

Code:
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, sprintf("%s%04d", /^(.*\D)(\d+)$/) ] } @files;

;-)


Trojan.
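
To show what that single string key looks like, here is a small sketch. The sample paths and the helper `padded_key` are made up for the illustration. It also shows a limit of the trick: "%04d" pads to four digits only, so it assumes no file number goes above 9999.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample paths.
my @files = ( 'C:\folderb\file3', 'C:\foldera\file10', 'C:\foldera\file2' );

# Hypothetical helper: build the padded string key for one path, i.e. the
# non-numeric prefix followed by the number zero-padded to four digits.
sub padded_key {
    my ($path) = @_;
    return sprintf "%s%04d", $path =~ /^(.*\D)(\d+)$/;
}

# Show each key next to its path, e.g. C:\foldera\file10 => C:\foldera\file0010
printf "%-20s => %s\n", $_, padded_key($_) for @files;

# With equal-width numbers, a plain string comparison of the keys orders
# by directory first and by number second.
my @sorted = sort { padded_key($a) cmp padded_key($b) } @files;
print "sorted: @sorted\n";

# Caveat: with five digits the padding no longer lines up, so file10000
# would sort before file9999 again.
my $too_wide = padded_key('C:\foldera\file10000') lt padded_key('C:\foldera\file9999');
print "width caveat applies\n" if $too_wide;
```

A wider format such as "%010d" pushes the limit further out, at the cost of longer keys.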
 
Maybe we should set up a school of excellence where we needlessly tune everything to the nth degree!

;-)


Trojan.
 
I'm not sure if this will be any faster.
The regex is a little simpler and there is only one comparison instead of two but the sprintf could kill it.


Trojan.
 

> Maybe we should set up a school of excellence where we needlessly tune everything to the nth degree!

What I find interesting is that people posting here rarely complain that their code runs too slowly or seems to take forever, only that it doesn't work. When I was weaned on Perl, almost every counterpost of code to comp.lang.perl.misc was benchmarked ( use Benchmark; ) even if it didn't need to be.

I think computers are fast enough currently that algorithmic efficiency isn't a concern anymore. Ah, well.

jaa
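
In that spirit, here is a sketch of how the three approaches from this thread could be compared with the core Benchmark module. The test data (500 generated paths) and the labels `plain`, `transform` and `padded` are made up for the illustration. No timings are quoted because they depend on the machine, the Perl version and the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 500 paths over four folders, in reverse order.
my @files = reverse
            map { sprintf('C:\folder%s\file%d', ('a'..'d')[$_ % 4], $_) } 1 .. 500;

my %approach = (
    # Regexes inside the comparison block.
    plain => sub {
        my @out = sort {
            my ($s, $x) = ($a =~ /\\([^\\]+)\\\D+(\d+)$/);
            my ($t, $y) = ($b =~ /\\([^\\]+)\\\D+(\d+)$/);
            $s cmp $t or $x <=> $y;
        } @files;
        return @out;
    },
    # Schwartzian transform with two keys.
    transform => sub {
        my @out = map  { $_->[0] }
                  sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
                  map  { [ $_, /\\([^\\]+)\\\D+(\d+)$/ ] } @files;
        return @out;
    },
    # Schwartzian transform with one zero-padded string key.
    padded => sub {
        my @out = map  { $_->[0] }
                  sort { $a->[1] cmp $b->[1] }
                  map  { [ $_, sprintf("%s%04d", /^(.*\D)(\d+)$/) ] } @files;
        return @out;
    },
);

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, \%approach);
```

Each sub assigns the result to an array so that the sort is not called in void context, where it would have nothing to do.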
 
Justice41,
I agree with you. I think it's something that should be considered at all times. It might not be relevant to any one instance but if you don't get into the habit then one day it'll bite you and hurt!

Trojan.
 
Fantastic.
I gotta clip and keep this thread in my "Tutorials" file.
:)
-BG
 
Footnote: since about DOS 2, you can use forward slashes instead of backslashes for paths. Judicious use can keep regexes more manageable and code more portable.

f

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maurice Wilkes
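
As a sketch of what the footnote suggests: if the paths are converted to forward slashes first, the pattern needs no escaped backslashes at all. The sample paths and the array name `@normalized` are made up for the illustration, and this assumes whatever consumes the sorted list accepts forward slashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample paths.
my @files = ( 'C:\folderb\file3', 'C:\foldera\file10', 'C:\foldera\file2' );

# Convert backslashes to forward slashes once, up front, on a copy.
my @normalized = map { (my $p = $_) =~ tr{\\}{/}; $p } @files;

# With "/" as the separator and m{...} as the delimiter, nothing in the
# pattern needs escaping apart from the \D and \d character classes.
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
             map  { [ $_, m{/([^/]+)/\D+(\d+)$} ] } @normalized;

print "$_\n" for @sorted;
# C:/foldera/file2
# C:/foldera/file10
# C:/folderb/file3
```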
 