Complex Sort 1

FeiLung · Mar 9, 2007

What I would like to do is iterate through the file, sorting on the ZIP field so that the records for each zip code are evenly distributed throughout the file.

For example, suppose my file contains the following quantities of zip codes:

QTY ZIP
3 98765
2 12345
2 78934

Then my file would be sorted somewhat like the following.

98765
12345
78934
98765
12345
78934
98765

Ultimately, I am trying to produce an even distribution of zip codes throughout the file.

I have the basic code written to open a file handle and read the data into an array, but after that, I am relying on the good-natured folks of this forum for some direction.

- FL

MikeLacey · Mar 9, 2007

Is it important that you evenly distribute the records or would simply randomising the list be enough for you?

If you can just randomise the list, you can use the List::Util module, which provides a shuffle function.
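
A minimal sketch of the randomising option, assuming the zip codes are already in an array (the sample values are the ones from the question). Note that shuffle only randomises the order; it makes no promise that equal zip codes end up evenly spaced.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Sample zip codes taken from the question above.
my @zips = qw(98765 98765 98765 12345 12345 78934 78934);

# shuffle returns the elements in random order; @zips itself is unchanged.
my @random = shuffle(@zips);

print "$_\n" for @random;
```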

Mike

Hardware is that part of a computer which, when you remove electrical power, doesn't go away.

Want great answers to your Tek-Tips questions? Have a look at faq219-2884

FeiLung · Mar 9, 2007

It is important to evenly distribute the records. I would love to have the ability to stack the rows specifically by zip code. Plus, it's a very interesting problem that I would love to work out in Perl.

- FL

MikeLacey · Mar 9, 2007

Ok...

Let's just talk through a possible approach then.

1. Count the items in the list.
2. Sort the zip code list by number of occurrences, highest at the top of the list.
3. Divide the number of items in the list by the number of occurrences of the most common zip code. This gives the number of list items between occurrences of that code: the gap.
4. Create an array to hold your distributed list by inserting the most common zip code into it with the correct distribution.

Something like this (making it up as I go along here):

$total = number of items in the list
$max = number of occurrences of the most common zip code
$zip = most common zip code
$gap = $total / $max, rounded up to a whole number
$i = 0;
for (1 .. $max) {
    $dist_list[$i] = $zip;
    $i += $gap;
}

5. You should now have a list that looks like this:

98765 <- 0, used slot
. <- 1, empty slot
. <- 2, empty slot
98765 <- 3, used slot
. <- 4, empty slot
. <- 5, empty slot
98765 <- 6, used slot

6. From here you go through a similar process for each zip code in the list.
7. Problems will come when simple division gives you an array index that is already taken by another zip code...
8. Hmmmm.... that's as far as I've got for the moment.
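
The steps above can be sketched roughly as follows. This is only an illustration, not a finished solution: the helper name spread_by_gap is hypothetical, the gap is kept fractional and truncated to a slot index (one of several possible choices), and the collision rule (walk forward to the next empty slot, wrapping around) is an assumption, since the approach above leaves that question open.

```perl
use strict;
use warnings;

# Hypothetical helper: spread_by_gap is not from the thread.
sub spread_by_gap {
    my @zips  = @_;
    my $total = scalar @zips;

    # Count how often each zip code occurs.
    my %count;
    $count{$_}++ for @zips;

    # Most common zip code first; ties broken by zip code so runs are repeatable.
    my @by_freq = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    my @dist_list = (undef) x $total;    # one empty slot per record
    for my $zip (@by_freq) {
        my $gap = $total / $count{$zip};    # ideal (fractional) spacing
        for my $n (0 .. $count{$zip} - 1) {
            my $i = int($n * $gap);         # ideal slot for this occurrence
            # Assumed collision rule: move to the next empty slot, wrapping around.
            # A free slot always exists because there is one slot per record.
            $i = ($i + 1) % $total while defined $dist_list[$i];
            $dist_list[$i] = $zip;
        }
    }
    return @dist_list;
}

my @zips = qw(98765 98765 98765 12345 12345 78934 78934);
print "$_\n" for spread_by_gap(@zips);
```

With the sample data the most common code is spread out, but both copies of the last-placed code end up next to each other at the end of the list, which shows why the collision rule matters for this approach.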

Mike

KevinADC · Mar 9, 2007

Using an array of arrays should work well for this type of task:

Code:

use strict;
use warnings;
my %count = ();
my @AoA = ();
while (<DATA>) {
   chomp;
   $count{$_}++;
   push @{$AoA[$count{$_}-1]},$_;
}
foreach my $array (@AoA) {
   print join("\n",@{$array}),"\n";
}
__DATA__
98765
98765
98765
12345
12345
78934
78934

Pragmas used (perl 5.8.8): strict, warnings.

output:

98765
12345
78934
98765
12345
78934
98765

------------------------------------------
- Kevin, perl coder unexceptional!

MikeLacey · Mar 10, 2007

Nice...

And I have no idea how that works. Would you talk that through, Kevin?

Mike

KevinADC · Mar 10, 2007

Hi Mike,

Nice to see you posting in the Perl forum.

Basically, the code uses the running count of each zip code as its array index. The first time a zip code is counted in the hash:

$count{$_}++

its value is 1 (one). We need to subtract 1 so that the array starts at index 0 (zero). The same is true for all subsequent counts.

We then use that value as the array index to push the zip code into the corresponding array in the array of arrays (or multi-dimensional array):

push @{$AoA[$count{$_}-1]},$_;
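
To make the rows visible, here is a small sketch of the same technique applied to a plain list rather than the DATA filehandle. The helper name round_robin is hypothetical and not part of the code above; the sample values are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): the same technique applied to
# a list of values instead of the DATA filehandle.
sub round_robin {
    my @rows;     # $rows[0] holds each value's 1st occurrence, $rows[1] the 2nd, ...
    my %count;    # running count per value
    for my $zip (@_) {
        $count{$zip}++;
        push @{ $rows[ $count{$zip} - 1 ] }, $zip;
    }
    return @rows;
}

my @rows = round_robin(qw(98765 98765 98765 12345 12345 78934 78934));

# Show which values landed in which row.
for my $i (0 .. $#rows) {
    print "row $i: @{ $rows[$i] }\n";
}
# row 0: 98765 12345 78934
# row 1: 98765 12345 78934
# row 2: 98765

# Flattening the rows in order gives the final distribution.
my @distributed = map { @$_ } @rows;
print "$_\n" for @distributed;
```

Each row contains a given zip code at most once, so printing the rows one after another keeps repeats of the same code at least a full row apart wherever other codes are still available.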

MikeLacey · Mar 12, 2007

ok - thx Kevin

Mike
