
How do I sort columns/fields of mixed data?


by KevinADC

Problem:

You have some data in a file (or an array, hash, or list) that needs sorting by several different fields, and the fields hold different types of data. Some fields are alphabetic and some are numeric (they could even be alphanumeric), so a single alpha (cmp) or numeric (<=>) sort will not work. You need to sort the data using all of the fields, or perhaps just one or more of them, to get the correct order.
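
To see why one comparison operator is not enough, here is a small sketch. The values are made up for illustration and are not part of the sample data used in this FAQ. Sorting numbers with cmp compares them as text, so "100" lands before "19":

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only to show the problem.
my @ages = (100, 19, 22, 3);

# String comparison works character by character, so "100" sorts before "19".
my @as_text = sort { $a cmp $b } @ages;

# Numeric comparison gives the order most people expect.
my @as_numbers = sort { $a <=> $b } @ages;

print "cmp: @as_text\n";     # cmp: 100 19 22 3
print "<=>: @as_numbers\n";  # <=>: 3 19 22 100
```

The reverse mistake, using <=> on names, is just as bad: non-numeric strings all count as 0, so the names are never really compared.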

Solution:

Use map-sort-map, known commonly as a Schwartzian Transform.

Code:
my @sorted = map {$_->[0]}
             sort { $a->[1] cmp $b->[1] ||
                    $a->[2] <=> $b->[2] ||
                    $a->[3] <=> $b->[3] }
             map {chomp; [$_,split(/,/)]} <DATA>;
print "$_\n" for @sorted;
__DATA__
John,34,100.00
John,33,100.00
Jane,33,99.99
Fred,22,44.2
Johnny,33,99
Andrew,19,38.29
Chris,28,200
William,60,2000
Tom,42,1000
Andy,20,38.29

Discussion:

Line 5 of the code snippet is where the work begins, because the map-sort-map chain is evaluated from the last line up to the first:

Code:
5.             map {chomp; [$_,split(/,/)]} <DATA>;

The map{} block reads the lines from <DATA> (in place of <DATA> you could use an array, the keys or values of a hash, or any list you want to sort). First the record separator is removed using chomp; then each line is stored as an anonymous array [$_,split(/,/)] consisting of four elements. Using the first line of the data as an example, the anonymous array would look like this:

Code:
['John,34,100.00','John','34','100.00']
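
As a rough sketch of this first step on its own, the snippet below builds the anonymous arrays from an ordinary array instead of <DATA>. The @lines and @decorated names are made up for the example. One assumption worth noting: inside map, $_ is an alias to the original element, so the sketch copies each line before calling chomp to leave the input array untouched.

```perl
use strict;
use warnings;

# Hypothetical in-memory input; any list of lines can stand in for <DATA>.
my @lines = ("John,34,100.00\n", "Jane,33,99.99\n");

# Copy each line (so chomp does not alter @lines), strip the newline,
# then store the whole line followed by its comma-separated fields.
my @decorated = map {
    my $line = $_;
    chomp $line;
    [ $line, split(/,/, $line) ]
} @lines;

# Show the four elements of each anonymous array.
for my $rec (@decorated) {
    print join(' | ', @$rec), "\n";
}
```

Each printed row shows the intact line first and then the three fields that the sort will compare.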

The first element is a copy of the intact line, which is what ends up in the sorted array later. Elements 1, 2, and 3 hold the individual fields and are used in the sort{} block to sort on each column of the data. That's what lines 2, 3, and 4 are doing:

Code:
2.             sort { $a->[1] cmp $b->[1] ||
3.                    $a->[2] <=> $b->[2] ||
4.                    $a->[3] <=> $b->[3] }

Because the data being sorted is in anonymous arrays, we have to use the arrow notation '->' and an index number, such as [1], to access each element, instead of just $a and $b like you normally see. The first comparison is alpha (the names) and the others are numeric. Both cmp and <=> return 0 when the two values are equal, so the short-circuit || operator lets the sort move on to the next field only when necessary. It essentially reads:

if [1] are equal, compare [2], if [2] are equal, compare [3]
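
The fall-through can be seen in isolation with a minimal sketch. The @recs array and the $by_name, $by_age, and $chained variables are hypothetical names used only here; the records follow the same [line, name, age, amount] layout described above.

```perl
use strict;
use warnings;

# Hypothetical records in the [line, name, age, amount] layout.
my @recs = (
    [ 'John,34,100.00', 'John', '34', '100.00' ],
    [ 'John,33,100.00', 'John', '33', '100.00' ],
    [ 'Jane,33,99.99',  'Jane', '33', '99.99'  ],
);

# cmp and <=> return 0 on a tie, and 0 is false, so || moves to the next test.
my $by_name = $recs[0][1] cmp $recs[1][1];   # 0: both names are "John"
my $by_age  = $recs[0][2] <=> $recs[1][2];   # 1: 34 is greater than 33
my $chained = $by_name || $by_age;           # 1: the tie on name falls through to age

# The same chain used as a sort block.
my @ordered = sort {
    $a->[1] cmp $b->[1] ||
    $a->[2] <=> $b->[2] ||
    $a->[3] <=> $b->[3]
} @recs;

print "$_->[0]\n" for @ordered;
```

Jane comes first on name alone; the two John records tie on name, so their ages decide the order.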

After all the sorting is done, line 1 of the snippet:

Code:
1. my @sorted = map {$_->[0]}

uses the other map{} block to return the first element of each anonymous array of the sorted data to the final array @sorted. Remember, the first element of each anonymous array was a copy of each line of the unsorted data. You could return any one field of the data, or as many as you want, to the @sorted array in the map{} block. The sorted data is:

Code:
Andrew,19,38.29
Andy,20,38.29
Chris,28,200
Fred,22,44.2
Jane,33,99.99
John,33,100.00
John,34,100.00
Johnny,33,99
Tom,42,1000
William,60,2000
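
As a variation on returning other fields, here is a sketch that keeps the same map-sort-map shape but returns only the name and amount, ordered by amount with the largest first. The @lines and @report names and the "name: amount" output format are assumptions made for this example; the four lines are taken from the sample data above.

```perl
use strict;
use warnings;

# A few lines from the sample data, held in an array for this sketch.
my @lines = ('John,34,100.00', 'Jane,33,99.99', 'Chris,28,200', 'Andy,20,38.29');

# The final map builds "name: amount" from elements 1 and 3;
# the sort compares the amount only, descending ($b before $a).
my @report = map  { $_->[1] . ': ' . $_->[3] }
             sort { $b->[3] <=> $a->[3] }
             map  { [ $_, split(/,/, $_) ] } @lines;

print "$_\n" for @report;
```

Swapping $a and $b in a comparison is the usual way to reverse the order for that one field without touching the others.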

If you are confused by some of the code and explanations, don't worry. It took me a while to fully grasp all that this short bit of code is doing. Play around with it to get the feel of sorting complex data.

Use the link below to contact me about corrections or additions you might have concerning this FAQ. Post questions about this FAQ in the perl forum. Rate this FAQ if you find it helpful or not.

Kevin