Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

What is the Quick Sort, and how fast is it, really?

What is the Quick Sort, and how fast is it, really?

by  logiclrd  Posted    (Edited  )
[tt]The Quicksort
-------------[/tt]

The Quicksort is an algorithm for sorting arrays of values into ascending (or descending) order. It is one of the more efficient of many algorithms which can accomplish this task.

I am going to propose a routine -- not necessarily the quicksort, but certainly a routine that does the same thing -- that takes an array, a starting point and an ending point, and when it's finished, the elements between the start & end point, inclusive, are in order. Let's call this routine:

[tt][color #000080]SUB[/color] sort(array(), start%, finish%)[/tt]

We won't worry about how it's written just yet. Instead, we'll worry about the implementation of another routine, one that takes exactly the same parameters, but all that it does is take the first number and move everything else in the array that is less than the first number before it, leaving everything larger than it after it. Let's call this one:

[tt][color #000080]SUB[/color] partition(array(), start%, finish%)[/tt]

An example of its operation would be as follows:

[tt]Given the input array: 3, 2, 5, 1, 4
and the range: |<--------->|
The result would be: 2, 1, 3, 5, 4[/tt]

Note that all of the numbers that are less than our first number, 3, are now before it in the list. The numbers greater than it are still after it. The order within these two ranges HAS NOT been changed, and even if it were, in general, it would not be sorted. The only condition that the SUB partition must impose is that the numbers less than the first number be moved in this way.

Now, suppose that a number is in position in a sorted array. Obviously, all of the numbers less than it are therefore before it, while all of the numbers greater than it are after it. If this were not true, then the array would not be sorted. It can be seen that arrays resulting from what the SUB partition does, by also having this property, force that first number, known as the pivot, to be in its final position. Even if the numbers to the left and right of it are not sorted, the fact that they are permutations, respectively, of the numbers to the left and the numbers to the right of the pivot in the fully sorted array means that the pivot need no longer move for the array to become sorted.

Here is sample code for the SUB partition:

[tt][color #000080]SUB[/color] partition(array(), start%, finish%)
pivot% = start%
pivotvalue = array(pivot%)
[color #000080]FOR[/color] i% = start% + 1 [color #000080]TO[/color] finish%
[color #000080]IF[/color] array(i%) < pivotvalue [color #000080]THEN[/color]
array(pivot%) = array(i%)
array(i%) = array(pivot% + 1)
array(pivot% + 1) = pivotvalue
pivot% = pivot% + 1
[color #000080]END IF
NEXT[/color] i%
[color #000080]END SUB[/color][/tt]

There are more efficient ways to partition arrays, but this way is one of the easier to understand. The code inside the FOR loop finds a number that is less than the pivot but after it in the array. Assuming that this number is not directly after the pivot, it can be seen that the number directly after the pivot must be greater than it, since if it were smaller, it would have already been swapped. Upon finding such a number, which is evidently violating the conditions that the SUB partition must impose, it first swaps the offending value with the value directly after the pivot, so that the pivot is then adjacent to the number less than it. This does not affect whether the condition is imposed or not, because no numbers are moved onto the other side of the pivot. After this move, the pivot is moved one place farther into the array, swapping it with the smaller value. This essentially enforces the partitioning condition onto the array that has so far been checked -- i.e., the elements of the array between the start and the loop variable i%, inclusive. In actual fact, the two swap operations have been optimized here into a 3-way rotation, which requires only 3 assignments, since the pivot's value has already been stored.

After enforcing the partition condition onto the array, we now have two sub-arrays. Before the pivot is the unsorted subarray containing all of the values less than the pivot, and after the pivot is the unsorted subarray containing all of the values greater than the pivot. Remember now that at the very start a SUB sort was proposed whose operation we did not care about, but which sorted a subarray given start & end points. If we sort the elements to the left of the pivot and the elements to the right of the pivot, then logically, the entire array will be in order. Formally, this is because directly after enforcing the partition condition, the elements to the left of the pivot are merely a permutation of the elements to the left of the pivot in the sorted array, and similarly for the elements to the right of the pivot. The SUB sort simply undoes this permutation, and the result is a sorted array. So, if we include these calls to the SUB sort at the end of the SUB partition, then the SUB partition will effectively sort an array. However, it is important to make sure that we are always passing a valid range of elements into the SUB sort. We can also simplify slightly by noting that an array with less than 2 elements is always sorted. What this means is that if there are 2 or more elements before the pivot, then we will call 'sort array(), start%, pivot% - 1', and similarly, if there are 2 or more elements after the pivot, then we will call 'sort array(), pivot% + 1, finish%'. In code, this looks like this:
[tt]
[color #000080]IF[/color] pivot% - start% >= 2 [color #000080]THEN[/color] sort array(), start%, pivot% - 1
[color #000080]IF[/color] finish% - pivot% >= 2 [color #000080]THEN[/color] sort array(), pivot% + 1, finish%[/tt]

Now, we have a SUB which sorts a number by first moving one element into its final position and enforcing the condition that the subarrays to the left and to the right be permutations of those to the left & right in the sorted array.

Regardless of how SUB partition does this, the end result is that the array is sorted. If an array gets small enough -- to be more precise, exactly 2 elements -- then the partitioning operation will always result in a sorted array, regardless of the calls to SUB sort. Also, if the array only has 2 elements, then there can never be 2 or more elements on either side of the pivot. Thus, the SUB sort will never be called. This means that for sufficiently small arrays, SUB partition sorts the array without making use of SUB sort. Since when SUB partition calls SUB sort, it is always on arrays with less elements than the array that SUB partition is sorting, we can thus use SUB partition to do the job of SUB sort. If the number of elements in a call to SUB partition is greater than 2, then the recursive calls to it will have less elements, and eventually 2 elements. When it reaches 2 elements, it will not call itself any more, because the subarrays to the left and to the right of the pivot are already sorted (since they have less than 2 elements). Thus, the SUB partition can be rewritten to include the recursive calls to create SUB quicksort:

[tt][color #000080]SUB[/color] quicksort(array(), start%, finish%)
[color #008000]'first, partition the array[/color]
pivot% = start%
pivotvalue = array(pivot%)
[color #000080]FOR[/color] i% = start% + 1 [color #000080]TO[/color] finish%
[color #000080]IF[/color] array(i%) < pivotvalue [color #000080]THEN[/color]
array(pivot%) = array(i%)
array(i%) = array(pivot% + 1)
array(pivot% + 1) = pivotvalue
pivot% = pivot% + 1
[color #000080]END IF
NEXT[/color] i%

[color #008000]'then, sort the subarrays to each side of the pivot[/color]
[color #000080]IF[/color] pivot% - start% >= 2 [color #000080]THEN[/color] quicksort array(), start%, pivot% - 1
[color #000080]IF[/color] finish% - pivot% >= 2 [color #000080]THEN[/color] quicksort array(), pivot% + 1, finish%[color #000080]
END SUB[/color][/tt]

This may seem like an overly complicated and roundabout way to sort an array, but in fact, it is one of the fastest, and by far one of the shortest (codewise) ways to sort an array.

Algorithms have orders, denoted by O(expression), where expression is a function of 'n'. The order of an algorithm describes how the run time -- that is, the time it takes to complete the algorithm -- grows as the number of elements grows. An algorithm of O(n), for instance, takes twice as long if there are twice as many elements. An algorithm of O(n^2) takes four times as long if there are twice as many elements. It is possible to calculate what the order of an algorithm will be, and since it is only used for comparisons in this way, constant multipliers drop out and the result is a simplified expression. The order expression is also usually reduced to its most significant -- that is, influential -- term. The concept of 'big-Oh' of an algorithm can be used to compare algorithms in terms of run time. The order of the bubblesort, a fairly common sorting routine primarily due to its simplicity (it takes 5 lines of code to program in QBASIC), is O(n^2). The order of the quicksort, due to the way it cuts the array into smaller pieces roughly one half of the size of the original size, comes out to be O(n*lg(n)), where lg(x) is the logarithm, base 2, of x.

To demonstrate how fast the quick sort really is, let us say that on a given system, it takes one microsecond to sort an array with 10 elements no matter which algorithm you use. The following table describes how the run time grows based on the number of elements:
[tt]
elements bubblesort quicksort analysis
-----------------------------------------------------------
1 0.00000001 0.00000003 bubblesort 3 times faster
10 0.00000100 0.00000100 same speed
100 0.00010000 0.00003321 quicksort 3 times faster
1,000 0.01000000 0.00110352 quicksort 9 times faster
10,000 1.00000000 0.03665816 quicksort 27 times faster
100,000 100.000000 1.21775777 quicksort 82 times faster
1,000,000 10,000.000 40.4530376 quicksort 247 times faster[/tt]

The second and 3rd columns are in seconds. A large company wishes to sort its list of employees, 100,000 elements long, by, say, their last name. If they choose to use the bublesort, it will take nearly two minutes to sort the list. With the quicksort, on the other hand, it takes a bit over 1.2 seconds. Obviously, a programmer working for the company would do well to pick the quick sort, and now so can you. Have no hesitation to use the quick sort wherever you need to sort lists. Enjoy =}
Register to rate this FAQ  : BAD 1 2 3 4 5 6 7 8 9 10 GOOD
Please Note: 1 is Bad, 10 is Good :-)

Part and Inventory Search

Back
Top