Proportion extend sort

Proportion extend sort
Class	Sorting algorithm
Data structure	Array
Worst-case performance	O(n log n)
Best-case performance	O(n log n)
Average performance	O(n log n)
Worst-case space complexity	O(log n) auxiliary
Optimal	Yes

Proportion extend sort (abbreviated as PESort) is an in-place, comparison-based sorting algorithm which attempts to improve on the performance, particularly the worst-case performance, of quicksort.

The basic partitioning operation in quicksort has a linear access pattern which is extremely efficient on modern memory hierarchies, but the performance of the algorithm is critically dependent on the choice of a pivot value. A good pivot will divide the data to be sorted into nearly equal halves. A poor choice will result in a grossly lopsided division, leaving one part almost as large as the original problem and causing $O (n 2)$ performance.

Proportion extend sort begins with a sorted prefix of $k$ elements, then uses the median of that sample to partition the following $pk$ elements. By bounding the size ratio $p$ between the sample and the data being partitioned (i.e. the proportion by which the sorted prefix is extended), the imbalance is limited. In this, it has some similarities to samplesort.^[1]

History

Proportion extend sort was published by Jing-Chao Chen in 2001^[2] as an improvement on his earlier proportion split sort design.^[3] Its average-case performance, which was only experimentally measured in the original paper, was analyzed by Richard Cole and David C. Kandathil in 2004^[4] and by Chen in 2006,^[5] and shown to require $log 2 n + O (n)$ comparisons on average. A slightly refined variant, symmetry partition sort, was published in 2007.^[6]

Algorithm

The algorithm begins with an array divided into a sorted part $S$ adjacent to an unsorted part $U$ . (The original proportion extend sort always had the sorted part precede the unsorted part; the symmetric variant allows either order.) It is possible to begin with the first element as the sorted part (a single element is always sorted), or to sort a small number of elements using a simpler insertion sort. The initially sorted elements may also be taken from across the array to improve performance in the case of pre-sorted data.

Next, and most critically, the length of the unsorted part $| U |$ is bounded to a multiple $p$ of the length of the sorted part $| S |$ . Specifically, if $| U | > p 2 | S |$ , then recursively sort $S$ and the adjacent $p | S |$ elements of $U$ , make the result ( $p +1$ times longer than the original^[a]) the new $S$ , and repeat until the condition is satisfied.

If there is no limit on the unsorted part ( $p =\infty$ ), then the algorithm is equivalent to quicksort. If the unsorted part is of length 1 ( $p =0$ , almost), then the algorithm is equivalent to binary insertion sort. Values around $p \approx16$ give the best average-case performance, competitive with quicksort,^[6]^: 764 while smaller values improve worst-case performance.^[b]

Eliezer Albacea published a similar algorithm in 1995 called Leapfrogging samplesort where the size is limited so $| U | \leq | S |+1$ ,^[7]^[1] later generalized to $(2 k -1)(| S |+1)$ .^[8]

The sorted part of the array is divided in half (at the median), and one half is moved (by exchanging it with unsorted elements) to the far end of the array, so we have an initial partially partitioned array of the form $LUR$ , where $L$ is the left half of the sorted part, $U$ is the bounded-length unsorted part, and $R$ is the right half of the sorted part.

Then the standard quicksort partitioning step is performed on $U$ , dividing it (in place) into $U L$ and $U R$ . $U L$ and $U R$ are not sorted, but every element of $U L$ is less than or equal to the median, and every element of $U R$ is greater or equal.^[c] The final result $LU L U R R$ consists of two arrays of the necessary form (a sorted part adjacent to an unsorted part) and are sorted recursively.

Leapfrogging samplesort and the original proportion extend sort have the sorted part always precede the unsorted part, achieved by partitioning $U$ before moving $R$ , resulting in $LRU L U R$ , and then exchanging $R$ with the end of $U L$ , resulting in $LU L RU R$ . While the symmetric version is a bit trickier, it has the advantage that the $L$ and $R$ parts act as sentinel values for the partitioning loops, eliminating the need to test in the loop if the bounds of $U$ have been reached.^[1]

Most of the implementation refinements used for quicksort can be applied, including techniques for detecting and efficiently handling mostly sorted inputs.^[9]^[6] In particular, sub-sorts below a certain size threshold are usually implemented using a simple insertion sort.

As with quicksort, the number of recursive levels can be limited to $log 2 n$ if the smaller sub-sort is done first and the larger is implemented as a tail call. Unlike quicksort, the number of levels is bounded by $O (log n)$ even if this is not done.^[9]^: 781

Notes

^ Note that different sources use different conventions for $p$ : Chen uses the ratio of the unsorted part to the sorted part, so any $p >0$ makes sense, while others have it as the ratio of the total size to that of the sorted part, so only $p >1$ makes sense.
^ The algorithm requires at most $1/log 2 (1 + 1/(2 p 2 +2 p -1)) n log 2 n$ comparisons. For $p \leq 16$ , that constant may be approximated by $1.37 p 2 +1.63 p -0.5$ .
^ This assumes an ascending sort. A descending sort requires the obvious adjustments.

References

^ ^a ^b Albacea, Eliezer A. (January 2012). "Average-case analysis of Leapfrogging samplesort" (PDF). Philippine Science Letters. 5 (1).
^ Chen, Jing-Chao (July 2001). "Proportion extend sort". SIAM Journal on Computing. 31 (1): 323–330. doi:10.1137/S0097539798342903.
^ Chen, Jing-Chao (Fall 1996). "Proportion Extend Sort". SIAM Journal on Computing. 3 (3): 271–279. doi:10.1137/S0097539798342903.
^ Cole, Richard; Kandathil, David C. (14–17 September 2004). The Average Case Analysis of Partition Sorts (PDF). Algorithms—ESA 2004: 12th Annual European Symposium. Bergen. pp. 240–251. doi:10.1007/978-3-540-30140-0_23. ISBN 3-540-23025-4.
^ Chen, Jing-Chao (15 December 2006). "Efficient sample sort and the average case analysis of PEsort". Theoretical Computer Science. 369 (1–3): 44–66. doi:10.1016/j.tcs.2006.07.017.
^ ^a ^b ^c Chen, Jing-Chao (11 September 2007). "Symmetry Partition Sort". Software: Practice and Experience. 38 (7): 761–773. arXiv:0706.0046. doi:10.1002/spe.851. S2CID 844616.
^ Albacea, Eliezer A. (1995). Leapfrogging samplesort. Asian Computing Science Conference. doi:10.1007/3-540-60688-2_30.
^ Albacea, Eliezer A. (29 January 2018). "Generalized Leapfrogging Samplesort: A Class of O(n log²2 n) Worst-Case Complexity and O(n log n) Average-Case Complexity Sorting Algorithms". arXiv:1801.09431 [cs.DS].
^ ^a ^b Chen, Jing-Chao (10 July 2004). "Building a new sort function for a C library". Software: Practice and Experience. 34 (8): 777–795. doi:10.1002/spe.593. S2CID 8193225.

External links

https://github.com/jingchaochen/Symmetry-Partition-Sort Example code

[7] Note that different sources use different conventions for $p$ : Chen uses the ratio of the unsorted part to the sorted part, so any $p >0$ makes sense, while others have it as the ratio of the total size to that of the sorted part, so only $p >1$ makes sense.

[8] The algorithm requires at most $1/log 2 (1 + 1/(2 p 2 +2 p -1)) n log 2 n$ comparisons. For $p \leq 16$ , that constant may be approximated by $1.37 p 2 +1.63 p -0.5$ .

[11] This assumes an ascending sort. A descending sort requires the obvious adjustments.

[Albacea12-1] Albacea, Eliezer A. (January 2012). "Average-case analysis of Leapfrogging samplesort" (PDF). Philippine Science Letters. 5 (1).

[Chen01-2] Chen, Jing-Chao (July 2001). "Proportion extend sort". SIAM Journal on Computing. 31 (1): 323–330. doi:10.1137/S0097539798342903.

[Chen96-3] Chen, Jing-Chao (Fall 1996). "Proportion Extend Sort". SIAM Journal on Computing. 3 (3): 271–279. doi:10.1137/S0097539798342903.

[Cole04-4] Cole, Richard; Kandathil, David C. (14–17 September 2004). The Average Case Analysis of Partition Sorts (PDF). Algorithms—ESA 2004: 12th Annual European Symposium. Bergen. pp. 240–251. doi:10.1007/978-3-540-30140-0_23. ISBN 3-540-23025-4.

[Chen06-5] Chen, Jing-Chao (15 December 2006). "Efficient sample sort and the average case analysis of PEsort". Theoretical Computer Science. 369 (1–3): 44–66. doi:10.1016/j.tcs.2006.07.017.

[Chen07-6] Chen, Jing-Chao (11 September 2007). "Symmetry Partition Sort". Software: Practice and Experience. 38 (7): 761–773. arXiv:0706.0046. doi:10.1002/spe.851. S2CID 844616.

[Albacea95-9] Albacea, Eliezer A. (1995). Leapfrogging samplesort. Asian Computing Science Conference. doi:10.1007/3-540-60688-2_30.

[Albacea18-10] Albacea, Eliezer A. (29 January 2018). "Generalized Leapfrogging Samplesort: A Class of O(n log²2 n) Worst-Case Complexity and O(n log n) Average-Case Complexity Sorting Algorithms". arXiv:1801.09431 [cs.DS].

[Chen04-12] Chen, Jing-Chao (10 July 2004). "Building a new sort function for a C library". Software: Practice and Experience. 34 (8): 777–795. doi:10.1002/spe.593. S2CID 8193225.

[1]

[2]

[3]

[4]

[5]

[6]

[a]

[b]

[7]

[8]

[c]

[9]

v t e Sorting algorithms
Theory	Computational complexity theory Big O notation Total order Lists Inplacement Stability Comparison sort Adaptive sort Sorting network Integer sorting X + Y sorting Transdichotomous model Quantum sort
Exchange sorts	Bubble sort Cocktail shaker sort Odd–even sort Comb sort Gnome sort Proportion extend sort Quicksort
Selection sorts	Selection sort Heapsort Smoothsort Cartesian tree sort Tournament sort Cycle sort Weak-heap sort
Insertion sorts	Insertion sort Shellsort Splaysort Tree sort Library sort Patience sorting
Merge sorts	Merge sort Cascade merge sort Oscillating merge sort Polyphase merge sort
Distribution sorts	American flag sort Bead sort Bucket sort Burstsort Counting sort Interpolation sort Pigeonhole sort Proxmap sort Radix sort Flashsort
Concurrent sorts	Bitonic sorter Batcher odd–even mergesort Pairwise sorting network Samplesort
Hybrid sorts	Block merge sort Kirkpatrick–Reisch sort Timsort Introsort Spreadsort Merge-insertion sort
Other	Topological sorting Pre-topological order Pancake sorting Spaghetti sort
Impractical sorts	Stooge sort Slowsort Bogosort