Skip to content

[stdlib] Modify sort to pivot on median of 3 #6382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 6, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 99 additions & 24 deletions stdlib/public/core/Sort.swift.gyb
Original file line number Diff line number Diff line change
Expand Up @@ -72,58 +72,131 @@ func _insertionSort<C>(
}
}

/// Sorts the elements at `elements[a]`, `elements[b]`, and `elements[c]`.
/// Stable.
///
/// The indices passed as `a`, `b`, and `c` do not need to be consecutive, but
/// must be in strict increasing order.
///
/// - Precondition: `a < b && b < c`
/// - Postcondition: `elements[a] <= elements[b] && elements[b] <= elements[c]`
public // @testable
func _sort3<C>(
_ elements: inout C,
_ a: C.Index, _ b: C.Index, _ c: C.Index
${", by areInIncreasingOrder: (C.Iterator.Element, C.Iterator.Element) -> Bool" if p else ""}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this declared in an extension to MutableCollection & RandomAccessCollection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we finally route everything through one sort implementation that takes a closure, rather than gyb'ing, without regressing performance? A weak optimizer that may have since been fixed is the only reason we're gyb'ing this code. @bob-wilson what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the rest of the functions in the file, here. It would be great to have all of these as extension methods and to get rid of the gyb—I can try those changes out after we get this merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, getting rid of gyb and using extension methods are totally orthogonal concerns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the history of this or why you had to gyb the code, but regardless, if the optimizer has improved to let you get rid of that without regressing performance, that would be great.

where
C : MutableCollection & RandomAccessCollection
${"" if p else ", C.Iterator.Element : Comparable"}
{
// There are thirteen possible permutations for the original ordering of
// the elements at indices `a`, `b`, and `c`. The comments in the code below
// show the relative ordering of the three elements using a three-digit
// number as shorthand for the position and comparative relationship of
// each element. For example, "312" indicates that the element at `a` is the
// largest of the three, the element at `b` is the smallest, and the element
// at `c` is the median. This hypothetical input array has a 312 ordering for
// `a`, `b`, and `c`:
//
// [ 7, 4, 3, 9, 2, 0, 3, 7, 6, 5 ]
// ^ ^ ^
// a b c
//
// - If each of the three elements is distinct, they could be ordered as any
// of the permutations of 1, 2, and 3: 123, 132, 213, 231, 312, or 321.
// - If two elements are equivalent and one is distinct, they could be
// ordered as any permutation of 1, 1, and 2 or 1, 2, and 2: 112, 121, 211,
// 122, 212, or 221.
// - If all three elements are equivalent, they are already in order: 111.

switch (${cmp("elements[b]", "elements[a]", p)},
${cmp("elements[c]", "elements[b]", p)}) {
case (false, false):
// 0 swaps: 123, 112, 122, 111
break

case (true, true):
// 1 swap: 321
// swap(a, c): 312->123
swap(&elements[a], &elements[c])

case (true, false):
// 1 swap: 213, 212 --- 2 swaps: 312, 211
// swap(a, b): 213->123, 212->122, 312->132, 211->121
swap(&elements[a], &elements[b])

if ${cmp("elements[c]", "elements[b]", p)} {
// 132 (started as 312), 121 (started as 211)
// swap(b, c): 132->123, 121->112
swap(&elements[b], &elements[c])
}

case (false, true):
// 1 swap: 132, 121 --- 2 swaps: 231, 221
// swap(b, c): 132->123, 121->112, 231->213, 221->212
swap(&elements[b], &elements[c])

if ${cmp("elements[b]", "elements[a]", p)} {
// 213 (started as 231), 212 (started as 221)
// swap(a, b): 213->123, 212->122
swap(&elements[a], &elements[b])
}
}
}

/// Reorders `elements` and returns an index `p` such that every element in
/// `elements[range.lowerBound..<p]` is less than every element in
/// `elements[p..<range.upperBound]`.
///
/// - Precondition: The count of `range` must be >= 3:
/// `elements.distance(from: range.lowerBound, to: range.upperBound) >= 3`
func _partition<C>(
_ elements: inout C,
subRange range: Range<C.Index>
${", by areInIncreasingOrder: (C.Iterator.Element, C.Iterator.Element) -> Bool" if p else ""}
) -> C.Index
where
C : MutableCollection & RandomAccessCollection
${"" if p else ", C.Iterator.Element : Comparable"} {

${"" if p else ", C.Iterator.Element : Comparable"}
{
var lo = range.lowerBound
var hi = range.upperBound

if lo == hi {
return lo
}
var hi = elements.index(before: range.upperBound)

// The first element is the pivot.
let pivot = elements[range.lowerBound]
// Sort the first, middle, and last elements, then use the middle value
// as the pivot for the partition.
let half = numericCast(elements.distance(from: lo, to: hi)) as UInt / 2
let mid = elements.index(lo, offsetBy: numericCast(half))
_sort3(&elements, lo, mid, hi
${", by: areInIncreasingOrder" if p else ""})
let pivot = elements[mid]

// Loop invariants:
// * lo < hi
// * elements[i] < pivot, for i in range.lowerBound+1..lo
// * pivot <= elements[i] for i in hi..range.upperBound

Loop: while true {
FindLo: repeat {
// * elements[i] < pivot, for i in range.lowerBound..<lo
// * pivot <= elements[i] for i in hi..<range.upperBound
Loop: while true {
FindLo: do {
elements.formIndex(after: &lo)
while lo != hi {
if !${cmp("elements[lo]", "pivot", p)} { break FindLo }
elements.formIndex(after: &lo)
}
break Loop
} while false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good transformation, but do we even need the do { ... } scope? It looks like we could dump it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need labeled scopes that break FindLo and break FindHi can break out of. The refactored versions I can come up with end up duplicating the lo != hi bounds checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I overlooked the inner breaks. No problem here, then.


FindHi: repeat {
FindHi: do {
elements.formIndex(before: &hi)
while hi != lo {
if ${cmp("elements[hi]", "pivot", p)} { break FindHi }
elements.formIndex(before: &hi)
}
break Loop
} while false
}

swap(&elements[lo], &elements[hi])
}

elements.formIndex(before: &lo)
if lo != range.lowerBound {
// swap the pivot into place
swap(&elements[lo], &elements[range.lowerBound])
}

return lo
}

Expand Down Expand Up @@ -190,7 +263,7 @@ func _introSortImpl<C>(
depthLimit: depthLimit &- 1)
_introSortImpl(
&elements,
subRange: (elements.index(after: partIdx))..<range.upperBound,
subRange: partIdx..<range.upperBound,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this an unintentional change? If not, why not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an intentional change. Because the previous implementation of _partition slotted the pivot element into partIdx as its final step, the elements were always arranged such that everything in range.lowerBound..<partIndex was less than elements[partIdx] and in elements.index(after: partIdx)..<range.upperBound was greater than or equal to elements[partIdx].

The new version doesn't have that guarantee, only that everything in range.lowerBound..<partIdx is less than everything in partIdx..<range.upperBound, so the sort of the upper partition needs to include the partitioning index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM; thanks for explaining.

${"by: areInIncreasingOrder, " if p else ""}
depthLimit: depthLimit &- 1)
}
Expand Down Expand Up @@ -234,6 +307,7 @@ func _siftDown<C>(
${", by: areInIncreasingOrder" if p else ""})
}
}

func _heapify<C>(
_ elements: inout C,
subRange range: Range<C.Index>
Expand Down Expand Up @@ -262,6 +336,7 @@ func _heapify<C>(
${", by: areInIncreasingOrder" if p else ""})
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that these are particularly bad, but we try to avoid whitespace-only changes, for the record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, thanks!

func _heapSort<C>(
_ elements: inout C,
subRange range: Range<C.Index>
Expand Down
31 changes: 31 additions & 0 deletions validation-test/stdlib/Algorithm.swift
Original file line number Diff line number Diff line change
Expand Up @@ -224,5 +224,36 @@ Algorithm.test("sorted/return type") {
let x: Array = ([5, 4, 3, 2, 1] as ArraySlice).sorted()
}

Algorithm.test("sort3/simple")
.forEach(in: [
[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]
]) {
var input = $0
_sort3(&input, 0, 1, 2)
expectEqual([1, 2, 3], input)
}

func isSorted<T>(_ a: [T], by areInIncreasingOrder: (T, T) -> Bool) -> Bool {
return !a.dropFirst().enumerated().contains(where: { (offset, element) in
areInIncreasingOrder(element, a[offset])
})
}

Algorithm.test("sort3/stable")
.forEach(in: [
[1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1], [1, 1, 1]
]) {
// decorate with offset, but sort by value
var input = Array($0.enumerated())
_sort3(&input, 0, 1, 2) { $0.element < $1.element }
// offsets should still be ordered for equal values
expectTrue(isSorted(input) {
if $0.element == $1.element {
return $0.offset < $1.offset
}
return $0.element < $1.element
})
}

runAllTests()