Commit d22ffda

numist authored and DougGregor committed
[Proposal] Ordered collection diffing (#968)
* Add collection diffing proposal for pitching to swift-evolution
* BidirectionalCollection should inherit from OrderedCollection, as pointed out by @Michael_Ilseman
* StringProtocol is also a bidirectional collection
* Add Kyle to the authors list
* Updates from the pitch and other feedback
1 parent b27d971 commit d22ffda

1 file changed: 339 additions & 0 deletions
@@ -0,0 +1,339 @@
# Ordered Collection Diffing

* Proposal: SE-NNNN
* Authors: [Scott Perry](https://github.com/numist), [Kyle Macomber](https://github.com/kylemacomber)
* Review Manager: TBD
* Status: **Awaiting review**
* Prototype: [numist/Diffing](https://github.com/numist/Diffing)

## Introduction

This proposal describes additions to the standard library that provide an interchange format for diffs as well as diffing/patching functionality for ordered collection types.

## Motivation

Representing, manufacturing, and applying transactions between states today requires writing a lot of error-prone code. This proposal is inspired by the convenience of the `diffutils` suite when interacting with text files, and the reluctance to solve similar problems in code with `libgit2`.

Many state management patterns would benefit from improvements in this area, including undo/redo stacks, generational stores, and syncing differential content to/from a service.

## Proposed solution

A new type representing the difference between ordered collections is introduced along with methods that support its production and application.

Using this API, a line-by-line three-way merge can be performed in a few lines of code:

``` swift
// `base`, `theirs`, and `mine` are `String`s holding the three versions.
// `components(separatedBy:)` comes from Foundation.

// Split the contents of the sources into lines
let baseLines = base.components(separatedBy: "\n")
let theirLines = theirs.components(separatedBy: "\n")
let myLines = mine.components(separatedBy: "\n")

// Create a difference from base to theirs
let diff = theirLines.shortestEditScript(from: baseLines)

// Apply it to mine, if possible
guard let patchedLines = myLines.applying(diff) else {
    print("Merge conflict applying patch, manual merge required")
    return
}

// Reassemble the result
let patched = patchedLines.joined(separator: "\n")
print(patched)
```

## Detailed design

### Producing diffs

Collections can only be efficiently diffed when they have a strong sense of order, so difference production is added to `BidirectionalCollection`:

``` swift
@available(swift, introduced: 5.1)
extension BidirectionalCollection {
    /// Returns the difference needed to produce the receiver's state from the
    /// parameter's state with the fewest possible changes, using the provided
    /// closure to establish equivalence between elements.
    ///
    /// This function does not infer element moves, but they can be computed
    /// using `OrderedCollectionDifference.inferringMoves()` if desired.
    ///
    /// Implementation is an optimized variation of the algorithm described by
    /// E. Myers (1986).
    ///
    /// - Parameters:
    ///   - other: The base state.
    ///   - areEquivalent: A closure that returns whether the two
    ///     parameters are equivalent.
    ///
    /// - Returns: The difference needed to produce the receiver's state from
    ///   the parameter's state.
    ///
    /// - Complexity: O(*n* * *d*), where *n* is `other.count + self.count` and
    ///   *d* is the number of changes between the two ordered collections.
    public func shortestEditScript<C>(
        from other: C, by areEquivalent: (Element, C.Element) -> Bool
    ) -> OrderedCollectionDifference<Element>
        where C : BidirectionalCollection, C.Element == Self.Element
}

extension BidirectionalCollection where Element: Equatable {
    /// Returns the difference needed to produce the receiver's state from the
    /// parameter's state with the fewest possible changes, using equality to
    /// establish equivalence between elements.
    ///
    /// This function does not infer element moves, but they can be computed
    /// using `OrderedCollectionDifference.inferringMoves()` if desired.
    ///
    /// Implementation is an optimized variation of the algorithm described by
    /// E. Myers (1986).
    ///
    /// - Parameters:
    ///   - other: The base state.
    ///
    /// - Returns: The difference needed to produce the receiver's state from
    ///   the parameter's state.
    ///
    /// - Complexity: O(*n* * *d*), where *n* is `other.count + self.count` and
    ///   *d* is the number of changes between the two ordered collections.
    public func shortestEditScript<C>(from other: C) -> OrderedCollectionDifference<Element>
        where C: BidirectionalCollection, C.Element == Self.Element
}
```

The `shortestEditScript(from:)` method determines the fewest possible edits required to transition between the two states and stores them in a difference type, which is defined as:
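
As a rough illustration of what producing a diff looks like in practice, the sketch below diffs two small arrays and prints each change. It assumes a Swift 5.1 or later toolchain, where the standard library spells this functionality `difference(from:)` and `CollectionDifference` instead of the `shortestEditScript(from:)` and `OrderedCollectionDifference` names used in this proposal. The sample data is made up.

```swift
// Assumption: the Swift 5.1+ standard library names `difference(from:)` and
// `CollectionDifference` stand in for the names proposed in this document.
let base = ["a", "b", "c", "d"]
let updated = ["a", "c", "d", "e"]

// The difference needed to produce `updated` from `base`.
let diff = updated.difference(from: base)

for change in diff {
    switch change {
    case .remove(offset: let o, element: let e, associatedWith: _):
        print("remove \(e) at offset \(o) of the base state")
    case .insert(offset: let o, element: let e, associatedWith: _):
        print("insert \(e) at offset \(o) of the final state")
    }
}
```

Removal offsets refer to positions in `base`, while insertion offsets refer to positions in `updated`, matching the description of `Change` in the type definition.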

``` swift
/// A type that represents the difference between two ordered collection states.
@available(swift, introduced: 5.1)
public struct OrderedCollectionDifference<ChangeElement> {
    /// A type that represents a single change to an ordered collection.
    ///
    /// The `offset` of each `insert` refers to the offset of its `element` in
    /// the final state after the difference is fully applied. The `offset` of
    /// each `remove` refers to the offset of its `element` in the original
    /// state. Non-`nil` values of `associatedWith` refer to the offset of the
    /// complementary change.
    public enum Change {
        case insert(offset: Int, element: ChangeElement, associatedWith: Int?)
        case remove(offset: Int, element: ChangeElement, associatedWith: Int?)
    }

    /// Creates an instance from a collection of changes.
    ///
    /// For clients interested in the difference between two ordered
    /// collections, see `BidirectionalCollection.shortestEditScript(from:)`.
    ///
    /// To guarantee that instances are unambiguous and safe for compatible base
    /// states, this initializer will fail unless its parameter meets the
    /// following requirements:
    ///
    /// 1) All insertion offsets are unique
    /// 2) All removal offsets are unique
    /// 3) All offset associations between insertions and removals are symmetric
    ///
    /// - Parameter changes: A collection of changes that represent a transition
    ///   between two states.
    ///
    /// - Complexity: O(*n* * log(*n*)), where *n* is the length of the
    ///   parameter.
    public init?<C: Collection>(_ changes: C) where C.Element == Change

    /// The `.insert` changes contained by this difference, from lowest offset to highest
    public var insertions: [Change] { get }

    /// The `.remove` changes contained by this difference, from lowest offset to highest
    public var removals: [Change] { get }
}

/// An OrderedCollectionDifference is itself a Collection.
///
/// The enumeration order of `Change` elements is:
///
/// 1. `.remove`s, from highest `offset` to lowest
/// 2. `.insert`s, from lowest `offset` to highest
///
/// This guarantees that applicators on compatible base states are safe when
/// written in the form:
///
/// ```
/// for c in diff {
///     switch c {
///     case .remove(offset: let o, element: _, associatedWith: _):
///         arr.remove(at: o)
///     case .insert(offset: let o, element: let e, associatedWith: _):
///         arr.insert(e, at: o)
///     }
/// }
/// ```
extension OrderedCollectionDifference : Collection {
    public typealias Element = OrderedCollectionDifference<ChangeElement>.Change
    public struct Index: Comparable, Hashable {}
}

extension OrderedCollectionDifference.Change: Equatable where ChangeElement: Equatable {}
extension OrderedCollectionDifference: Equatable where ChangeElement: Equatable {}

extension OrderedCollectionDifference.Change: Hashable where ChangeElement: Hashable {}
extension OrderedCollectionDifference: Hashable where ChangeElement: Hashable {
    /// Infers which `ChangeElement`s have been both inserted and removed only
    /// once and returns a new difference with those associations.
    ///
    /// - Returns: an instance with all possible moves inferred.
    ///
    /// - Complexity: O(*n*) where *n* is `self.count`
    public func inferringMoves() -> OrderedCollectionDifference<ChangeElement>
}

extension OrderedCollectionDifference: Codable where ChangeElement: Codable {}
```

A `Change` is a single mutating operation; an `OrderedCollectionDifference` is a plurality of such operations that represents a complete transition between two states. Given the interdependence of the changes, `OrderedCollectionDifference` has no mutating members, but it does allow index- and `Slice`-based access to its changes via `Collection` conformance as well as a validating initializer taking a `Collection`.

Fundamentally, there are only two operations that mutate ordered collections, `insert(_:at:)` and `remove(at:)`, but there are benefits to being able to represent other operations such as moves and replacements, especially for UIs that may want to animate a move differently from an `insert`/`remove` pair. These operations are represented using `associatedWith:`. When non-`nil`, it refers to the offset of the counterpart change, as described in the documentation comment for `Change`.
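
To make the role of `associatedWith:` more concrete, here is a small sketch of a client that reports moves separately from plain insertions and removals. It assumes a Swift 5.1 or later toolchain and uses the standard library spellings `difference(from:)` and `CollectionDifference` in place of the names proposed here. The arrays are invented for illustration.

```swift
// Assumption: Swift 5.1+ standard library names stand in for the proposed ones.
let base = ["a", "b", "c", "d"]
let reordered = ["b", "c", "d", "a"]

// Without inference, "a" appears as an unrelated remove and insert.
let plain = reordered.difference(from: base)

// With inference, the two changes refer to each other's offsets.
let moves = plain.inferringMoves()

for change in moves {
    switch change {
    case .remove(offset: let o, element: let e, associatedWith: let partner):
        if let destination = partner {
            print("\(e) moves from offset \(o) to offset \(destination)")
        } else {
            print("remove \(e) at offset \(o)")
        }
    case .insert(offset: let o, element: let e, associatedWith: let partner):
        // An associated insert is the second half of a move reported above.
        if partner == nil {
            print("insert \(e) at offset \(o)")
        }
    }
}
```

A client that does not care about moves can ignore `associatedWith:` entirely and treat every change as a plain insertion or removal.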

Similarly, the name `shortestEditScript(from:)` uses a term of art to admit the use of an algorithm that compromises between performance and minimal output. It computes the [longest common subsequence](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem) between the two collections, but not the [longest common substring](https://en.wikipedia.org/wiki/Longest_common_substring_problem) (which is a much slower operation). In the future other algorithms may be added as different methods to satisfy the need for different performance and output characteristics.

### Application of instances of `OrderedCollectionDifference`

``` swift
extension RangeReplaceableCollection {
    /// Applies a difference to a collection.
    ///
    /// - Parameter difference: The difference to be applied.
    ///
    /// - Returns: An instance representing the state of the receiver with the
    ///   difference applied, or `nil` if the difference is incompatible with
    ///   the receiver's state.
    ///
    /// - Complexity: O(*n* + *c*), where *n* is `self.count` and *c* is the
    ///   number of changes contained by the parameter.
    @available(swift, introduced: 5.1)
    public func applying(_ difference: OrderedCollectionDifference<Element>) -> Self?
}
```

Applying a diff to an incompatible base state is the only way application can fail. `applying(_:)` expresses this by returning `nil`.
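
The following sketch shows both outcomes of application. It assumes a Swift 5.1 or later toolchain and the standard library spelling `difference(from:)`, not the `shortestEditScript(from:)` name proposed here. The "incompatible" state is simply a collection too short to contain the removed offset.

```swift
// Assumption: Swift 5.1+ standard library names stand in for the proposed ones.
let base = ["a", "b", "c"]
let updated = ["a", "b"]

// Removes "c" at offset 2 of the base state.
let diff = updated.difference(from: base)

// Compatible base state: application succeeds.
let patched = base.applying(diff)

// Incompatible base state: there is no element at offset 2 to remove.
let tooShort: [String] = ["a"]
let conflict = tooShort.applying(diff)

if let patched = patched {
    print("patched:", patched)
}
if conflict == nil {
    print("difference is incompatible with", tooShort)
}
```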

## Source compatibility

This proposal is additive and the names of the types it proposes are not likely to already be in wide use, so it does not represent a significant risk to source compatibility.

## Effect on ABI stability

This proposal does not affect ABI stability.

## Effect on API resilience

This feature is additive, and its symbols are marked with `@available(swift, introduced: 5.1)` as appropriate.

## Alternatives considered

### `shortestEditScript(from:by:)` defined in protocol instead of extension

Different algorithms with different premises and/or semantics are free to be defined using different function names.

### Communicating changes via a series of callbacks

Breaking up a transaction into a sequence of imperative events is not very Swifty, and the pattern has proven to be fertile ground for defects.

### More cases in `OrderedCollectionDifference.Change`

While other cases such as `.move` are tempting, the proliferation of code in switch statements is unwanted overhead for clients that don't care about the "how" of a state transition so much as the "what".

The use of associated offsets allows for more information to be encoded into the diff without making it more difficult to use. You've already seen how associated offsets can be used to illustrate moves (as produced by `inferringMoves()`):

``` swift
OrderedCollectionDifference<String>([
    .remove(offset: 0, element: "value", associatedWith: 4),
    .insert(offset: 4, element: "value", associatedWith: 0)
])
```

But they can also be used to illustrate replacement when the offsets refer to the same position (and the element is different):

``` swift
OrderedCollectionDifference<String>([
    .remove(offset: 0, element: "oldvalue", associatedWith: 0),
    .insert(offset: 0, element: "newvalue", associatedWith: 0)
])
```

Differing offsets and elements can be combined when a value is both moved and replaced (or changed):

``` swift
OrderedCollectionDifference<String>([
    .remove(offset: 4, element: "oldvalue", associatedWith: 0),
    .insert(offset: 0, element: "newvalue", associatedWith: 4)
])
```

Neither of these two latter forms can be inferred from a diff by `inferringMoves()`, but they can be legally expressed by any API that vends a difference.

### `applying(_:) throws -> Self` instead of `applying(_:) -> Self?`

Applying a diff can only fail when the base state is incompatible. As such, the additional granularity provided by an error type does not add any value.

### Use `Index` instead of offset in `Change`

Because indexes cannot be navigated in the absence of the collection instance that generated them, a diff based on indexes instead of offsets would be much more limited in usefulness as a boundary type. If indexes are required, they can be rehydrated from the offsets in the presence of the collection(s) to which they belong.
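
A minimal sketch of that rehydration, using `String` because its indexes are not integers. It assumes a Swift 5.1 or later toolchain and the standard library spelling `difference(from:)` in place of the name proposed here. The variable names and sample strings are hypothetical.

```swift
// Assumption: Swift 5.1+ standard library names stand in for the proposed ones.
let old = "colour"
let new = "color!"
let diff = new.difference(from: old)

var removedCharacters: [Character] = []
var insertedCharacters: [Character] = []

for change in diff {
    switch change {
    case .remove(offset: let o, element: _, associatedWith: _):
        // Removal offsets are relative to the original state.
        let index = old.index(old.startIndex, offsetBy: o)
        removedCharacters.append(old[index])
    case .insert(offset: let o, element: _, associatedWith: _):
        // Insertion offsets are relative to the final state.
        let index = new.index(new.startIndex, offsetBy: o)
        insertedCharacters.append(new[index])
    }
}

print("removed:", removedCharacters, "inserted:", insertedCharacters)
```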

### `OrderedCollection` conformance for `OrderedCollectionDifference`

Because the change offsets refer directly to the resting positions of elements in the base and modified states, the changes represent the same state transition regardless of their order. The purpose of ordering is to optimize for understanding, safety, and/or performance. In fact, this prototype already contains examples of two different equally valid sort orders:

* The order provided by `for in` is optimized for safe diff application when modifying a compatible base state one element at a time.
* `applying(_:)` uses a different order where `insert` and `remove` instances are interleaved based on their adjusted offsets in the base state.

Both sort orders are "correct" in representing the same state transition.
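
The equivalence of the two orders can be checked with a short sketch. It assumes a Swift 5.1 or later toolchain, whose standard library `CollectionDifference` documents the same enumeration order as the type proposed here, and it uses `difference(from:)` in place of `shortestEditScript(from:)`. The sample arrays are made up.

```swift
// Assumption: Swift 5.1+ standard library names stand in for the proposed ones.
let base = [1, 2, 3, 4, 5]
let target = [2, 3, 9, 5, 7]
let diff = target.difference(from: base)

// Order 1: enumerate the difference and mutate one element at a time.
// Removals arrive from highest offset to lowest, then insertions from
// lowest offset to highest, so every offset is valid when it is used.
var oneAtATime = base
for change in diff {
    switch change {
    case .remove(offset: let o, element: _, associatedWith: _):
        oneAtATime.remove(at: o)
    case .insert(offset: let o, element: let e, associatedWith: _):
        oneAtATime.insert(e, at: o)
    }
}

// Order 2: let `applying(_:)` use its own internal ordering.
let allAtOnce = base.applying(diff)

print(oneAtATime, allAtOnce ?? [])
```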

### `Change` generic on both `BaseElement` and `OtherElement` instead of just `Element`

Application of differences would only be possible when both `Element` types were equal, and there would be additional cognitive overhead with comparators with the type `(Element, Other.Element) -> Bool`.

Since the comparator forces both types to be effectively isomorphic, a diff generic over only one type can satisfy the need by mapping one (or both) ordered collections to force their `Element` types to match.
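
One way that mapping might look is sketched below. The `CachedRow` and `ServerRecord` types are hypothetical, and the code assumes a Swift 5.1 or later toolchain with the standard library spelling `difference(from:)` in place of the name proposed here.

```swift
// Hypothetical model types with different shapes but a shared identity.
struct CachedRow { let id: Int }
struct ServerRecord { let identifier: Int; let title: String }

let cached = [CachedRow(id: 1), CachedRow(id: 2), CachedRow(id: 3)]
let fetched = [
    ServerRecord(identifier: 1, title: "One"),
    ServerRecord(identifier: 3, title: "Three"),
    ServerRecord(identifier: 4, title: "Four"),
]

// Map both sides to a common element type (`Int`) before diffing.
// Assumption: Swift 5.1+ standard library names stand in for the proposed ones.
let diff = fetched.map { $0.identifier }.difference(from: cached.map { $0.id })

for change in diff {
    switch change {
    case .remove(offset: let o, element: let id, associatedWith: _):
        print("row \(id) at offset \(o) is gone")
    case .insert(offset: let o, element: let id, associatedWith: _):
        print("row \(id) appears at offset \(o): \(fetched[o].title)")
    }
}
```

Because insertion offsets refer to the final state, they can be used to look up the full `ServerRecord` in `fetched` even though the diff itself only carries identifiers.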

### `difference(from:using:)` with an enum parameter for choosing the diff algorithm instead of `shortestEditScript(from:)`

This is an attractive API concept, but it can be very cumbersome to extend. This is especially the case for types like `OrderedSet` that, through member uniqueness and fast membership testing, have the capability to support very fast diff algorithms that aren't appropriate for other types.

### `CollectionDifference` or just `Difference` instead of `OrderedCollectionDifference`

The name `OrderedCollectionDifference` gives us the opportunity to build a family of related types in the future, as the difference type in this proposal is (intentionally) unsuitable for representing differences between keyed collections (which don't shift their elements' keys on insertion/removal) or structural differences between treelike collections (which are multidimensional).

## Intentional omissions:

### Further adoption

This API allows for more interesting functionality that is not included in this proposal.

For example, this proposal could have included a `reversed()` function on the difference type that would return a new difference that would undo the application of the original.

The lack of additional conveniences and functionality is intentional; the goal of this proposal is to lay the groundwork that such extensions would be built upon.

In the case of `reversed()`, clients of the API in this proposal can use `Collection.map()` to invert the case of each `Change` and feed the result into `OrderedCollectionDifference.init(_:)`:

``` swift
let diff: OrderedCollectionDifference<Int> = /* ... */
let reversed = OrderedCollectionDifference<Int>(
    diff.map({(change) -> OrderedCollectionDifference<Int>.Change in
        switch change {
        case .insert(offset: let o, element: let e, associatedWith: let a):
            return .remove(offset: o, element: e, associatedWith: a)
        case .remove(offset: let o, element: let e, associatedWith: let a):
            return .insert(offset: o, element: e, associatedWith: a)
        }
    })
)!
```

### `mutating apply(_:)`

There is no mutating applicator because there is no algorithmic advantage to in-place application.

### `mutating inferringMoves()`

While there may be savings to be had from in-place move inferencing, we're holding this function for a future proposal.

### Formalizing the concept of an ordered collection

This problem warrants a proposal of its own.
