-
Notifications
You must be signed in to change notification settings - Fork 193
AttributedString APIs for operations over noncontiguous ranges #1145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AttributedString APIs for operations over noncontiguous ranges #1145
Conversation
@swift-ci please test |
@@ -351,7 +471,8 @@ extension AttributedString.Runs { | |||
internal func _slicedRunBoundary( | |||
after i: AttributedString.Index, | |||
attributeNames: [String], | |||
constraints: [AttributeRunBoundaries] | |||
constraints: [AttributeRunBoundaries], | |||
endOfCurrent: Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not quite sure what endOfCurrent
means semantically. Would it be possible to show a simple example of how it affects the result of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, so this function is attempting to find the boundary of a run following the given index (which can be either at the start of a run or anywhere within a run). endOfCurrent
determines whether the index returned is the end of the current run or the start of the next run. For contiguous runs, the end of the current is always equal to the start of the next since runs are back-to-back when contiguous. However for discontiguous runs, the end of the current may be before the start of the next.
For example, if I have the string "AAABBB" where the A's have one set of attributes and the B's have another (i.e. 2 attribute runs). For a contiguous runs view of the full string, calling _slicedRunBoundary(after: startIndex)
will always return the index of the first B regardless of what endOfCurrent
is since the first B is the end of the A run and the start of the B run.
But now, let's imagine I have a discontiguous slice that only includes these subranges: "[AA]AB[BB]" (so the discontiguous slice looks like "AABB"). In this case, _slicedRunBoundary(after: startIndex)
will return the index of the third A for endOfCurrent: true
and the index of the second B for endOfCurrent: false
because the first run ends at the third A but the second run doesn't start until the second B.
The reason we need the ability to calculate both is because we need both to support enumeration of the runs view (which enumerates runs by providing the start and end index of every run) but also to allow subscripting the runs view (which provides the range of a singular run containing an arbitrary index without looking to the next run). Previously we only had the ability to find the start of the next run and we used it for both that and finding the end of the current run, but now that those indices can differ we need a way to determine both.
Sources/FoundationEssentials/AttributedString/AttributedString+Runs+AttributeSlices.swift
Outdated
Show resolved
Hide resolved
@available(FoundationPreview 6.2, *) | ||
extension DiscontiguousAttributedSubstring : Equatable { | ||
public static func == (lhs: Self, rhs: Self) -> Bool { | ||
AttributedString.Guts._characterwiseIsEqual(lhs.runs, to: rhs.runs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this equivalent to looping over the ranges, and compare their characters one by one, like this?
for range in _indices.ranges {
AttributedString.Guts.characterwiseIsEqual(lhs.guts, in: range, to: rhs.guts, in: range)
}
Asking because in the hash
implementation below, you're doing it range by range. That looks like a different way to think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's how _characterwiseIsEqual
is implemented - it loops over each run's range and performs a character-wise equality check on the text content and an equality check on the attribute containers
} | ||
} | ||
|
||
public subscript(bounds: some RangeExpression<AttributedString.Index>) -> DiscontiguousAttributedSubstring { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a new API or a protocol requirement? It doesn't look like a protocol requirement for AttributedStringAttributeMutation
. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a protocol requirement - it's just new API for this type. It's an almost-requirement of AttributedStringProtocol
to which DiscontiguousAttributedSubstring
partially conforms, but is slightly different because this returns a DiscontiguousAttributedSubstring
instead of just an AttributedSubstring
(which is required because a singular range of an already-discontiguous substring may produce another discontiguous substring)
@swift-ci please test |
@swift-ci please test |
func testSubCharacterSlicing() { | ||
// This test validates that DiscontiguousAttributedSubstring.characters (i.e. DiscontiguousSlice<AttributedString.CharacterView>) behaves the same as DiscontiguousSlice<String> w.r.t. slicing on sub-character boundaries | ||
// Note: Country flag emojis are excluded from the text below because AttributedString.CharacterView is not self-slicing and Slice<AttributedString.CharacterView> does not properly handle sub-character slicing within country flag emojis | ||
let complexString = "ABCDáëĩoō№Ωאבג🎺🌈WXYZ" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This string appears to have no multi-scalar Characters -- neither (1) in its input string, nor (2) in the slices it generates. There are no "sub-character boundaries", so the test is not validating them -- what is the point of this test, then? Ideally we should verify behavior in both of these cases.
Regarding point (1): I expect that the flag emoji problem applies to all multi-scalar characters in the input string: a\u{301}
, 집
, 🤞🏾
, etc. Given that there will be no easy way to avoid slicing causing crashes, this greatly limits the usefulness of attributed string slices, although hopefully it doesn't entirely eliminate it. (Note that UI-level text editing operations do not use the same grapheme breaking algorithms that Swift implements, so users are sometimes able to position cursors and select text on what Swift considers to be sub-character positions.) Solving this is probably outside the scope of this change; it may indeed be best to treat it as a temporary limitation.
However, point (2) above is independent of this. The issue with CharacterView
incorrectly using Slice
for its SubSequence
does not (immediately) affect slicing that produces multi-scalar characters within the result, though -- so this test can carry some weight by exercising that at least. For example, take this input string:
let str = "a\n\u{301}"
str
has three single-scalar Characters, a
, \n
, and \u{301}
(the newline does not combine with the combining accent following it). Cutting the a
and \u{301}
off into a discontiguous slice produces a result that technically should (according to Unicode) consist of a single two-scalar Character, á
. We expect that both String
and AttributedString
would handle this slicing without traps/crashes, although we expect them to both count the ASCII a
and the combining accent as distinct Character
s. Can we use this test to verify that the two types produce consistent behavior in this second case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, you're right on both counts. Indeed any slicing within a grapheme cluster causes the same crash as I experienced with the country flag emojis. For point (1) I agree that's definitely something we should look into soon as a followup, my hope is that a temporary limitation is ok as this limitation has existed without much uproar for noncontiguous slicing since AttributedString
's inception. However we should definitely put a plan in place to resolve it so it doesn't continue to linger.
For point (2) that's a great point, and I've added a unit test in place of this one that asserts the behavior of String
is the same as AttributedString.CharacterView
in this regard to make sure that the behavior is the same (and indeed you are correct that both count the ASCII a
and the combining accent as distinct Character
s.
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@swift-ci please test |
@swift-ci please test |
This PR implements the API changes proposed in SF-0014 by adding a new
DiscontiguousAttributedSubstring
type (the result of slicing anAttributedString
with aRangeSet
) which vends out slices of each underlying view, an upgradedRuns
view which now handles discontiguous segments, and performs operations over the discontiguous subranges.