Commit 2784508

milseman authored and airspeedswift committed
[4.0][stdlib] Speed up Character construction from CharacterView.subscript (#9252)
This adds a fast path for single-code-unit Character construction. Rather than using the general-purpose String-based initializer (which repeats grapheme breaking to ensure a trap, among other inefficiencies), make the Character directly from the single Unicode scalar value. This also speeds up simple iteration of BMP strings when the optimizer is unable to eliminate the subscript: around 2x for ASCII and around 20% for BMP UTF-16.
1 parent f43b398 commit 2784508

1 file changed, 16 additions and 0 deletions

stdlib/public/core/StringCharacterView.swift

Lines changed: 16 additions & 0 deletions
@@ -439,6 +439,22 @@ extension String.CharacterView : BidirectionalCollection {
   /// - Parameter position: A valid index of the character view. `position`
   /// must be less than the view's end index.
   public subscript(i: Index) -> Character {
+    if i._countUTF16 == 1 {
+      // For single-code-unit graphemes, we can construct a Character directly
+      // from a single unicode scalar (if sub-surrogate).
+      let relativeOffset = i._base._position - _coreOffset
+      if _core.isASCII {
+        let asciiBuffer = _core.asciiBuffer._unsafelyUnwrappedUnchecked
+        return Character(UnicodeScalar(asciiBuffer[relativeOffset]))
+      } else if _core._baseAddress != nil {
+        let cu = _core._nthContiguous(relativeOffset)
+        // Only constructible if sub-surrogate
+        if (cu < 0xd800) {
+          return Character(UnicodeScalar(cu)._unsafelyUnwrappedUnchecked)
+        }
+      }
+    }
+
     return Character(String(unicodeScalars[i._base..<i._endBase]))
   }
 }
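
The decision made by the fast path above can be illustrated outside the standard library. What follows is a minimal sketch using only public API, not the patch itself: the helper names `singleUnitCharacter` and `character(_:range:)` are hypothetical, a plain `[UInt16]` array stands in for the internal `_core` storage, and the caller is assumed to pass a range that covers exactly one grapheme (the real subscript gets this from the index).

```swift
// Hypothetical helper (not part of the standard library): builds a Character
// from one UTF-16 code unit when that unit is below the surrogate range.
func singleUnitCharacter(_ codeUnits: [UInt16], at offset: Int) -> Character? {
  let cu = codeUnits[offset]
  // Code units below 0xD800 are never surrogates, so each one is a complete
  // Unicode scalar on its own. Like the patch, this sketch does not take the
  // fast path for BMP code units above the surrogate range.
  guard cu < 0xD800, let scalar = Unicode.Scalar(cu) else { return nil }
  return Character(scalar)
}

// Hypothetical stand-in for the subscript: `range` is assumed to cover
// exactly one grapheme of `codeUnits`.
func character(_ codeUnits: [UInt16], range: Range<Int>) -> Character {
  if range.count == 1,
     let fast = singleUnitCharacter(codeUnits, at: range.lowerBound) {
    return fast
  }
  // General path: decode the grapheme's code units through String.
  return Character(String(decoding: codeUnits[range], as: UTF16.self))
}

// "a" (ASCII), U+00E9 (BMP, one code unit), U+1F600 (surrogate pair).
let units = Array("a\u{E9}\u{1F600}".utf16)
print(character(units, range: 0..<1))  // fast path
print(character(units, range: 1..<2))  // fast path
print(character(units, range: 2..<4))  // general path
```

Returning an optional from the helper keeps the fallback in one place, which mirrors how the patch falls through to the existing `Character(String(...))` line whenever the fast path does not apply.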
