
[stdlib] String: Fix forward implementation of grapheme breaking rule 11 #63043


Merged (1 commit) on Jan 16, 2023


lorentey
Member

@lorentey lorentey commented Jan 15, 2023

Rule GB11 in Unicode Annex 29 is:

GB11: Extended_Pictographic Extend* ZWJ × Extended_Pictographic

However, our forward grapheme breaking state machine implements it as:

GB11: Extended_Pictographic (Extend | ZWJ)* ZWJ × Extended_Pictographic

We implement the correct rule when going backward, so a String value can report a different Character count depending on whether it is traversed forward or backward.

The rule as implemented would be fine (Unicode doesn’t care much about the placement of grapheme breaks in invalid sequences), but the directional inconsistency messes with String’s Collection conformance.
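To make the difference concrete, here is a minimal sketch that models the two variants of the rule over symbolic scalar classes. It is not the stdlib's state machine: the `GBClass` enum and the functions `gb11Spec` and `gb11Loose` are hypothetical names invented for illustration, and the scalar after the candidate break is assumed to be Extended_Pictographic.

```swift
// Hypothetical, simplified model of the scalar classes GB11 cares about.
// This is an illustration only, not the stdlib implementation.
enum GBClass { case pictographic, extend, zwj, other }

/// GB11 as specified:
///   Extended_Pictographic Extend* ZWJ × Extended_Pictographic
/// `prefix` holds the classes of all scalars before the candidate break.
/// Returns true if the rule forbids a break at that position.
func gb11Spec(_ prefix: [GBClass]) -> Bool {
    guard prefix.last == .zwj else { return false }
    var rest = prefix.dropLast()          // consume the single ZWJ
    while rest.last == .extend {          // consume Extend*
        rest = rest.dropLast()
    }
    return rest.last == .pictographic
}

/// The looser variant described above:
///   Extended_Pictographic (Extend | ZWJ)* ZWJ × Extended_Pictographic
func gb11Loose(_ prefix: [GBClass]) -> Bool {
    guard prefix.last == .zwj else { return false }
    var rest = prefix.dropLast()          // consume the final ZWJ
    while rest.last == .extend || rest.last == .zwj {
        rest = rest.dropLast()            // consume (Extend | ZWJ)*
    }
    return rest.last == .pictographic
}

// A well-formed emoji ZWJ sequence: both variants agree.
let valid: [GBClass] = [.pictographic, .extend, .zwj]
// An invalid sequence with a doubled ZWJ: the variants disagree.
let doubled: [GBClass] = [.pictographic, .zwj, .zwj]

print(gb11Spec(valid), gb11Loose(valid))
print(gb11Spec(doubled), gb11Loose(doubled))
```

In this model the two functions only disagree on sequences such as a pictographic scalar followed by two ZWJs, which is the kind of invalid input where a forward pass using the loose rule and a backward pass using the specified rule would place breaks differently.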

rdar://104279671

@lorentey
Member Author

@swift-ci test

@lorentey
Member Author

@swift-ci test

Rule GB11 in Unicode Annex 29 is:

GB11: Extended_Pictographic Extend* ZWJ × Extended_Pictographic

However, our forward grapheme breaking state machine implements it as:

GB11: Extended_Pictographic Extend* ZWJ+ × Extended_Pictographic

We implement the correct rule when going backward, so a String value can report a different Character count depending on whether it is traversed forward or backward.

The rule as implemented would be fine (Unicode doesn’t care much about the placement of grapheme breaks in invalid sequences), but the directional inconsistency messes with String’s Collection conformance.

rdar://104279671
@lorentey
Member Author

@swift-ci test
