
[DNM] [stdlib] Implement string case-folding and normalization APIs #17933

Open, wants to merge 3 commits into main

Conversation

xwu
Collaborator

xwu commented Jul 13, 2018

Another contributor made some changes to parse domains properly in cookies. In thinking about that PR, I went down the rabbit hole:

International domains should be Punycode-encoded, which is implementable in Swift. But wait! They should first be case-folded, normalized to NFKC, and sanitized in other ways, and we can't do that in Swift without exposing some ICU interfaces.
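To make the order of operations concrete, here is a minimal Python sketch of the steps described above: fold case, normalize to NFKC, then Punycode-encode a single label. The helper name `to_ascii_label` is hypothetical, and the sketch deliberately skips the "sanitized in other ways" part, so it should not be read as a complete IDNA implementation.

```python
import unicodedata


def to_ascii_label(label: str) -> str:
    """Hypothetical helper: prepare and Punycode-encode one domain label.

    Simplified on purpose: real IDNA processing applies further mapping
    and validation rules that are omitted here.
    """
    # Step 1: case folding removes case distinctions.
    folded = label.casefold()
    # Step 2: NFKC replaces compatibility characters and composes the result.
    prepared = unicodedata.normalize("NFKC", folded)
    # Step 3: labels that are already ASCII need no encoding.
    if prepared.isascii():
        return prepared
    # Python's built-in "punycode" codec implements the RFC 3492 encoding;
    # the "xn--" prefix marks the label as Punycode-encoded.
    return "xn--" + prepared.encode("punycode").decode("ascii")


print(to_ascii_label("Bücher"))
print(to_ascii_label("EXAMPLE"))
```

Only the last step is plain string manipulation; the first two depend on Unicode data tables, which is why they need support from the standard library or ICU.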

Exposing such facilities for strings is well precedented in other languages. For example, C#, JavaScript, Perl, and Python all offer Unicode normalization functions.
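As an illustration of that precedent, this is roughly what the Python version looks like, using `unicodedata.normalize` from its standard library. The sample strings are chosen only for demonstration.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render identically but compare unequal code point by code point.
raw_equal = composed == decomposed

# NFC composes canonically equivalent sequences into the same form.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed

# NFKC additionally replaces compatibility characters such as the "fi" ligature.
ligature = "\ufb01"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

print(raw_equal, nfc_equal, nfkc_ligature)
```

The ligature survives NFC unchanged and is only expanded by NFKC, which is the distinction that matters for the domain-name use case.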

Some notes on the design of the proposed API

See the draft proposal text.

@xwu added the "swift evolution pending discussion" label Jul 13, 2018
@xwu
Collaborator Author

xwu commented Jul 13, 2018

/cc @allevato Still in the exploratory stages, but would love your thoughts. I would imagine you've already devoted some thought to this area.

@allevato
Member

This is a funny coincidence—I was thinking about case folding just yesterday when I pushed #17923 to correct/clean up some of the docs for scalar properties. I realized that I could refer to lowercaseMapping in the docs for changesWhenLowercased and likewise for upper- and titlecase, but that I didn't propose/implement a transformation analogue for case-folding.

So, I was mainly thinking about it in terms of individual scalars, and whether or not we should have Unicode.Scalar.Properties.caseFoldMapping. (I'm not 100% sure; it would be consistent with the other case mappings, but I don't want to add something merely for consistency.)

I like and agree with your analysis and rationale of the current state of the world in other languages that provide this API—I don't do much Python, so I was surprised to find that Python 3 actually exposed casefold() directly!
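For readers who have not used it, a short sketch of how Python's `str.casefold()` differs from `str.lower()`. The example word is chosen for illustration only.

```python
word = "Straße"

# lower() applies the lowercase mapping, which leaves "ß" untouched.
lowered = word.lower()

# casefold() applies full case folding, which expands "ß" to "ss".
folded = word.casefold()

# Folding both sides makes the two spellings compare equal.
matches = folded == "STRASSE".casefold()

print(lowered, folded, matches)
```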

My only concern (with any new API) would be "is this an operation that should be publicly exposed, or is it a lower-level piece that should be used by higher-level algorithms like a case-insensitive comparison but not available on its own?" But your Punycode example is a good use case that wouldn't be satisfied by the latter, so I think it does provide value on its own. Indeed, even NSString defines folding(options:locale:), so there's precedent to hoist such a function into stdlib (although the underlying CFStringFold function, like the other CF case mappings, is implemented from scratch instead of calling into ICU).
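To show what the "higher-level algorithm" alternative could look like, here is a minimal Python sketch of a case-insensitive comparison built on folding and normalization. It follows the canonical caseless matching definition in the Unicode Standard, NFD(toCasefold(NFD(X))). The function names are hypothetical and do not reflect the API proposed in this PR.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    """Hypothetical helper: compute NFD(casefold(NFD(s)))."""
    # Decompose first so that folding sees base letters and combining marks separately.
    decomposed = unicodedata.normalize("NFD", s)
    # Folding can produce sequences that are not in NFD, so normalize again afterwards.
    return unicodedata.normalize("NFD", decomposed.casefold())


def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings ignoring case and canonical form."""
    return canonical_caseless_key(a) == canonical_caseless_key(b)


print(caseless_equal("Straße", "STRASSE"))
print(caseless_equal("\u00c9", "e\u0301"))
```

A comparison like this returns only a Boolean, so a caller who needs the folded string itself, as in the Punycode case, still needs the lower-level operations to be public.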

@xwu
Collaborator Author

xwu commented Jul 13, 2018

@allevato Yes, I think Python is a good example of a principled approach to exposing useful Unicode facilities; it exposes not much more than casefold and normalize, which I think are justifiable for the use cases we've discussed and likely other Unicode-aware string processing operations. I can agree that everything else is rather more esoteric.

@xwu
Collaborator Author

xwu commented Jul 13, 2018

@swift-ci please smoke test

@xwu changed the title from "[DNM] [stdlib] [WIP] Implement case folding API" to "[DNM] [stdlib] [WIP] Implement string case-folding and normalization API" Jul 14, 2018
@xwu
Collaborator Author

xwu commented Jul 14, 2018

@swift-ci please smoke test

@xwu changed the title from "[DNM] [stdlib] [WIP] Implement string case-folding and normalization API" to "[DNM] [stdlib] Implement string case-folding and normalization APIs" Jul 14, 2018
@xwu
Collaborator Author

xwu commented Jul 14, 2018

@swift-ci please smoke test

@xwu changed the base branch from master to main September 24, 2020 04:19