[DNM] [stdlib] Implement string case-folding and normalization APIs #17933
/cc @allevato Still in the exploratory stages, but would love your thoughts. I would imagine you've already devoted some thought to this area.
This is a funny coincidence: I was thinking about case folding just yesterday when I pushed #17923 to correct/clean up some of the docs for scalar properties. I realized that I could refer to

So, I was mainly only thinking about it in terms of individual scalars, and whether or not we should have

I like and agree with your analysis and rationale of the current state of the world in other languages that provide this API. I don't do much Python, so I was surprised to find that Python 3 actually exposed

My only concern (with any new API) would be "is this an operation that should be publicly exposed, or is it a lower-level piece that should be used by higher-level algorithms like a case-insensitive comparison but not available on its own?" But your Punycode example is a good use case that wouldn't be satisfied by the latter, so I think it does provide value on its own. Indeed, even
@allevato Yes, I think Python is a good example of a principled approach to exposing useful Unicode facilities; it exposes not much more than
@swift-ci please smoke test
Another contributor made some changes to parse domains properly in cookies. In thinking about that PR, I went down the rabbit hole:
International domain names should be Punycode-encoded, which is implementable in Swift. But wait! They should first be case-folded, normalized to NFKC, and sanitized in other ways, and we can't do that in Swift without exposing some ICU interfaces.
Exposing such facilities for strings has ample precedent in other languages. For example, C#, JavaScript, Perl, and Python all offer Unicode normalization functions.
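To illustrate that precedent, here is a minimal Python sketch of the pipeline described above (case-fold, normalize to NFKC, then Punycode-encode a single domain label). It uses only the standard library's `str.casefold`, `unicodedata.normalize`, and the built-in `punycode` codec. The helper names `prepare_label` and `to_ascii_label` are hypothetical, and the "sanitized in other ways" steps of real IDNA processing are deliberately left out, so treat it as a sketch under those assumptions rather than a conforming implementation.

```python
import unicodedata


def prepare_label(label: str) -> str:
    """Case-fold a label, then normalize it to NFKC (hypothetical helper)."""
    # Full Unicode case folding, e.g. "ß" becomes "ss".
    folded = label.casefold()
    # Compatibility composition, e.g. fullwidth letters become ASCII letters.
    return unicodedata.normalize("NFKC", folded)


def to_ascii_label(label: str) -> str:
    """Punycode-encode one prepared label; skips further IDNA validation."""
    prepared = prepare_label(label)
    if prepared.isascii():
        # Pure-ASCII labels need no Punycode encoding.
        return prepared
    # The "xn--" prefix marks a Punycode-encoded (ACE) label.
    return "xn--" + prepared.encode("punycode").decode("ascii")


print(to_ascii_label("BÜCHER"))  # prints "xn--bcher-kva"
print(to_ascii_label("Straße"))  # prints "strasse"
```

The point of the sketch is that both preparatory steps are one-line standard-library calls in Python, whereas the Swift equivalent currently has no standard-library counterpart.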
Some notes on the design of the proposed API
See the draft proposal text.