Write string external representation in String.write(to: URL) #3158

Merged
merged 1 commit into from
Aug 29, 2022

karwa
Contributor

@karwa karwa commented Mar 19, 2022

NSString._getExternalRepresentation() was not returning the string's external representation 👀.

For some reason, CF decides to only include the BOM in the external representation for encodings using the machine's local endianness (e.g. .utf16, .utf32) but not if you specify an explicit endianness (e.g. .utf32BigEndian). It does the same thing on Darwin Foundation -- I don't know why, and I don't agree with it, and it isn't documented anywhere AFAICT, but there it is.
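The asymmetry described above can be observed through `String.data(using:)`, which is a separate entry point from the `write(to:)` path this patch touches. The following is a minimal sketch, assuming a Foundation whose `data(using:)` emits the external representation; the helper name `encodedLength` is hypothetical and only exists for this illustration.

```swift
import Foundation

// Hypothetical helper: number of bytes produced when encoding `s`,
// or -1 if the string cannot be encoded.
func encodedLength(_ s: String, _ encoding: String.Encoding) -> Int {
    return s.data(using: encoding)?.count ?? -1
}

let sample = "a"

// Unqualified encodings: a BOM is expected in front of the payload.
let utf16Length = encodedLength(sample, .utf16)
let utf32Length = encodedLength(sample, .utf32)

// Explicit endianness: payload only, no BOM expected.
let utf16BELength = encodedLength(sample, .utf16BigEndian)
let utf32BELength = encodedLength(sample, .utf32BigEndian)

// The differences are the sizes of the BOMs, if any were written.
print("UTF-16 extra bytes:", utf16Length - utf16BELength)
print("UTF-32 extra bytes:", utf32Length - utf32BELength)
```

Comparing lengths rather than specific bytes keeps the sketch independent of the machine's endianness.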

@karwa
Contributor Author

karwa commented Mar 29, 2022

@millenomi Are you able to review this patch, or recommend somebody?

@millenomi
Contributor

@swift-ci please test and merge

@millenomi
Contributor

@swift-ci please test and merge

@millenomi
Contributor

@swift-ci please test

@karwa
Contributor Author

karwa commented Apr 21, 2022

@millenomi Does this look good to go, then?

@karwa
Contributor Author

karwa commented Aug 26, 2022

@millenomi Not sure what happened to this. CI appears green but it wasn't merged? 🤷

Contributor

@parkera parkera left a comment

@jmschonfeld does this look correct to you?

Contributor

@jmschonfeld jmschonfeld left a comment

Yes, this looks correct to me. The behavior here, which (as you mentioned) is consistent with Darwin platforms, comes from the Unicode conformance spec, which states that the explicit BE/LE encodings should not contain a BOM and that the <00 00 FE FF> / <FF FE 00 00> byte sequences should be interpreted as U+FEFF Zero Width No-Break Space (this applies to both the UTF-16 and UTF-32 variants).
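To illustrate the rule on the decoding side, here is a rough sketch of a UTF-16 decoder that treats a leading `FE FF` / `FF FE` as a signature only for the unmarked scheme, and as an ordinary U+FEFF character for the explicit BE/LE schemes. All names (`UTF16Scheme`, `decodeUTF16`) are hypothetical; this is not Foundation or CF code, and it assumes big-endian when unmarked input carries no BOM.

```swift
// Hypothetical names throughout; not Foundation or CF code.
enum UTF16Scheme { case unmarked, bigEndian, littleEndian }

func decodeUTF16(_ bytes: [UInt8], as scheme: UTF16Scheme) -> String {
    var payload = bytes[...]
    // Assumption: unmarked input without a BOM is read as big-endian.
    var bigEndian = true
    switch scheme {
    case .bigEndian:
        bigEndian = true
    case .littleEndian:
        bigEndian = false
    case .unmarked:
        // Only the unmarked scheme consumes a leading BOM as a signature.
        if payload.starts(with: [0xFE, 0xFF]) {
            payload = payload.dropFirst(2)
        } else if payload.starts(with: [0xFF, 0xFE]) {
            bigEndian = false
            payload = payload.dropFirst(2)
        }
    }
    // Assemble 16-bit code units in the chosen byte order.
    var units: [UInt16] = []
    var i = payload.startIndex
    while i + 1 < payload.endIndex {
        let a = UInt16(payload[i]), b = UInt16(payload[i + 1])
        units.append(bigEndian ? (a << 8) | b : (b << 8) | a)
        i += 2
    }
    return String(decoding: units, as: UTF16.self)
}

let bytes: [UInt8] = [0xFE, 0xFF, 0x00, 0x61]
let unmarked = decodeUTF16(bytes, as: .unmarked)
let explicit = decodeUTF16(bytes, as: .bigEndian)
print(unmarked.unicodeScalars.count)  // 1: the BOM was consumed as a signature
print(explicit.unicodeScalars.count)  // 2: U+FEFF kept as a character, then "a"
```

The same four bytes therefore decode to different strings depending on whether the caller named an explicit endianness, which is why writing a BOM for `.utf16BigEndian` would change the content rather than just label it.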
