-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Make NSString(contentsOf:usedEncoding:) tests work on macOS #1149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks - I'll give this a test here. |
Confirmed that the failing tests now pass for me, running |
@swift-ci please test |
@ianpartridge Does it seems correct to you that the first character of the string would be the BOM? I thought that once it had been parsed by the underlying library the |
I think what's happening is that Swift just treats the first character of the string as the codepoint that represents a BOM, not a BOM itself? let str: String = "\u{FEFF}NSString"
print(str.count)
print(str)
print(Array(str))
print(Array(str.unicodeScalars))
print(Array(str.utf8))
print(Array(str.utf16)) prints
I don't know what the Unicode specification says about this - @milseman would know for sure. |
If I am reading http://www.unicode.org/versions/Unicode10.0.0/ch23.pdf#G19635 correctly, the relevant bit says:
So I think Swift is doing the right thing here by just treating the "BOM" as a Unicode scalar with the semantics of a "zero width no-break space". |
I just did a test on latest snapshow, linux v macOS
I think this could cause problems later on down the road, Im just not sure which one is wrong or if Linux is just being more generous in allowing it. |
This might be caused by different underlying versions of ICU being used? |
Ah yes, you could be right about that |
I don't think it is OK. I think it's a bug on Linux. |
@ianpartridge I think you may be right and I suspect the bug is lower down in |
I'm not sure this PR is a fix... Shouldn't |
- Add the NSString-UTF32-{LE,BE}-data.txt tests to the copy resources. - Skip BOM header when passing data to CFStringCreateWithBytes().
0507518
to
abe7693
Compare
@ianpartridge You are correct, if passing encoding as utf{16,32} the BOM is examined and skipped. If passing utf{16,32}{LE, BE} then its assumed there is no BOM and it needs to be skipped manually. Ive added the skip and it now works on Xcode9b4 |
This makes sense to me - let's try Linux 🙂 |
@swift-ci please test |
I tested this PR on Xcode myself, and confirmed the tests all now pass 🍾 |
P.S. I think this means that the Linux tests were only passing because of the other bug you found: Pure luck! |
Yes indeed, I think there is still a unicode issue/difference between macOS and Linux but that can be look at seperately. |
Add the NSString-UTF32-{LE,BE}-data.txt tests to the copy resources.
Add the BOM marker 0xFEFF to the start of the comparison string.
Im not sure if it is correct to have the BOM as the first character in a string but with this change the tests now work on macOS and still work ok on Linux.
string.character(at: 0) returns the BOM on macOS, does that sound correct?
The UTF32 changes were merged in #928