Skip to content

Implement hasPrefix and hasSuffix #440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

glessard
Copy link
Contributor

Using the _compareDeterministicUnicodeCollation method, also used by the == operator.

@gribozavr gribozavr assigned AnnaZaks and unassigned AnnaZaks Dec 11, 2015
@gribozavr
Copy link
Contributor

I think this might be close enough, although inefficient (traverses both strings twice).

This change requires tests. We have tests for the Objective-C side in test/1_stdlib/NSStringAPI.swift and validation-test/stdlib/String.swift, please port those to start.

It would make sense to put the tests into validation-test/stdlib/String.swift, using #if _runtime(ObjC) to separate Objective-C dependent parts and un-XFAIL the test on Linux.

@glessard
Copy link
Contributor Author

On 11 déc. 2015, at 16:30, Dmitri Gribenko [email protected] wrote:

I think this might be close enough, although inefficient (traverses both strings twice).

There’s no claim of efficiency implied, but it could be worse; I’m fairly sure no new memory is allocated.

This change requires tests. We have tests for the Objective-C side in test/1_stdlib/NSStringAPI.swift and validation-test/stdlib/String.swift, please port those to start.

It would make sense to put the tests into validation-test/stdlib/String.swift, using #if _runtime(ObjC) to separate Objective-C dependent parts and un-XFAIL the test on Linux.

Thanks for pointing me to those.

Guillaume Lessard

@parkera
Copy link
Contributor

parkera commented Dec 12, 2015

We should be careful with "close enough" implementations because the expectation is that it will behave the same as the Darwin version.

@parkera
Copy link
Contributor

parkera commented Dec 12, 2015

We have Foundation available on all platforms now. Can we make this use that implementation on all platforms as well?

@gribozavr
Copy link
Contributor

@parkera That will create interesting circular dependencies in the build process.

@parkera
Copy link
Contributor

parkera commented Dec 12, 2015

If the build process is deficient then we should fix that instead of forcing an incompatibility onto clients of the API.

@parkera
Copy link
Contributor

parkera commented Dec 12, 2015

Or we do the obvious thing and put the implementation in an extension on String in Foundation.

// FIXME: Implement hasPrefix and hasSuffix without objc
// rdar://problem/18878343
extension String {
/// Returns `true` iff `self` begins with `prefix`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be just "if" and the same thing below :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"iff" (if-and-only-if) is a stronger statement than "if". It is correct here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand corrected. My apologies.

@glessard
Copy link
Contributor Author

@gribozavr The relevant test is ported from test/1_stdlib/NSStringAPI.swift.

Neither the original test nor the ported test are particularly satisfying, since ultimately the operation is tested with a different version of itself.

@glessard
Copy link
Contributor Author

@parkera From what I ran into while doing this, I'd advocate to eliminate any way to reach Foundation-based comparisons from Swift.String. If a variable is being used as NSString or CFString, it's okay to use them, but otherwise there is at least one corner case where the Foundation-based comparison is not consistent with the Swift.String model. See https://bugs.swift.org/browse/SR-243

@parkera
Copy link
Contributor

parkera commented Dec 16, 2015

If we want to discuss dropping the NSString methods from String, that would be something to discuss on the evolution list. In the meantime, my primary concern is making sure developers don't get different answers for the same methods on Linux vs OS X.

@dabrahams
Copy link
Contributor

On Dec 16, 2015, at 10:13 AM, Tony Parker [email protected] wrote:

If we want to discuss dropping the NSString methods from String, that would be something to discuss on the evolution list. In the meantime, my primary concern is making sure developers don't get different answers for the same methods on Linux vs OS X.

+1

-Dave

@glessard
Copy link
Contributor Author

After porting the corrected test from #629, the Foundation-based calls no longer pass, but the ICU-based calls do. Not sure where to go from here.

@glessard
Copy link
Contributor Author

Would you like a consolidated PR with this?

@lilyball
Copy link
Contributor

lilyball commented Jan 4, 2016

Incidentally, it turns out that _compareDeterministicUnicodeCollation actually bridges into ObjC if _runtime(_ObjC). I don't know why it does this at all, unless _compareDeterministicUnicodeCollation doesn't actually match NSString behavior (but you'd think it would). This means that anything that uses _compareDeterministicUnicodeCollation ends up actually allocating a new _NSContiguousString if the string in question isn't already backed by a Cocoa string.

@dabrahams
Copy link
Contributor

On Jan 3, 2016, at 4:00 PM, Kevin Ballard [email protected] wrote:

Incidentally, it turns out that _compareDeterministicUnicodeCollation actually bridges into ObjC if _runtime(_ObjC). I don't know why it does this at all, unless _compareDeterministicUnicodeCollation doesn't actually match NSString behavior (but you'd think it would). This means that anything that uses _compareDeterministicUnicodeCollation ends up actually allocating a new _NSContiguousString if the string in question isn't already backed by a Cocoa string.

Except maybe on Linux, every string is backed by a Cocoa string. The Swift-native String backing buffer is-a NSString.

-Dave

@lilyball
Copy link
Contributor

lilyball commented Jan 4, 2016

@dabrahams I don't see how that's possible. The Swift-native String buffer is a _StringBuffer, which uses a _HeapBuffer<_StringBufferIVars, UTF16.CodeUnit> as its storage, which in turn uses _HeapBufferStorage<_StringBufferIVars, UTF16.CodeUnit>, which is a class that inherits from NonObjectiveCBase. It doesn't seem possible for this to be layout-compatible with an NSString, because Swift classes always have the type info and retain counts at the beginning of their allocated storage (which is precisely where the isa pointer needs to go for Obj-C). Plus of course _StringBufferIVars doesn't even attempt to include an isa pointer.

You can also trivially demonstrate that a Swift string is not Cocoa-compatible with the following code:

var s = "test"
s += "!" // force it to copy to mutable memory
print(s._core.nativeBuffer)
print(s._core.cocoaBuffer)

If it was a Cocoa string, it would print nil and then the Cocoa string. But it's not, it's a native string, so it prints the _StringBuffer and then it prints nil. And since _stdlib_binary_bridgeToObjectiveCImpl only avoids creating a new _NSContiguousString if _core.cocoaBuffer is non-nil (and _swift_stdlib_CFStringGetLength reports the same length as _core.count, not entirely sure why), this should demonstrate that Swift-native strings always end up creating a new _NSContiguousString whenever they bridge into Obj-C.

So this all comes back to the question of why String bridges into Obj-C when it has a perfectly functional non-ObjC implementation available. It would make sense to bridge if it already has a Cocoa buffer, but it doesn't test that, it merely tests #if _runtime(_ObjC).

Incidentally, there's a bit of even more weirdness here. When comparing two strings with _runtime(_ObjC), if both strings are ASCII, it falls back to a separate non-ObjC comparison routine that's basically a wrapper around memcmp. The documentation says that Unicode actually defines the sort order differently for some punctuation than the ASCII value order, but that Foundation uses the ASCII value order. This is strange because it suggests that two Swift strings can actually compare differently depending on whether the ObjC runtime is present (e.g. "$" < "*" prints true with the Obj-C runtime but it should print false with the root collation order from Unicode). What's even odder about this is the Unicode Collation Algorithm can actually be tailored, and in fact the en_US_POSIX locale (also known as the C locale) already tailors the collation specifically to make ASCII characters sort in ASCII value order. So I'm surprised we don't just use an ICU collator with the collation table tailored to make ASCII characters sort in ASCII order the same way en_US_POSIX does (I assume en_US_POSIX inherits other collation tweaks from en_US or else I'd suggest just using en_US_POSIX directly, although I don't understand ICU well enough to know if that's actually true).

@glessard
Copy link
Contributor Author

glessard commented Jan 4, 2016

As far as I can tell, the OSX swift build doesn't link ICU at all. I attempted earlier to bypass the call to _stdlib_compareNSStringDeterministicUnicodeCollation within _compareDeterministicUnicodeCollation (by inserting #if false at line 371 of String.swift); that resulted in a link error.

@adrfer
Copy link
Contributor

adrfer commented Jan 4, 2016

@glessard, really nice implementation! 👌

I was just wondering which one is more Swift-y at the hasPrefix function?

if prefix.isEmpty { return false }

or

guard !prefix.isEmpty else { return false }

Any thoughts?

Same goes for hasSuffix.

@glessard
Copy link
Contributor Author

glessard commented Jan 5, 2016

@adrfer I'm sure it could be argued both ways, but I think it is a special condition rather than a precondition. If this eventually called into CFStringFind, the if statement could conceivably be removed; this eventually calls into CFStringCompare, and two empty strings would compare as equal, which is not the desired answer. (Plus, guard leads to a double-negative.)

@adrfer
Copy link
Contributor

adrfer commented Jan 5, 2016

@glessard indeed! I'm just trying to see what's more idiomatic. In this case guard's double negative isn't really cool, and its version is even longer than the if version. Anyways, Swift is evolving fast and looking forward to seeing what preferred style the community will come up with.

@glessard glessard mentioned this pull request Jan 21, 2016
@glessard
Copy link
Contributor Author

Rebased and re-submitted as #1031

@glessard glessard closed this Jan 21, 2016
@lilyball
Copy link
Contributor

You know, you can just force-push to your branch and GitHub will update the PR. That way all the previous discussion will be preserved.

@glessard
Copy link
Contributor Author

Rebased here. Thanks for letting me know this is possible, @kballard

@CodaFi
Copy link
Contributor

CodaFi commented Aug 24, 2016

@glessard Given the current implementation, do you think this pull request is still necessary/relevant?

@glessard
Copy link
Contributor Author

@CodaFi Good question. I was considering re-implementing the tests part of this based on the newer work; it would at least illuminate some differences between an ICU-based implementation and the CFString-based one. I think the discussion is still of interest for the (hopefully) forthcoming String rethink.

@CodaFi
Copy link
Contributor

CodaFi commented Aug 24, 2016

Sounds good; we'll always take more tests. To preserve the discussion here, I'm going to close this pull request. Please open another if you decide to cherry pick the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants