Skip to content

[string] Wean off of no-longer-supported String non-API #1484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 26, 2018

Conversation

milseman
Copy link
Member

String's _core disappeared, but we kept around a mock-up of it as a
faux-SPI to avoid breaking Foundation. However, it does not fully
model String, and the illusion is broken from small string support.

@milseman
Copy link
Member Author

@swift-ci please test

1 similar comment
@milseman
Copy link
Member Author

@swift-ci please test

@milseman milseman requested a review from itaiferber March 20, 2018 20:49
@milseman
Copy link
Member Author

@itaiferber, this code makes incorrect assumptions about isASCII checks, namely that isASCII also implies the presence of contiguous ASCII in memory (as opposed to small strings) and that !isASCII implies the presence of contiguous UTF-16 in memory.

Do you have any idea why these SPI are even here? Do they matter? If so, would you prefer a real API and/or SPI instead?

Could you also take a look at the "SQUASH ME" fix, and let me know how best to stop using isASCII there? It seems like a default of false is actively harmful here.

@milseman
Copy link
Member Author

@swift-ci please test

@parkera
Copy link
Contributor

parkera commented Mar 21, 2018

Do we have any replacement available? This is going to be a big performance loss.

@itaiferber
Copy link
Contributor

itaiferber commented Mar 21, 2018

@milseman I meant to respond to this yesterday but I was OOO. I believe @phausler would know best about why the assumptions being made are made the way they are. I agree with Tony though; this is an important fast-path

@airspeedswift
Copy link
Member

this is an important fast-path

I just want to understand what you mean here. There are two interpretations that I see:

  1. It's an important fast path to provide to users. That is, people using corelibs-foundation on linux are using _fastCStringContents in their code, today. If so, it shouldn't be underscored, but more importantly, it's really bad that we ever added this this because we've basically offered up a back-door into String's internal implementation for corelibs users and we need to remove this API ASAP.

  2. It's only used for internal implementation of corelibs functions, which is why it's underscored, and where we are confident of the lifetime of the pointer. Really we shouldn't have taken this implementation shortcut because non-stdlib code shouldn't be relying on String internals. The fix is go understand all of the uses, what the perf impact of implementing them using string's correct API would be, and whether that perf impact is tolerable in the short term while we figure out a proper public APIs to use (and possibly some performance tweaks in the short term to the existing APIs to make the uses faster).

@Catfish-Man
Copy link
Contributor

_fastCStringContents is specifically for CF bridging; CFStringGetCStringPtr is the fast path for accessing CFString contents, which is quite a hot path. There’s a years-old trail of requests from our team for a better way to do this with Swift string, e.g. 26236614.

@itaiferber
Copy link
Contributor

@airspeedswift The second — no one externally should be using _fastCStringContents, same as on Darwin. We use _fastCStringContents as a fast-path internally in many places on Darwin, so the idea would be to enable the same optimizations here. If this was unsafe, I agree that we should find out what this was doing, and either try to enable it through the correct API, or learn why it was wrong.

But again, I didn't write this, so I don't know the specifics. @phausler would know best.

@airspeedswift
Copy link
Member

The second — no one externally should be using _fastCStringContents

OK great that's a relief! So here's how I'd suggest we proceed: we know that we want to give String a similar API, but with closure-based scoping in keeping with the usual practice. Something like this:

extension String { // or probably rather StringProtocol
  /// If the string is held in a form that can provide a null-terminated buffer of UTF8 code
  /// units in O(1), pass that into the supplied closure, and return the result. If not, return `nil`
 func withUnsafeUTF8<Result>(_ body: (UnsafeBufferPointer<UInt8>) throws -> Result) rethrows -> Result?

  /// If the string is held in a form that can provide a null-terminated buffer of UTF16 code
  /// units in O(1), pass that into the supplied closure, and return the result. If not, return `nil`
 func withUnsafeUTF16<Result>(_ body: (UnsafeBufferPointer<UInt16>) throws -> Result) rethrows -> Result?
}

With the current string internals this should be straightforward to implement, but we need to get it through evolution (bike shed if it should be withUnsafeUTF8IfAvailable etc). In the short-term, we can provide this as _withUnsafeUTF8 to wean corelibs off _core, and if the result of the evolution discussion is a different API, we migrate to that once available.

@milseman
Copy link
Member Author

@Catfish-Man Sure, but why is this important on Linux? Or does corelibs-foundation need to bridge to CFString to use CoreFoundation to implement NSString?

Assuming we underscore them, @parkera @itaiferber @Catfish-Man, do these two closure-based SPIs work for you?

@phausler
Copy link
Contributor

So the implementation of NSString is primarily used to be the bridge to CFString even in swift-corelibs-foundation on linux. This is the conduit in which constant strings and property list encoding and the such are exposed outwardly to Foundation APIs. So in short: yes it is used on Linux, and I don't think we can avoid _fastCStringContents being called since that is part of the CF source (which is shared with Darwin and would require massive changes to avoid that).

This will be a pretty harsh perf hit to the linux serialization.

@milseman
Copy link
Member Author

@phausler how do these 2 SPI look to you? Note that escaping such pointers violates Swift's programming model.

@Catfish-Man
Copy link
Contributor

How would a non-escaping SPI work for CFStringGetCStringPtr?

@milseman
Copy link
Member Author

That was the point of @airspeedswift's question about whether this is important for the implementation of Foundation, or for exposing to users of Foundation API. In this example, perhaps he should of asked whether the result of _fastCStringContents escapes from Foundation's internal uses, rather than a direct call.

Foundation-internal uses of CFStringGetCStringPtr could ideally be migrated onto something scoped, or worst-case we allow Foundation to violate Swift's model because "it just works and we'll change the implementation when it doesn't". This approach doesn't work when an escaped pointer is important API.

Why would Linux user-code be heavily reliant on CFStringGetCStringPtr? If String provided this kind of access, but correctly scoped to not violate Swift's model, then users would just use that.

@parkera
Copy link
Contributor

parkera commented Mar 22, 2018

Here is one of many examples:

https://github.com/apple/swift-corelibs-foundation/blob/master/CoreFoundation/Parsing.subproj/CFPropertyList.c#L350

When writing out a property list, if we have an ASCII string, then there is no need to copy the string into a temporary buffer so we can then copy it again into the data buffer for the property list we're writing.

If CFStringGetCStringPtr always fails, we will always call CFStringCreateExternalRepresentation which is necessarily a malloc of CFData, which we then copy out of and free.

@parkera
Copy link
Contributor

parkera commented Mar 22, 2018

I understand the scoped access API you're going towards, but it's a poor fit for the existing C code, which is shared across all platforms.

@parkera
Copy link
Contributor

parkera commented Mar 22, 2018

I think in order to make use of it we'd have to add a new C API which took a closure itself and replace all CFStringGetCStringPtr usage in CF with this new call everywhere. That would probably add some overhead vs CFStringGetCStringPtrsince we'd have to call out an additional frame. I'm not sure if that's enough to make a performance difference, but it will certainly be a usability degradation in C at least.

@parkera
Copy link
Contributor

parkera commented Mar 22, 2018

I guess it would be nice if we had a mechanism to allow a pointer to be valid until end of scope, like the return inner pointer feature on Darwin -- although that of course relies on autorelease pools.

@milseman
Copy link
Member Author

We do have lifetimes and there's some interest (CC @atrick) in better dependent lifetime support. For now, we do this with defer { _fixLifetime(x) } (see the entry in the programmer's manual).

However, correct usage of that also requires reasoning about any surrounding effects. For example, such a pointer would not be considered a use for COW, so a String append could deallocate the memory the pointer is pointing at, even if the string value is kept alive.

A short term solution might be that we trust Foundation to properly reason through effects and extend lifetimes, and we ... erm... hope that user code does as well?

@Catfish-Man
Copy link
Contributor

CF is at least expected to be fairly sharp-edged 🤷‍♂️

@milseman
Copy link
Member Author

Wait a second, isn't this code just wrong (and has always been)? The code doesn't check for nul-termination, but CFStringGetCStringPtr returns a pointer that (I'm guessing) should be nul terminated. Even if you escape an internal pointer, you don't know whether it's nul-terminated or not.

@parkera
Copy link
Contributor

parkera commented Mar 23, 2018

How likely is it that we would get the lifetime-encapsulated-closure version of that string API along with these other changes?

@milseman
Copy link
Member Author

If you mean the SPI that @airspeedswift mentioned, I merged it this morning at swiftlang/swift#15442. Guaranteeing lifetime is up to the caller, there is no way for the callee to sensibly do so.

Making this API would require clarifying behavior surrounding small strings. One approach is, if small, spill to the stack and execute the closure on that. However, if you're planning on escaping the pointer, then that'd be a dangling pointer into an old stack frame.

@milseman
Copy link
Member Author

Hmm, how does one productively develop corelibs-foundation on Mac while using recent changes? It seems like my options are:

  1. Wait until there's a public snapshot available for download/install (i.e. a week or two).
  2. Build a toolchain locally and figure out how to install that (long time and often fails hours in).
  3. Just use Linux for everything.

Is there a fourth option?

@millenomi
Copy link
Contributor

(2) and a little patience are what you want, generally. I've found the release build to be a lot faster than the debug build, if the changes on the stdlib side look solid.

@millenomi
Copy link
Contributor

(This may be more appropriate for forums than in a patch discussion.)

Switch off of old _StringCore (which no longer exists)'s isASCII SPI
to a check whether the String explicitly stores contiguous ASCII, as
opposed to small or non-contiguous contents.
@milseman
Copy link
Member Author

@jrose-apple argues that escaping the pointer is more bad rather than less bad. Debating this point further is not useful.

I've changed this to just use less bad queries, namely _guts._isContiguousASCII / UTF16. We can figure out the rest later.

@milseman
Copy link
Member Author

@swift-ci please test

1 similar comment
@milseman
Copy link
Member Author

@swift-ci please test

@milseman
Copy link
Member Author

@parkera @itaiferber @millenomi @phausler @Catfish-Man what do you think of this far more minimal change? This should be sufficient (for now) to ensure correct behavior with small strings.

Copy link
Contributor

@phausler phausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good minimal change to me

@milseman milseman merged commit 5aa9421 into swiftlang:master Mar 26, 2018
@milseman milseman deleted the string_cleaning branch March 26, 2018 01:49
spevans pushed a commit to spevans/swift-corelibs-foundation that referenced this pull request Mar 31, 2018
Switch off of old _StringCore (which no longer exists)'s isASCII SPI
to a check whether the String explicitly stores contiguous ASCII, as
opposed to small or non-contiguous contents.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants