-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Avoid attempting to create SmallStrings for constant tagged CFStrings #30966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test |
@swift-ci please benchmark |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Excellent. Moving the slow path out of line for isEqual/isEqualToString didn't hurt anything :) |
unsigned long flags; | ||
const __swift_uint8_t * _Nonnull str; | ||
unsigned long length; | ||
} _swift_shims_builtin_CFString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter that this type is mismatched on Linux? There are three different ABIs in play :-(:
Default (ObjC) (-fcf-runtime-abi=objc
):
struct __NSConstantString_tag {
const int *isa;
int flags;
const char *str;
long length;
} __NSConstantString;
Swift 4.1, 4.2 (-fcfc-runtime-abi=swift-4.1
, -fcf-runtime-abi=swift-4.2
):
typedef struct __NSConstantString_tag {
uintptr_t _cfisa;
uintptr_t _swift_rc;
_Atomic(uint64_t) _cfinfoa;
const char *_ptr;
uint32_t _length;
} __NSConstantString;
Swift 5.0 (-fcf-runtime=swift-5.0
):
typedef struct __NSConstantString_tag {
uintptr_t _cfisa;
uintptr_t _swift_rc;
_Atomic(uint64_t) _cfinfoa;
const char *_ptr;
uintptr_t _length;
} __NSConstantString;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only intended for Darwin, but maybe there should be a guard here to make it explicit and prevent anyone from mistakenly using it more widely in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the CF shims file is already only built on Darwin, but thank you for pointing this out, it's good to be aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to be Darwin only, I think that we should do something to make that explicit (perhaps move the file) - as it is included in the Linux/Android/Windows SDKs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most requesting comments, simplifications, and clarifications
@@ -288,13 +288,24 @@ internal enum _KnownCocoaString { | |||
#if !(arch(i386) || arch(arm) || arch(wasm32)) | |||
case tagged | |||
#endif | |||
#if arch(arm64) | |||
case constantTagged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that big a deal that this is separated out as only arm64? Or more broadly, do we want to diverge the code cross any platforms in the enum definition itself, or just at the create/use site? Even more broadly, do we want to diverge at the use site or only at the creation site?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to rely on the x86_64 tagged pointer format not accidentally overlapping the values for these, so I think it is unfortunately necessary for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why is that needed for the declaration of the type, in contrast to the code that chooses what case to construct?
@@ -288,13 +288,24 @@ internal enum _KnownCocoaString { | |||
#if !(arch(i386) || arch(arm) || arch(wasm32)) | |||
case tagged | |||
#endif | |||
#if arch(arm64) | |||
case constantTagged | |||
#endif | |||
|
|||
@inline(__always) | |||
init(_ str: _CocoaString) { | |||
|
|||
#if !(arch(i386) || arch(arm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that the wasm32
check isn't consistency enforced.
Maybe we could have a
private var platformSupportsObjCTaggedPointer: Bool {
#if ...
return true
# else
return false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice!
@@ -99,5 +99,10 @@ typedef __swift_uint8_t (*getCStringImplPtr)(id, | |||
|
|||
} | |||
|
|||
const void * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this and what is it for? Could you write a descriptive comment?
otherIsKnownNSString: Bool, | ||
maybeOtherUTF16Length: Int?, | ||
maybeOtherASCIIPointer: UnsafePointer<UInt8>? | ||
) -> Int8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to need way more comments and clarifications. It seems you joined together two code paths with some similarities, but added bool arguments relevant to some callers but not others.
This is a bit of an anti-pattern if you can avoid it. Instead, it seems like there are two very different ways to call this function, so is there a formulation of that that reflects that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This used to be a fallthrough but I can't do two different fallthroughs. I can try to make this nicer, but I do want to keep the optimizations the arguments provide.
return _isEqualSlowPath( | ||
other: other, | ||
otherIsKnownASCII: true, | ||
otherIsKnownNSString: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. why would a tagged Cocoa NSString*
not be a known NSString?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone passed the wrong type to -isEqual:. We can only verify that the tag is correct on the new path currently.
@@ -358,6 +369,52 @@ private func _getCocoaStringPointer( | |||
return .none | |||
} | |||
|
|||
#if arch(arm64) | |||
|
|||
let taggedConstantStringsAvailable = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will have storage, is that your intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, for the time being at least :/
|
||
@inline(__always) | ||
internal func getConstantTaggedCocoaContents(_ cocoaString: _CocoaString) -> | ||
(UTF16Length: Int, contentsPointer: UnsafePointer<UInt8>, isASCII: Bool)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need this to be an UnsafeRawPointer, otherwise you're breaking type safety if you're going to have it be UIn8 for ASCII and UInt16 for non-ASCII. Or is it always in a single-byte representation?
Any chance you can reflect this in the type system? Would it makes sense for it to be an enum between a UBP<UInt8>
or UBP<UInt16>
? Is ASCII-ness orthogonal, or is that basically the discriminator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the plan is to ignore it completely for non-ASCII, so I suppose I could just represent it as a single optional pointer instead.
let taggedContents = getConstantTaggedCocoaContents(cocoaString)! | ||
return _StringGuts( | ||
cocoa: cocoaString, | ||
providesFastUTF8: false, //TODO: if contentsPtr is UTF8 compatible, use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the same as isASCII?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, yes. I want to change that in the future, but that requires clang and CF changes, and may never happen.
@Catfish-Man could you add tests as well? |
otherIsKnownNSString: true, | ||
maybeOtherUTF16Length: contents.UTF16Length, | ||
maybeOtherASCIIPointer: contents.asciiContentsPointer | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think there's an opportunity to simplify the _isEqualSlowPath
's interface a bit. Having 2 bools and 2 optionals presents 16 ways to call this function, but not all combinations are possible.
For example, you can order the constantTagged
case above .tagged
and drop the otherIsKnownNSString
parameter, since the fall-through shared by .tagged
and .cocoa
will have the _isNSString()
call. I think you can do something similar for the ASCII-ness based on your earlier comment. Also, it seems like the two maybes are not orthogonal, so that's another factor of 2 you might be able to simplify.
edit: Actually, If I'm reading this right, .constantTagged
always knows the length and the other two cases never do. So you can pass utf16Length as a non-optional and the fall-through calls _stdlib_binary_CFStringGetLength
to derive that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new version should make you happier here :)
@swift-ci please test |
@swift-ci please benchmark |
Build failed |
Build failed |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Slightly sad that didn't show up as a win for existing tagged pointer stuff, but eh, I'll take no regression |
@swift-ci please test |
Build failed |
Build failed |
Known random failure apparently |
@swift-ci please test |
Build failed |
@swift-ci please test |
Build failed |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking much better! My main remaining advice is to sink the architecture checks into getConstantTaggedCocoaContents
so that we don't have to proliferate as many #if
checks.
@swift-ci please test |
Build failed |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the cleanup!
Fixes rdar://problem/61273899