Thread-safety for the type metadata cache #31663
You'll probably want something that avoids the recursive mutex, at least in the common case, but this fixes the immediate correctness issue.
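For illustration only, here is a minimal sketch of what "avoiding the mutex in the common case" could look like: a double-checked lookup in which readers take a lock-free fast path and only first-time initialization takes a (non-recursive) mutex. `Entry` and `LazySlot` are hypothetical names made up for this sketch; this is not the Swift runtime's metadata cache and not the change proposed in this PR.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a cached record; not a Swift runtime type.
struct Entry {
  int value;
};

// Hypothetical single-slot cache showing a lock-free fast path.
class LazySlot {
  std::atomic<Entry *> slot{nullptr};
  std::mutex initMutex; // taken only on the slow path

public:
  int initCount = 0; // how many times the slow path built an entry

  const Entry *get() {
    // Fast path: this acquire load pairs with the release store below, so a
    // reader that sees a non-null pointer also sees the initialized Entry.
    if (Entry *e = slot.load(std::memory_order_acquire))
      return e;

    // Slow path: serialize initialization, then re-check under the lock.
    std::lock_guard<std::mutex> guard(initMutex);
    Entry *e = slot.load(std::memory_order_relaxed);
    if (!e) {
      e = new Entry{42};
      ++initCount;
      slot.store(e, std::memory_order_release);
    }
    return e;
  }

  ~LazySlot() { delete slot.load(std::memory_order_relaxed); }
};
```

Once the slot is populated, callers such as `slot.get()` never touch the mutex, which is the property a global lock at the top of every runtime entry point gives up.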
@swift-ci Please benchmark
@swift-ci Please test
@dabrahams test case?
@gottesmm I'm trying to reduce it; the one we have is big and involves tensorflow. tsan blames the metadata cache, though.
Hmm, the cache variable is supposed to be an init-once global, which should be thread safe, and the metadata cache itself is supposed to be a concurrent map. Is the issue on the Mac? Sounds like we have a bug somewhere. CC @rjmccall
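As a rough sketch of the "init-once global" half of that claim: since C++11, a function-local static is initialized exactly once even when several threads reach it concurrently. `NameCache` and `globalCache` below are hypothetical names, and the mutex-guarded `std::map` is only a simple stand-in for a concurrent map; the runtime uses its own data structure.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical cache type. A mutex-guarded std::map stands in for a real
// concurrent map purely to keep the sketch short.
class NameCache {
  std::mutex m;
  std::map<std::string, int> entries;

public:
  // Returns the value already stored for key, or stores and returns
  // candidate if the key was absent. The first writer wins.
  int getOrInsert(const std::string &key, int candidate) {
    std::lock_guard<std::mutex> guard(m);
    return entries.emplace(key, candidate).first->second;
  }
};

// Init-once global: C++11 guarantees that the initialization of a
// function-local static is thread-safe and happens exactly once.
NameCache &globalCache() {
  static NameCache cache;
  return cache;
}
```

Every caller of `globalCache()` gets the same instance, and a second `getOrInsert` for an existing key returns the first stored value rather than overwriting it.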
@dabrahams the TSAN results would be useful too
in auto *witnessAddr = &((const void* *)wtable)[witnessIndex];
auto witness = *witnessAddr; // <---- here
Heh, @pschuh reports that my reproducer is tickling a different bug. He's working on reducing our large test case. @aschwaighofer @gottesmm would you like me to file bugs in Jira for these?
[swift-ci benchmark report; result tables not preserved. Performance: -O, -Osize, -Onone. Code size: -O, -Osize, -swiftlibs. The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.]
OK, the real reproducer is attached to https://bugs.swift.org/browse/SR-12760 and the other bug is filed as https://bugs.swift.org/browse/SR-12761
@@ -684,6 +687,7 @@ MetadataResponse
 swift::swift_getGenericMetadata(MetadataRequest request,
                                 const void * const *arguments,
                                 const TypeContextDescriptor *description) {
+  std::lock_guard<std::recursive_mutex> guard(metadata_mutex);
We didn't just forget to add a lock here. If there's something wrong with our current implementation, we should figure it out, but grabbing a lock at the start of a bunch of runtime functions that are meant to work without locks is not acceptable.
@rjmccall Hey, we know. We're just trying to get the problem on everybody's radar. How these functions are “meant to work” is not immediately obvious from the comments, and I guess some other people around here probably do understand that. If someone would like to explain that, we might be able to attempt an acceptable fix. We won't be insulted either if this PR is closed and replaced with one that is acceptable. 😉
Well, this is a pretty aggressive way of reporting a bug.
I opened #31768 for the associated-witness issue, which should fix Parker's bug.
Wow, man, no offense intended. I thought such a bug in the runtime (which I presume to be a security issue) might be important enough that y'all might be interested in a quick fix. I don't know what you think is aggressive about it. Just trying to be helpful, here. Next time I'll stick to Jira.
No offense taken; I'm just trying to explain that this is not, in fact, a working quick fix.
Please CC me on any concurrency bugs you file; we definitely want to treat them as high-priority.
TSan is right to report about Our metadata usage should actually be fine with the loads being
Closing in favor of #31768