-
Notifications
You must be signed in to change notification settings - Fork 342
Replace hand-rolled SharedMutex with C++17 std::shared_mutex (NFCI) #7120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace hand-rolled SharedMutex with C++17 std::shared_mutex (NFCI) #7120
Conversation
We are getting reports where LLDB is hanging in try_lock(). This patch makes the code easier to reason about by using standard primitives, and avoids the pitfall of the copy constructor that the old code was still allowing. While this may not fix the hangs, it should certainly make it easier to diagnose. rdar://112025620
@swift-ci test |
LLVM's |
According to the comment in RWMutex.h std::shared_timed_mutex is only available on macOS 10.12 and later. Green dragon is running on 10.15.5. Note that |
Guess who wrote that comment. 🙃 Also, I assume you mean Swift CI? If both GreenDragon and Swift CI are using 10.15 as the minimum deployment target I'll remove that workaround upstream.
👍 |
ci.swift.org claims to use |
I just realized that the "host os" != "deployment target". Green dragon uses |
Actually, my memory was correct, but it seems someone hoisted |
lldb/source/Target/Target.cpp
Outdated
[this] { GetSwiftScratchContextLock().unlock(); }); | ||
auto &lock = GetSwiftScratchContextLock(); | ||
if (lock.try_lock()) { | ||
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this use std::lock_guard
+ std::adopt_lock
?
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); }); | |
std::lock_guard<std::shared_mutex> unlock(lock, std::adopt_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
TypeSystemSwiftTypeRefForExpressions *operator->() { return get(); } | ||
TypeSystemSwiftTypeRefForExpressions &operator*() { return *get(); } | ||
}; | ||
|
||
/// An RAII object that just acquires the reader lock. | ||
struct SwiftScratchContextLock : ScopedSharedMutexReader { | ||
struct SwiftScratchContextLock { | ||
std::shared_lock<std::shared_mutex> lock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe m_lock
for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the lldb coding style uses m_
only in classes, not in structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL
The reader lock must be acquired before retrieving a scratch context, otherwise another thread could still tear it down between the time it was created and the creation of the reader object. There are other shortcomings in GetSwiftScratchContext() that are documented, but not addressed, in this patch to make it easier to review.
@kastiglione @JDevlieghere I added another commit on top that actually fixes a concurrency issue. |
@swift-ci test |
auto type_system_or_err = | ||
GetScratchTypeSystemForLanguage(eLanguageTypeSwift, false); | ||
if (!type_system_or_err) { | ||
llvm::consumeError(type_system_or_err.takeError()); | ||
return nullptr; | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could log this error instead of silently consuming it.
log->Printf("returned project-wide scratch context\n"); | ||
llvm::Optional<SwiftScratchContextReader> reader; | ||
if (lldb_module && m_use_scratch_typesystem_per_module) { | ||
maybe_create_fallback_context(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain one more time why this can't potentially delete the scratch context while it's being used?
In my mind the following events could happen:
- Thread 1 successfully calls
GetSwiftScratchContext
and creates a new scratch type system. - Thread 2 calls
GetSwiftScratchContext
- Thread 2 calls
maybe_create_fallback_context
(this call is not guarded by any locks as far as I can tell).maybe_create_fallback_context
deletes the scratch type system.
- Thread 2 calls
- Thread 1 is now using a dangling reference to the scratch type system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this by only deleting when we can acquire the write lock.
if (log) | ||
log->Printf("erased module-wide scratch context with errors\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLDB_LOG
llvm::Optional<SwiftScratchContextReader> reader; | ||
if (lldb_module && m_use_scratch_typesystem_per_module) { | ||
maybe_create_fallback_context(); | ||
std::shared_lock<std::shared_mutex> lock(GetSwiftScratchContextLock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to create a reader object from the get_or_create object and make the whole operation atomic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, no. AFAIK, there is no way to "downgrade" a unique lock into a shared lock, so we need to reacquire it. That means we also need to do the cache lookup again, because another thread could have swapped it out in the mean time by calling the same function.
@@ -2431,7 +2431,7 @@ Target::GetScratchTypeSystemForLanguage(lldb::LanguageType language, | |||
// replacing it could cause a use-after-free later on. | |||
auto &lock = GetSwiftScratchContextLock(); | |||
if (lock.try_lock()) { | |||
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); }); | |||
std::lock_guard<std::shared_mutex> unlock(lock, std::adopt_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
class SwiftScratchContextReader : ScopedSharedMutexReader { | ||
TypeSystemSwiftTypeRefForExpressions *m_ptr; | ||
class SwiftScratchContextReader { | ||
std::shared_lock<std::shared_mutex> m_lock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the locks of the shared mutex are locked by std::shared_lock
(none of them use std::unique_lock
). Doesn't this mean that all of them can use the type system concurrently freely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. The lock is not meant to ensure exclusive access, but guard against a context from being destroyed while still being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, you would have to release the unique_lock and acquire the shared_lock, which has the same drawback as the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@swift-ci test windows |
@swift-ci test |
@swift-ci test windows |
We are getting reports where LLDB is hanging in try_lock(). This patch makes the code easier to reason about by using standard primitives, and avoids the pitfall of the copy constructor that the old code was still allowing. While this may not fix the hangs, it should certainly make it easier to diagnose.
rdar://112025620