Skip to content

Replace hand-rolled SharedMutex with C++17 std::shared_mutex (NFCI) #7120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 2, 2023

Conversation

adrian-prantl
Copy link

We are getting reports where LLDB is hanging in try_lock(). This patch makes the code easier to reason about by using standard primitives, and avoids the pitfall of the copy constructor that the old code was still allowing. While this may not fix the hangs, it should certainly make it easier to diagnose.

rdar://112025620

We are getting reports where LLDB is hanging in try_lock().  This
patch makes the code easier to reason about by using standard
primitives, and avoids the pitfall of the copy constructor that the
old code was still allowing. While this may not fix the hangs, it
should certainly make it easier to diagnose.

rdar://112025620
@adrian-prantl
Copy link
Author

@swift-ci test

@JDevlieghere
Copy link

LLVM's RWMutex uses shared_mutex if the deployment target allows it. Either this is a NOOP or it has the potential to break bots with a lower deployment target.

@adrian-prantl
Copy link
Author

LLVM's RWMutex uses shared_mutex if the deployment target allows it. Either this is a NOOP or it has the potential to break bots with a lower deployment target.

According to the comment in RWMutex.h std::shared_timed_mutex is only available on macOS 10.12 and later. Green dragon is running on 10.15.5.

Note that llvm::RWMutex does not have a try_lock method and it's my hand-rolled implementation of that, that I would like to remove with this patch.

@JDevlieghere
Copy link

JDevlieghere commented Aug 1, 2023

According to the comment in RWMutex.h std::shared_timed_mutex is only available on macOS 10.12 and later. Green dragon is running on 10.15.5.

Guess who wrote that comment. 🙃 Also, I assume you mean Swift CI? If both GreenDragon and Swift CI are using 10.15 as the minimum deployment target I'll remove that workaround upstream.

Note that llvm::RWMutex does not have a try_lock method and it's my hand-rolled implementation of that, that I would like to remove with this patch.

👍

@adrian-prantl
Copy link
Author

ci.swift.org claims to use Host OS: macOS 12.6.

@adrian-prantl
Copy link
Author

I just realized that the "host os" != "deployment target". Green dragon uses x86_64-apple-darwin20.6.0 (macOS 11) and ci.swift.org seems to build for x86_64-apple-macosx10.13.
So we should be good here.

@JDevlieghere
Copy link

JDevlieghere commented Aug 1, 2023

Actually, my memory was correct, but it seems someone hoisted shared_mutex out of the availability check. Seems like Google is still running a macOS bot that targets 10.12 (discussed in https://reviews.llvm.org/D130689) so let's keep the workaround in place for shared_timed_mutex upstream. That doesn't affect this patch.

[this] { GetSwiftScratchContextLock().unlock(); });
auto &lock = GetSwiftScratchContextLock();
if (lock.try_lock()) {
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this use std::lock_guard + std::adopt_lock?

Suggested change
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); });
std::lock_guard<std::shared_mutex> unlock(lock, std::adopt_lock);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

TypeSystemSwiftTypeRefForExpressions *operator->() { return get(); }
TypeSystemSwiftTypeRefForExpressions &operator*() { return *get(); }
};

/// An RAII object that just acquires the reader lock.
struct SwiftScratchContextLock : ScopedSharedMutexReader {
struct SwiftScratchContextLock {
std::shared_lock<std::shared_mutex> lock;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe m_lock for consistency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the lldb coding style uses m_ only in classes, not in structs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

The reader lock must be acquired before retrieving a scratch context,
otherwise another thread could still tear it down between the time it
was created and the creation of the reader object.

There are other shortcomings in GetSwiftScratchContext() that are
documented, but not addressed, in this patch to make it easier to review.
@adrian-prantl
Copy link
Author

@kastiglione @JDevlieghere I added another commit on top that actually fixes a concurrency issue.

@adrian-prantl
Copy link
Author

@swift-ci test

auto type_system_or_err =
GetScratchTypeSystemForLanguage(eLanguageTypeSwift, false);
if (!type_system_or_err) {
llvm::consumeError(type_system_or_err.takeError());
return nullptr;
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could log this error instead of silently consuming it.

log->Printf("returned project-wide scratch context\n");
llvm::Optional<SwiftScratchContextReader> reader;
if (lldb_module && m_use_scratch_typesystem_per_module) {
maybe_create_fallback_context();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain one more time why this can't potentially delete the scratch context while it's being used?

In my mind the following events could happen:

  • Thread 1 successfully calls GetSwiftScratchContext and creates a new scratch type system.
  • Thread 2 calls GetSwiftScratchContext
    • Thread 2 calls maybe_create_fallback_context (this call is not guarded by any locks as far as I can tell).
      • maybe_create_fallback_context deletes the scratch type system.
  • Thread 1 is now using a dangling reference to the scratch type system.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this by only deleting when we can acquire the write lock.

Comment on lines +2741 to +2742
if (log)
log->Printf("erased module-wide scratch context with errors\n");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLDB_LOG

llvm::Optional<SwiftScratchContextReader> reader;
if (lldb_module && m_use_scratch_typesystem_per_module) {
maybe_create_fallback_context();
std::shared_lock<std::shared_mutex> lock(GetSwiftScratchContextLock());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to create a reader object from the get_or_create object and make the whole operation atomic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no. AFAIK, there is no way to "downgrade" a unique lock into a shared lock, so we need to reacquire it. That means we also need to do the cache lookup again, because another thread could have swapped it out in the mean time by calling the same function.

@@ -2431,7 +2431,7 @@ Target::GetScratchTypeSystemForLanguage(lldb::LanguageType language,
// replacing it could cause a use-after-free later on.
auto &lock = GetSwiftScratchContextLock();
if (lock.try_lock()) {
auto unlock = llvm::make_scope_exit([&lock] { lock.unlock(); });
std::lock_guard<std::shared_mutex> unlock(lock, std::adopt_lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

class SwiftScratchContextReader : ScopedSharedMutexReader {
TypeSystemSwiftTypeRefForExpressions *m_ptr;
class SwiftScratchContextReader {
std::shared_lock<std::shared_mutex> m_lock;
Copy link

@augusto2112 augusto2112 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the locks of the shared mutex are locked by std::shared_lock (none of them use std::unique_lock). Doesn't this mean that all of them can use the type system concurrently freely?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The lock is not meant to ensure exclusive access, but guard against a context from being destroyed while still being used.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, you would have to release the unique_lock and acquire the shared_lock, which has the same drawback as the current approach.

Copy link

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adrian-prantl
Copy link
Author

@swift-ci test windows

@adrian-prantl
Copy link
Author

@swift-ci test

@adrian-prantl
Copy link
Author

@swift-ci test windows

@adrian-prantl adrian-prantl merged commit fbc2eaa into swiftlang:swift/release/5.9 Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants