-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[OpenMP][NFC] Remove DelayedBinDesc
#74360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-openmp Author: Johannes Doerfert (jdoerfert) ChangesThe Full diff: https://github.com/llvm/llvm-project/pull/74360.diff 3 Files Affected:
diff --git a/openmp/libomptarget/include/PluginManager.h b/openmp/libomptarget/include/PluginManager.h
index 94ecce01ca74c..2473e39de34bc 100644
--- a/openmp/libomptarget/include/PluginManager.h
+++ b/openmp/libomptarget/include/PluginManager.h
@@ -106,23 +106,6 @@ struct PluginManager {
HostPtrToTableMapTy HostPtrToTableMap;
std::mutex TblMapMtx; ///< For HostPtrToTableMap
- // Work around for plugins that call dlopen on shared libraries that call
- // tgt_register_lib during their initialisation. Stash the pointers in a
- // vector until the plugins are all initialised and then register them.
- bool delayRegisterLib(__tgt_bin_desc *Desc) {
- if (RTLsLoaded)
- return false;
- DelayedBinDesc.push_back(Desc);
- return true;
- }
-
- void registerDelayedLibraries() {
- // Only called by libomptarget constructor
- RTLsLoaded = true;
- for (auto *Desc : DelayedBinDesc)
- __tgt_register_lib(Desc);
- DelayedBinDesc.clear();
- }
int getNumDevices() {
std::lock_guard<decltype(RTLsMtx)> Lock(RTLsMtx);
diff --git a/openmp/libomptarget/src/interface.cpp b/openmp/libomptarget/src/interface.cpp
index 62cf2262deb62..b27410ed12eaf 100644
--- a/openmp/libomptarget/src/interface.cpp
+++ b/openmp/libomptarget/src/interface.cpp
@@ -46,9 +46,6 @@ EXTERN void __tgt_register_requires(int64_t Flags) {
/// adds a target shared library to the target execution image
EXTERN void __tgt_register_lib(__tgt_bin_desc *Desc) {
TIMESCOPE();
- if (PM->delayRegisterLib(Desc))
- return;
-
PM->registerLib(Desc);
}
diff --git a/openmp/libomptarget/src/rtl.cpp b/openmp/libomptarget/src/rtl.cpp
index 5eb1c553df491..27db7034d8956 100644
--- a/openmp/libomptarget/src/rtl.cpp
+++ b/openmp/libomptarget/src/rtl.cpp
@@ -51,7 +51,6 @@ __attribute__((constructor(101))) void init() {
PM->init();
Profiler::get();
- PM->registerDelayedLibraries();
}
__attribute__((destructor(101))) void deinit() {
|
This was the original issue #60119. I think HSA might've solved this as well in future versions. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
This code was from the very beginning either unused or racy. I don't see a point in keeping it. If the issue exists, we need to store delayed lib initialization in a different place. |
I'll wait until @JonChesterfield chimes in, since if I recall this patch did indeed resolve the issue caused by infinitely recursive constructor calls. |
d6dc8b8
to
28482e8
Compare
The part I am deleting did never resolve an issue w/o race or UB. The critical part in the original patch was to move the loadRTLs call earlier. |
IIRC the bug was triggered by HSA's handling of that test case rather than the test itself. HSA was doing some evil binary trawling to search for whether or not it had to stand up debugging utilities or something if memory serves. This ended up causing the binary to re-load its constructors. |
That might be true, but 1) the code in question was, and is, still UB or dead, and 2) the test should have been included. |
I think this code is here because dlopen'ed cuda or hsa otherwise falls over. That's not the default build configuration. Have you tested this with dynamic cuda/hsa? What's the race condition / UB in question? Destruction time is run by one thread I thought. |
The only time we could have called it was after we |
It wasn't dead when we wrote it. It stopped HSA segfaulting when running any openmp programs. The code might be dead now. I expect git log will find a "NFC" patch that changed the control flow so that this no guard no longer works, with no associated tests. Naturally we don't test with dlopen'ed HSA on CI so when it stopped working we didn't notice. As an argument against deleting dynamic_cuda and dynamic_hsa, instead of fixing them, those are the expected mechanisms by which |
Again, it was always dead, UB, or racy. My money is on dead. This is the code in your patch:
There is no point between |
I'm pretty sure libomptarget is racy in general. Initialising the same libomptarget from multiple threads is unlikely to work. Fortunately it is usually initialised by the constructor called from crt. It looks like this was a workaround for HSA doing dubious things. At the time you were insistent that we had to work around HSA, and keep the workaround around so things would work with older versions of HSA once it was fixed. I wanted to hard error on that HSA version, where users continue using older ones until HSA stopped crawling the address space looking for profiler thing. Have you reproduced the crash this code avoided - I think it would be a specific HSA version via dlopen? I am nervous about (the current, in libomptarget) extensive NFC patches given the limited automated test coverage. This looks like a functional change without any automated test coverage. What manual testing supports the change? |
28482e8
to
b9f9e0f
Compare
This doesn't look dead to me. Before there's a bunch of calls to tgt_register_lib, after there's none. Why do you think this code is dead? |
As promised, I spend some time on this. TL;DR;
Setup: HSA is found at build time so we do not use the dlwrapper. Let's start with the fix:
Verifying it breaks (so we go one commit back):
Yes, it hangs. Observation 1:It goes away and works fine with rocm 5.5.0 and later while rocm 5.4.3 and before breaks. Setup con't: The system has an old
Observation 2:Without this commit (on top of trunk):
It just hangs:
However, this seems unrelated. Observation 3:Replacing
works fine after:
The reason is that ==> Don't mix LLVM versions in one library (with exposed symbols). FWIW, with #74538 this will not happen anymore since we will stop picking up random plugins by default. Observation 4:Is the code dead? Observation 5:With this commit (on top of head):
and the empty I tested all these rocm versions:
AnalysisSo what happened? Let's actually look at the original bug. In the old code (before I started to rewrite libomptarget), we created the
So, we added an uninitialized You can probably imagine this is going to be a problem. What happens next is the HSA "bug" that was mentioned. Remember, the original issue was a hang not a crash. So let's track back when we stopped adding half initialized plugin adaptors into the container. So, let's go back before b80b5f1 and apply this patch. Summary
ConclusionI don't see a reason this code cannot be deleted, regardless of the rocm version used. |
The multiple threads comment was in response to you claiming this code was racy. I said it wasn't, as you can't get here from multiple threads. Nothing to do with the original bug fix. So the original patch hacked around using these objects in a partially initialised state and also sidestepped HSA 5.3ish doing dubious things. The partially initialised state is now gone, so we don't need that. There is a missing piece which is I believe this original problem only occurred using dynamic_hsa, though I can believe other things fell over when linking it normally. Given you've already got this set up, could you toggle to using the dynamically loaded hsa and find out whether 5.3 segv's or not? I'd rather we not keep a workaround in tree for 5.3, given I didn't want to introduce one in the first place, but it seems worth achieving the closure of knowing whether dynamic hsa which does the dubious address space walking would segv after this patch. |
It worked fine when I loaded rocm only after I build LLVM. I almost all of the above on both machines with the same results. The second machine allocation is gone though. |
Good to hear. Thanks! |
The `DelayedBinDesc` was removed as it is not necessary. It was either never used or racy. The only time we could have called it was after we created `PM` in the `init()` constructor. However, we immediately went ahead and called `registerDelayedLibraries` which disabled `DelayedBinDesc` for the future. This was like that since https://reviews.llvm.org/D142249 landed. Also include the reproducer of llvm#60119 which was missing in D142249.
b9f9e0f
to
f861d38
Compare
Reverts #74360 As I wrote in the analysis of #74360: Since bc4e0c0 we will not add PluginAdaptors into the container of all plugin adaptors before the plugin is not ready. The error is thereby gone. When and old HSA loads other libraries they can call register_image but that will simply not register the image with the plugin we are currently initializing. That seems like reasonable behavior, thought it is good to keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We can still have a standalone kernel library though or load it late after all plugins are setup (which seems reasonable). I did not expect one our tests actually doing exactly what this will not allow anymore, at least when you use rocm <5.5.0. Need to figure out if we want this behavior (for rocm <5.5.0).
Remove
DelayedBinDesc
as it is not necessary since bc4e0c0.See #74360 (comment) for details.