Skip to content

[Libomptarget] Remove global ctor and use reference counting #80499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 22, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Feb 2, 2024

Summary:
Currently we rely on global constructors to initialize and shut down the
OpenMP runtime library and plugin manager. This causes some issues
because we do not have a defined lifetime that we can rely on to release
and allocate resources. This patch instead adds some simple reference
counted initialization and deinitialization function.

A future patch will use the deinit interface to more intelligently
handle plugin deinitilization. Right now we do nothing and rely on
atexit inside of the plugins to tear them down. This isn't great
because it limits our ability to control these things.

Note that I made the __tgt_register_lib functions do the
initialization instead of adding calls to the new runtime functions in
the linker wrapper. The reason for this is because in the past it's been
easier to not introduce a new function call, since sometimes the user's
compiler will link against an older libomptarget. Maybe if we change
the name with offloading in the future we can simplify this.

Depends on #80460

@@ -20,25 +20,37 @@
extern void llvm::omp::target::ompt::connectLibrary();
#endif

__attribute__((constructor(101))) void init() {
static std::mutex PluginMtx;
static std::atomic<uint32_t> RefCount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this atomic, it seems all accesses are guarded by PluginMtx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought that myself, was erring on the side of caution in case someone somewhere wants to read it, but likely a non-issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather put the counter into PM, and wrap PM into ProtectedObj? That makes it clear. I dislike complexity that is not needed, but I get why you went here. Still, this way one thinks the RefCount is accessed concurrently. ProtectedObj has benefits over a standalone Mtx too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems more complicated to me, since we'd need to somehow put the reference count into it, whereas as far as I'm aware the protected object simply wraps a mutex around a single object.

delete PM;
void deinitRuntime() {
std::scoped_lock<decltype(PluginMtx)> Lock(PluginMtx);
if (PM == nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a bad idea as it might hide problems. assert(PM) rather. Maybe we should not even do PM as a pointer? just init/deinit on an object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, honestly it's probably correct to just assert that deinit should never be called without initializing it first, rather than handling it. We expect these things to be paired up, so we should assert as such.


// CHECK: PASS
printf("PASS\n");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is not perfect but I'm unsure how to improve it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add a test that dlopens a bunch of different libraries in parallel or something. Realistically this needs to be a unit test, but we're missing a bit of infrastructure right now. Figured that would come later.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 16, 2024

ping

@JonChesterfield
Copy link
Collaborator

It's weird to have a mutex and an atomic reference count. Suggest we drop the mutex and just reference count it. fetch_add or cas into it and finding a zero says you're the first thread to get there.

I think std::mutex has a constructor, in which case you're replacing a global ctor with a global ctor.

Copy link
Collaborator

@JonChesterfield JonChesterfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refcount it correctly and drop the mutex, much simpler (in terms of runtime behaviour) and removes failure modes (in terms of runtime behaviour)

edit: ^ voted down on that in favour of more complicated runtime behaviour in exchange for more familiar programming constructs

@@ -48,7 +48,8 @@ void deinitRuntime() {
std::scoped_lock<decltype(PluginMtx)> Lock(PluginMtx);
assert(PM && "Runtime not initialized");

if (RefCount-- == 0) {
RefCount--;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong - you're separating the decrement from the load, should be a fetch_sub followed by a branch on the old value

@JonChesterfield
Copy link
Collaborator

I've been persuaded that the rest of the team thinks mutex is much simpler than atomics so we should go with that.

Copy link
Contributor

@dhruvachak dhruvachak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The init/deinit code should not be performance critical, so i think having a mutex is ok. Now that RefCount is not atomic, there should not be any confusion about any future usage, it would have to be protected by the mutex.

In addition, I did some testing with OMPT and this patch since OMPT has some ordering constraints. That appeared ok too.

@jhuber6 jhuber6 force-pushed the Refcount branch 3 times, most recently from 4547c7d to ec4a76a Compare February 22, 2024 17:43
Copy link

github-actions bot commented Feb 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Summary:
Currently we rely on global constructors to initialize and shut down the
OpenMP runtime library and plugin manager. This causes some issues
because we do not have a defined lifetime that we can rely on to release
and allocate resources. This patch instead adds some simple reference
counted initialization and deinitialization function.

A future patch will use the `deinit` interface to more intelligently
handle plugin deinitilization. Right now we do nothing and rely on
`atexit` inside of the plugins to tear them down. This isn't great
because it limits our ability to control these things.

Note that I made the `__tgt_register_lib` functions do the
initialization instead of adding calls to the new runtime functions in
the linker wrapper. The reason for this is because in the past it's been
easier to not introduce a new function call, since sometimes the user's
compiler will link against an older `libomptarget`. Maybe if we change
the name with offloading in the future we can simplify this.

Depends on llvm#80460

Fix decrement
@jhuber6 jhuber6 merged commit ea174c0 into llvm:main Feb 22, 2024
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 26, 2024
)

Summary:
Currently we rely on global constructors to initialize and shut down the
OpenMP runtime library and plugin manager. This causes some issues
because we do not have a defined lifetime that we can rely on to release
and allocate resources. This patch instead adds some simple reference
counted initialization and deinitialization function.

A future patch will use the `deinit` interface to more intelligently
handle plugin deinitilization. Right now we do nothing and rely on
`atexit` inside of the plugins to tear them down. This isn't great
because it limits our ability to control these things.

Note that I made the `__tgt_register_lib` functions do the
initialization instead of adding calls to the new runtime functions in
the linker wrapper. The reason for this is because in the past it's been
easier to not introduce a new function call, since sometimes the user's
compiler will link against an older `libomptarget`. Maybe if we change
the name with offloading in the future we can simplify this.

Depends on llvm#80460

Change-Id: I70815457fab9b5d68db8e48b3b5e1c75951c05f5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomptarget OpenMP offload runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants