-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[OpenMP] Replace AMDGPU fences with generic scoped fences #119619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: This is simpler and more common. I would've replaced the CUDA uses and made this the same but currently it doesn't codegen these fences fully and just emits a full system wide barrier as a fallback.
@llvm/pr-subscribers-offload Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/119619.diff 2 Files Affected:
diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h
index 874974cc861df2..7a73f9ba72877a 100644
--- a/offload/DeviceRTL/include/Synchronization.h
+++ b/offload/DeviceRTL/include/Synchronization.h
@@ -26,6 +26,14 @@ enum OrderingTy {
seq_cst = __ATOMIC_SEQ_CST,
};
+enum ScopeTy {
+ system = __MEMORY_SCOPE_SYSTEM,
+ device_ = __MEMORY_SCOPE_DEVICE,
+ workgroup = __MEMORY_SCOPE_WRKGRP,
+ wavefront = __MEMORY_SCOPE_WVFRNT,
+ single = __MEMORY_SCOPE_SINGLE,
+};
+
enum MemScopeTy {
all, // All threads on all devices
device, // All threads on the device
diff --git a/offload/DeviceRTL/src/Synchronization.cpp b/offload/DeviceRTL/src/Synchronization.cpp
index 9ea8d171cc830e..3aee23a865d3cf 100644
--- a/offload/DeviceRTL/src/Synchronization.cpp
+++ b/offload/DeviceRTL/src/Synchronization.cpp
@@ -232,50 +232,16 @@ void namedBarrier() {
fence::team(atomic::release);
}
-// sema checking of amdgcn_fence is aggressive. Intention is to patch clang
-// so that it is usable within a template environment and so that a runtime
-// value of the memory order is expanded to this switch within clang/llvm.
void fenceTeam(atomic::OrderingTy Ordering) {
- switch (Ordering) {
- default:
- __builtin_unreachable();
- case atomic::aquire:
- return __builtin_amdgcn_fence(atomic::aquire, "workgroup");
- case atomic::release:
- return __builtin_amdgcn_fence(atomic::release, "workgroup");
- case atomic::acq_rel:
- return __builtin_amdgcn_fence(atomic::acq_rel, "workgroup");
- case atomic::seq_cst:
- return __builtin_amdgcn_fence(atomic::seq_cst, "workgroup");
- }
+ return __scoped_atomic_thread_fence(Ordering, atomic::workgroup);
}
+
void fenceKernel(atomic::OrderingTy Ordering) {
- switch (Ordering) {
- default:
- __builtin_unreachable();
- case atomic::aquire:
- return __builtin_amdgcn_fence(atomic::aquire, "agent");
- case atomic::release:
- return __builtin_amdgcn_fence(atomic::release, "agent");
- case atomic::acq_rel:
- return __builtin_amdgcn_fence(atomic::acq_rel, "agent");
- case atomic::seq_cst:
- return __builtin_amdgcn_fence(atomic::seq_cst, "agent");
- }
+ return __scoped_atomic_thread_fence(Ordering, atomic::device_);
}
+
void fenceSystem(atomic::OrderingTy Ordering) {
- switch (Ordering) {
- default:
- __builtin_unreachable();
- case atomic::aquire:
- return __builtin_amdgcn_fence(atomic::aquire, "");
- case atomic::release:
- return __builtin_amdgcn_fence(atomic::release, "");
- case atomic::acq_rel:
- return __builtin_amdgcn_fence(atomic::acq_rel, "");
- case atomic::seq_cst:
- return __builtin_amdgcn_fence(atomic::seq_cst, "");
- }
+ return __scoped_atomic_thread_fence(Ordering, atomic::system);
}
void syncWarp(__kmpc_impl_lanemask_t) {
|
@@ -26,6 +26,14 @@ enum OrderingTy { | |||
seq_cst = __ATOMIC_SEQ_CST, | |||
}; | |||
|
|||
enum ScopeTy { | |||
system = __MEMORY_SCOPE_SYSTEM, | |||
device_ = __MEMORY_SCOPE_DEVICE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this on purpose with _
to not conflict with something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was too lazy to unify it right this moment since I don't want to delete the CUDA handling yet. Maybe someone like @AlexMaclean knows the status of lowering stuff like https://godbolt.org/z/cWs77s9Eq accurately for NVPTX without NVVM.
I ran this through some local testing and did not see any issues with it. No other concerns from my side either. Waiting for another bit and others to jump in. |
revert: breaks smoke/xteamr f4ee5a6 [OpenMP] Replace AMDGPU fences with generic scoped fences (llvm#119619) Change-Id: I36a5e0cc20c1820289bc8836dae1c8cfe1fdd275
Summary: This is simpler and more common. I would've replaced the CUDA uses and made this the same but currently it doesn't codegen these fences fully and just emits a full system wide barrier as a fallback. Change-Id: I00eb7a789779bce7ab5abc6fa3aedddf4d07ae87
Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system wide barrier as a fallback.