[OpenMP] Replace AMDGPU fences with generic scoped fences #119619


Merged
merged 1 commit into from
Dec 12, 2024

Conversation

@jhuber6 (Contributor) commented Dec 11, 2024

Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system wide barrier as a fallback.

@llvmbot (Member) commented Dec 11, 2024

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system wide barrier as a fallback.


Full diff: https://github.com/llvm/llvm-project/pull/119619.diff

2 Files Affected:

  • (modified) offload/DeviceRTL/include/Synchronization.h (+8)
  • (modified) offload/DeviceRTL/src/Synchronization.cpp (+5-39)
diff --git a/offload/DeviceRTL/include/Synchronization.h b/offload/DeviceRTL/include/Synchronization.h
index 874974cc861df2..7a73f9ba72877a 100644
--- a/offload/DeviceRTL/include/Synchronization.h
+++ b/offload/DeviceRTL/include/Synchronization.h
@@ -26,6 +26,14 @@ enum OrderingTy {
   seq_cst = __ATOMIC_SEQ_CST,
 };
 
+enum ScopeTy {
+  system = __MEMORY_SCOPE_SYSTEM,
+  device_ = __MEMORY_SCOPE_DEVICE,
+  workgroup = __MEMORY_SCOPE_WRKGRP,
+  wavefront = __MEMORY_SCOPE_WVFRNT,
+  single = __MEMORY_SCOPE_SINGLE,
+};
+
 enum MemScopeTy {
   all,    // All threads on all devices
   device, // All threads on the device
diff --git a/offload/DeviceRTL/src/Synchronization.cpp b/offload/DeviceRTL/src/Synchronization.cpp
index 9ea8d171cc830e..3aee23a865d3cf 100644
--- a/offload/DeviceRTL/src/Synchronization.cpp
+++ b/offload/DeviceRTL/src/Synchronization.cpp
@@ -232,50 +232,16 @@ void namedBarrier() {
   fence::team(atomic::release);
 }
 
-// sema checking of amdgcn_fence is aggressive. Intention is to patch clang
-// so that it is usable within a template environment and so that a runtime
-// value of the memory order is expanded to this switch within clang/llvm.
 void fenceTeam(atomic::OrderingTy Ordering) {
-  switch (Ordering) {
-  default:
-    __builtin_unreachable();
-  case atomic::aquire:
-    return __builtin_amdgcn_fence(atomic::aquire, "workgroup");
-  case atomic::release:
-    return __builtin_amdgcn_fence(atomic::release, "workgroup");
-  case atomic::acq_rel:
-    return __builtin_amdgcn_fence(atomic::acq_rel, "workgroup");
-  case atomic::seq_cst:
-    return __builtin_amdgcn_fence(atomic::seq_cst, "workgroup");
-  }
+  return __scoped_atomic_thread_fence(Ordering, atomic::workgroup);
 }
+
 void fenceKernel(atomic::OrderingTy Ordering) {
-  switch (Ordering) {
-  default:
-    __builtin_unreachable();
-  case atomic::aquire:
-    return __builtin_amdgcn_fence(atomic::aquire, "agent");
-  case atomic::release:
-    return __builtin_amdgcn_fence(atomic::release, "agent");
-  case atomic::acq_rel:
-    return __builtin_amdgcn_fence(atomic::acq_rel, "agent");
-  case atomic::seq_cst:
-    return __builtin_amdgcn_fence(atomic::seq_cst, "agent");
-  }
+  return __scoped_atomic_thread_fence(Ordering, atomic::device_);
 }
+
 void fenceSystem(atomic::OrderingTy Ordering) {
-  switch (Ordering) {
-  default:
-    __builtin_unreachable();
-  case atomic::aquire:
-    return __builtin_amdgcn_fence(atomic::aquire, "");
-  case atomic::release:
-    return __builtin_amdgcn_fence(atomic::release, "");
-  case atomic::acq_rel:
-    return __builtin_amdgcn_fence(atomic::acq_rel, "");
-  case atomic::seq_cst:
-    return __builtin_amdgcn_fence(atomic::seq_cst, "");
-  }
+  return __scoped_atomic_thread_fence(Ordering, atomic::system);
 }
 
 void syncWarp(__kmpc_impl_lanemask_t) {

@@ -26,6 +26,14 @@ enum OrderingTy {
seq_cst = __ATOMIC_SEQ_CST,
};

enum ScopeTy {
system = __MEMORY_SCOPE_SYSTEM,
device_ = __MEMORY_SCOPE_DEVICE,
A reviewer (Contributor) commented on this hunk:
Is this on purpose with _ to not conflict with something?

jhuber6 (Contributor, Author) replied:

Yes, was too lazy to unify it right this moment since I don't want to delete the CUDA handling yet. Maybe someone like @AlexMaclean knows the status of lowering stuff like https://godbolt.org/z/cWs77s9Eq accurately for NVPTX without NVVM.

@jplehr (Contributor) commented Dec 12, 2024

I ran this through some local testing and did not see any issues with it. No other concerns from my side either. I'll wait a bit longer in case others want to jump in.

@jhuber6 jhuber6 merged commit f4ee5a6 into llvm:main Dec 12, 2024
8 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Dec 13, 2024
revert: breaks smoke/xteamr
f4ee5a6 [OpenMP] Replace AMDGPU fences with generic scoped fences (llvm#119619)

Change-Id: I36a5e0cc20c1820289bc8836dae1c8cfe1fdd275
abidh pushed a commit to abidh/llvm-project that referenced this pull request Feb 4, 2025
Summary:
This is simpler and more common. I would've replaced the CUDA uses and
made this the same but currently it doesn't codegen these fences fully
and just emits a full system wide barrier as a fallback.

Change-Id: I00eb7a789779bce7ab5abc6fa3aedddf4d07ae87
5 participants