[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes #120095
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The documentation at https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these 'one-as' modifiers are more specific versions of the scopes that only apply to a specific address space. This doesn't make sense for fences, which have no associated address space to use, and it's a more restrictive version of the normal scope. This should not be the default behavior, but it is currently emitted in all cases except for sequentially consistent.

Patch is 266.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120095.diff

8 Files Affected:
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 56ad0503a11ab2..ecf044436d8c86 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -537,13 +537,6 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
break;
}
- if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
- if (!Name.empty())
- Name = Twine(Twine(Name) + Twine("-")).str();
-
- Name = Twine(Twine(Name) + Twine("one-as")).str();
- }
-
return Ctx.getOrInsertSyncScopeID(Name);
}
diff --git a/clang/test/CodeGen/scoped-atomic-ops.c b/clang/test/CodeGen/scoped-atomic-ops.c
index cf98812a07e91d..c3162ebe66661f 100644
--- a/clang/test/CodeGen/scoped-atomic-ops.c
+++ b/clang/test/CodeGen/scoped-atomic-ops.c
@@ -5,17 +5,57 @@
// RUN: -fvisibility=hidden | FileCheck --check-prefix=SPIRV %s
// AMDGCN-LABEL: define hidden i32 @fi1a(
-// AMDGCN: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread-one-as") monotonic, align 4
-// SPIRV: define hidden spir_func i32 @fi1a(
-// SPIRV: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] monotonic, align 4
-// SPIRV: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("device") monotonic, align 4
-// SPIRV: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP1]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP3]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP5]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP7]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP9]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP10:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: ret i32 [[TMP10]]
+//
+// SPIRV-LABEL: define hidden spir_func i32 @fi1a(
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP1]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP3]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP5]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP7]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP9]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP10:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: ret i32 [[TMP10]]
+//
int fi1a(int *i) {
int v;
__scoped_atomic_load(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -27,17 +67,101 @@ int fi1a(int *i) {
}
// AMDGCN-LABEL: define hidden i32 @fi1b(
-// AMDGCN: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP1]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP2]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP3]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP4]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP1]], ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP3:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP2]], ptr [[TMP3]], align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP5]], ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP7:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP6]], ptr [[TMP7]], align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP9]], ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP11:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP10]], ptr [[TMP11]], align 4
+// AMDGCN-NEXT: [[TMP12:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP13]], ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP15:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP14]], ptr [[TMP15]], align 4
+// AMDGCN-NEXT: [[TMP16:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP17]], ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP19:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP18]], ptr [[TMP19]], align 4
+// AMDGCN-NEXT: [[TMP20:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// AMDGCN-NEXT: ret i32 [[TMP21]]
+//
// SPIRV-LABEL: define hidden spir_func i32 @fi1b(
-// SPIRV: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP1]], ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT: [[TMP3:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP2]], ptr [[TMP3]], align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT: [[TMP7:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP6]], ptr [[TMP7]], align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP9]], ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT: [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT: [[TMP11:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP10]], ptr [[TMP11]], align 4
+// SPIRV-NEXT: [[TMP12:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP13]], ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT: [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT: [[TMP15:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP14]], ptr [[TMP15]], align 4
+// SPIRV-NEXT: [[TMP16:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP17]], ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT: [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT: [[TMP19:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP18]], ptr [[TMP19]], align 4
+// SPIRV-NEXT: [[TMP20:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// SPIRV-NEXT: ret i32 [[TMP21]]
+//
int fi1b(int *i) {
*i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
*i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -48,17 +172,55 @@ int fi1b(int *i) {
}
// AMDGCN-LABEL: define hidden void @fi2a(
-// AMDGCN: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 1, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP3:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP7:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: ret void
+//
// SPIRV-LABEL: define hidden spir_func void @fi2a(
-// SPIRV: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 1, ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP3:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP7:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: ret void
+//
void fi2a(int *i) {
int v = 1;
__scoped_atomic_store(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -69,17 +231,75 @@ void fi2a(int *i) {
}
// AMDGCN-LABEL: define hidden void @fi2b(
-// AMDGCN: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP1]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP2]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP3]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP4]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 1, ptr [[DOTATOMICTMP_ASCAST]], align 4
+/...
[truncated]
I would say this change is obviously correct, but I can't see why it was introduced and vaguely fear tripping over abhorrent behaviour in the backend. Can you send this down the internal CI pipeline to pick up some more runtime testing (unless amd-stg-open is already defaulting to the right thing?)
Explicitly marking green: even if this commit upsets something else in the backend, having a concurrency primitive default to racy is clearly bad.
@b-sumner has useful context on this. I'll try not to speak for him, but it sounds like the block deleted here has the right semantics for OpenCL, where "seqcst" has some special meaning and generally the semantics don't totally make sense to me. Suggest we amend this to "if opencl, do this thing, otherwise leave it alone" (can't see how to remove the green tick from the review)
This one-as business seems like it's cruft from before MMRAs. Can we rip them out and replace them with MMRAs for OpenCL?
I'd rather not infect this with poorly understood language-based logic. The one-as should be more aggressive and safe to ignore.
Request changes
If that's the case then we should be able to use the
https://llvm.org/docs/MemoryModelRelaxationAnnotations.html calls out the opencl fence as a motivating example which suggests either yes, or we should amend the MMRA system to be able to express it. Sounds like a much cleaner fix to me, but also open to unblocking openmp with an "if (opencl) {}" clause in the meantime.
clang/lib/CodeGen/Targets/AMDGPU.cpp (Outdated)
@@ -537,7 +537,8 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
    break;
  }

-  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
+  if (LangOpts.OpenCL &&
I am deeply opposed to making any __builtin_amdgcn intrinsic behavior language-dependent. This is a target builtin with target-defined behavior that should behave consistently regardless of language mode. I'd rather just take an OpenCL performance regression than this
If OpenCL wants this funny behavior, they should need to use a different builtin. That said, is OpenCL even using the generic builtin?
This is not an AMDGPU builtin; the AMDGPU builtin takes a string that corresponds directly to the syncscope string. This is for the 'generic' versions, which were reimplemented like three times for OpenCL, HIP, and the GNU / scoped versions. I think a better solution here would be to only set it if it uses the OpenCL-flavored scope arguments; then HIP and C/C++ get the non-one-as implementation, as those targets don't explicitly use the same AS knowledge. A rough sketch of that check follows.
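A minimal sketch of that check (hypothetical placement inside getLLVMSyncScopeID, reusing the existing name-appending logic; it assumes the OpenCL scope values sit contiguously at the end of the SyncScope enum):

  // Sketch only: append "one-as" solely for the OpenCL-flavored scope
  // arguments, so HIP and the generic __scoped_atomic_* builtins keep the
  // plain scopes.
  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent &&
      Scope >= SyncScope::OpenCLWorkGroup &&
      Scope <= SyncScope::OpenCLSubGroup) {
    if (!Name.empty())
      Name = Twine(Twine(Name) + Twine("-")).str();
    Name = Twine(Twine(Name) + Twine("one-as")).str();
  }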
My point is 1 builtin : 1 behavior, universally. No modality.
The amdgcn fence builtin also looks like it already takes optional arguments to append MMRAs. If OpenCL is actually using this builtin anywhere (which I doubt it is), it can easily migrate to __builtin_amdgcn_fence.
That's the current behavior now that I updated it.
I'm sure it's using it somewhere since it's the OpenCL spec I think? This currently miscompiles on C/C++ w/ OpenMP so I think it should be fixed. Possible performance regressions but I don't think it's a valid optimization on non-OpenCL targets since we don't explicitly expose the address spaces as a part of the language.
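As a hypothetical reproducer of what that miscompile looks like from plain C/C++ (using the generic builtin exercised by the test above):

// Before this patch, compiling this for amdgcn emitted the load with
// syncscope("agent-one-as"), i.e. synchronization restricted to a single
// address space, a restriction C/C++/OpenMP never asked for. After the
// patch it is syncscope("agent").
int observe_flag(int *flag) {
  return __scoped_atomic_load_n(flag, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
}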
If we don't have tests running __scoped_atomic_* through OpenCL, we should have them
  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
    // OpenCL assumes by default that atomic scopes are per-address space for
    // non-sequentially consistent operations.
    if (Scope >= SyncScope::OpenCLWorkGroup &&
Can you explain which scopes will generate the one-as versions with this change? I don't know which ones are in that range: agent, workgroup, wavefront, singlethread are the possible candidates.
I also don't understand why the one-as should be generated for a specific range of scopes. How do we know that will not lead to mis-compilation?
The OpenCL ones; this is the range used by the OpenCL builtins via the __opencl_atomic_* stuff.
Ok, so here's the enum.
enum class SyncScope {
SystemScope,
DeviceScope,
WorkgroupScope,
WavefrontScope,
SingleScope,
HIPSingleThread,
HIPWavefront,
HIPWorkgroup,
HIPAgent,
HIPSystem,
OpenCLWorkGroup,
OpenCLDevice,
OpenCLAllSVMDevices,
OpenCLSubGroup,
Last = OpenCLSubGroup
};
So like you said, the OpenCL ones. If Matt agrees, I am fine with this change though I don't see how it is better than explicitly checking for OpenCL. Just more cryptic. Either way, please add a comment.
There's already a comment though?
It's better because it's keyed off the OpenCL builtin itself: you can still use the 'standard' builtins in OpenCL, which would otherwise get different behavior depending on the language.
What does seem questionable is that this function is checking the memory ordering. It feels like that should be done by the caller of this function. It is the caller that should be selecting the right value for the scope argument. For OpenCL the "one-as" sync-scopes should only be used for non-sequentially consistent memory orders. That is a rule of the OpenCL language.
You can use the OpenCL builtins outside of OpenCL: https://godbolt.org/z/Ex4aPMx3v. This function is called by all the various reimplementations of this, but with different enum values for the scope. So, if we want the __opencl_ versions to maintain this behavior, we can do that.
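For instance (a hypothetical snippet along the lines of the godbolt link above; the __OPENCL_MEMORY_SCOPE_* macros are assumed to be predefined outside OpenCL mode):

// The OpenCL builtin is accepted in plain C/C++ too, so keying the one-as
// suffix off the OpenCL* SyncScope values (rather than LangOpts.OpenCL)
// keeps one builtin mapping to one behavior across language modes.
int load_wg(_Atomic int *p) {
  return __opencl_atomic_load(p, __ATOMIC_RELAXED,
                              __OPENCL_MEMORY_SCOPE_WORK_GROUP);
}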
I guess I am still feeling I am missing something. This PR is making a change to this function. Nothing else. So it seems the question is if the change is correct. This function should only act on the arguments it is given and faithfully do the right thing. It seems that is what it is doing, except for the check of the memory ordering. That check should be in the callers of the function.
Sounds like the callers of this function are the builtins that you describe. The builtins that are intended to implement OpenCL semantics should be sure to pass the OpenCL* scope values, and should check the memory ordering for being sequentially consistent. The non-OpenCL builtins should pass in the appropriate non-OpenCL values for scope. Is that what is happening?
Yes, each non-OpenCL function goes through this common interface. The issue is that for non-OpenCL users they all get the one-as feature added, which is not correct in the general case. This patch is an attempt to maintain the OpenCL behavior while the other users of this function will no longer have one-as on them.
But with this change that is no longer the case. So is there any push back on this fix?
OpenCL should only be using the amdgcn fence builtin. See, e.g. https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/opencl/src/misc/awif.cl#L73
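For reference, a minimal sketch of using that builtin directly (the scope string maps straight through to the IR syncscope; per the earlier comment, additional string arguments can attach MMRAs):

// Target builtin with target-defined behavior, independent of language mode.
void release_to_workgroup(void) {
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
}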
Stringing the pieces together, we may have been conflating opencl-the-language with opencl-the-implementation. Let's go with the first line of attack: no language special casing here, no checking seq-cst and appending one-as. Opencl the implementation won't care because it's using the amdgcn builtin anyway (and I think previously just used IR). Opencl the language doesn't appear to document what the clang builtin is expected to do on it so it seems to me that it can do whatever we see fit. People writing opencl that happen to be using that intrinsic will see a minor performance degradation that they can work around by changing intrinsic. I anticipate that there are few to no such people, not least because this language-independent clang intrinsic was relatively recently introduced. Simpler compiler, simpler semantics, unbreaks openmp, doesn't require reworking opencl to use the new metadata. I.e. we ship as initially proposed. @b-sumner is that acceptable? (opened #120131 to track this, though it hasn't attracted discussion)
Summary: The documentation at https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these 'one-as' modifiers are more specific versions of the scopes that only apply to a specific address space. This doesn't make sense for fences, which have no associated address space to use, and it's a more restrictive version of the normal scope. This should not be the default behavior, but it is currently emitted in all cases except for sequentially consistent.
@JonChesterfield I think I agree with you. And OpenCL source should not be calling builtins to begin with since they're not officially part of the OpenCL language.
Ping
I discussed this with @t-tye; he said it's correct but wants @Pierre-vh to sign off on it.
I'm trying to understand this. Is the function being changed a generic util called by multiple builtins, and this change is just to make
Ah, that makes sense. It's surprising it wasn't like that in the first place. But can we now get rid of the -one-as sync scopes and replace them with MMRAs?
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/7783
MMRAs are droppable, so they're not currently suited to replace syncscopes like that, but it's a goal. As part of the Vulkan memory model work I need to reevaluate that aspect of MMRAs and fix it (= promote them to an instruction operand), or use another system altogether (in which case MMRAs may just get removed). It's not yet decided which of those 2 options it'll be, but I plan to regularly follow up on this from now on until it's addressed, so hopefully something will happen medium term. I'm not sure how much resistance there'd be upstream to making MMRAs an instruction operand instead of MD?