
[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes #120095


Merged
merged 5 commits into llvm:main on Jan 6, 2025

Conversation

Contributor

@jhuber6 jhuber6 commented Dec 16, 2024

Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences
which have no associated address space to use, and it's a more
restrictive version of the normal scope. This should not be the default
behavior, but it is currently emitted in all cases except for
sequentially consistent.
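
As a rough illustration (not part of the patch itself), a relaxed device-scoped load through the generic scoped builtin shows the difference: before this change the IR carried syncscope("agent-one-as"), afterwards it uses the plain syncscope("agent"), as the updated tests below reflect.

// Illustration only: generic scoped atomic builtin at device scope.
int load_device_scoped(int *p) {
  int v;
  __scoped_atomic_load(p, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
  return v; // old IR: syncscope("agent-one-as"); new IR: syncscope("agent")
}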

@llvmbot llvmbot added the clang, backend:AMDGPU, and clang:codegen labels Dec 16, 2024
@llvmbot
Member

llvmbot commented Dec 16, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences
which have no associated address space to use, and it's a more
restrictive version of the normal scope. This should not be the default
behavior, but it is currently emitted in all cases except for
sequentially consistent.


Patch is 266.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120095.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (-7)
  • (modified) clang/test/CodeGen/scoped-atomic-ops.c (+2311-161)
  • (modified) clang/test/CodeGen/scoped-fence-ops.c (+10-10)
  • (modified) clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu (+30-30)
  • (modified) clang/test/CodeGenCUDA/atomic-ops.cu (+140-140)
  • (modified) clang/test/CodeGenOpenCL/atomic-ops.cl (+14-14)
  • (modified) clang/test/CodeGenOpenCL/atomics-cas-remarks-gfx90a.cl (+8-8)
  • (modified) clang/test/CodeGenOpenCL/atomics-unsafe-hw-remarks-gfx90a.cl (+6-6)
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 56ad0503a11ab2..ecf044436d8c86 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -537,13 +537,6 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
     break;
   }
 
-  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
-    if (!Name.empty())
-      Name = Twine(Twine(Name) + Twine("-")).str();
-
-    Name = Twine(Twine(Name) + Twine("one-as")).str();
-  }
-
   return Ctx.getOrInsertSyncScopeID(Name);
 }
 
diff --git a/clang/test/CodeGen/scoped-atomic-ops.c b/clang/test/CodeGen/scoped-atomic-ops.c
index cf98812a07e91d..c3162ebe66661f 100644
--- a/clang/test/CodeGen/scoped-atomic-ops.c
+++ b/clang/test/CodeGen/scoped-atomic-ops.c
@@ -5,17 +5,57 @@
 // RUN:   -fvisibility=hidden | FileCheck --check-prefix=SPIRV %s
 
 // AMDGCN-LABEL: define hidden i32 @fi1a(
-// AMDGCN:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread-one-as") monotonic, align 4
-// SPIRV: define hidden spir_func i32 @fi1a(
-// SPIRV:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] monotonic, align 4
-// SPIRV:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("device") monotonic, align 4
-// SPIRV:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// AMDGCN-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT:    [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT:    [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT:    [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT:    store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP1]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP3]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP5]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP7]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP9]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP10:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    ret i32 [[TMP10]]
+//
+// SPIRV-LABEL: define hidden spir_func i32 @fi1a(
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// SPIRV-NEXT:  [[ENTRY:.*:]]
+// SPIRV-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT:    [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP1]], ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP3]], ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP5]], ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP7]], ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP9]], ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP10:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    ret i32 [[TMP10]]
+//
 int fi1a(int *i) {
   int v;
   __scoped_atomic_load(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -27,17 +67,101 @@ int fi1a(int *i) {
 }
 
 // AMDGCN-LABEL: define hidden i32 @fi1b(
-// AMDGCN:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT:    [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT:    [[ATOMIC_TEMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP]] to ptr
+// AMDGCN-NEXT:    [[ATOMIC_TEMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP1]] to ptr
+// AMDGCN-NEXT:    [[ATOMIC_TEMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP2]] to ptr
+// AMDGCN-NEXT:    [[ATOMIC_TEMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP3]] to ptr
+// AMDGCN-NEXT:    [[ATOMIC_TEMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP4]] to ptr
+// AMDGCN-NEXT:    store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP1]], ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 [[TMP2]], ptr [[TMP3]], align 4
+// AMDGCN-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP5]], ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 [[TMP6]], ptr [[TMP7]], align 4
+// AMDGCN-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP9]], ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP11:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 [[TMP10]], ptr [[TMP11]], align 4
+// AMDGCN-NEXT:    [[TMP12:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP13]], ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP15:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 [[TMP14]], ptr [[TMP15]], align 4
+// AMDGCN-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT:    store i32 [[TMP17]], ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP19:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 [[TMP18]], ptr [[TMP19]], align 4
+// AMDGCN-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// AMDGCN-NEXT:    ret i32 [[TMP21]]
+//
 // SPIRV-LABEL: define hidden spir_func i32 @fi1b(
-// SPIRV:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT:  [[ENTRY:.*:]]
+// SPIRV-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT:    [[ATOMIC_TEMP:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP1]], ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 [[TMP2]], ptr [[TMP3]], align 4
+// SPIRV-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT:    [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT:    [[TMP7:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 [[TMP6]], ptr [[TMP7]], align 4
+// SPIRV-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP9]], ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT:    [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT:    [[TMP11:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 [[TMP10]], ptr [[TMP11]], align 4
+// SPIRV-NEXT:    [[TMP12:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP13]], ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT:    [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT:    [[TMP15:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 [[TMP14]], ptr [[TMP15]], align 4
+// SPIRV-NEXT:    [[TMP16:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT:    store i32 [[TMP17]], ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT:    [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT:    [[TMP19:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 [[TMP18]], ptr [[TMP19]], align 4
+// SPIRV-NEXT:    [[TMP20:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// SPIRV-NEXT:    ret i32 [[TMP21]]
+//
 int fi1b(int *i) {
   *i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
   *i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -48,17 +172,55 @@ int fi1b(int *i) {
 }
 
 // AMDGCN-LABEL: define hidden void @fi2a(
-// AMDGCN:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT:    [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT:    [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT:    store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 1, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP1:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP3:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP5:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT:    [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP7:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP9:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT:    store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT:    ret void
+//
 // SPIRV-LABEL: define hidden spir_func void @fi2a(
-// SPIRV:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT:  [[ENTRY:.*:]]
+// SPIRV-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT:    [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT:    store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    store i32 1, ptr [[V]], align 4
+// SPIRV-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP1:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP3:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP5:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT:    [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP7:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT:    [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT:    [[TMP9:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT:    store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT:    ret void
+//
 void fi2a(int *i) {
   int v = 1;
   __scoped_atomic_store(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -69,17 +231,75 @@ void fi2a(int *i) {
 }
 
 // AMDGCN-LABEL: define hidden void @fi2b(
-// AMDGCN:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-NEXT:    [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT:    [[DOTATOMICTMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[DOTATOMICTMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[DOTATOMICTMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[DOTATOMICTMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[DOTATOMICTMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT:    [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT:    [[DOTATOMICTMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP]] to ptr
+// AMDGCN-NEXT:    [[DOTATOMICTMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP1]] to ptr
+// AMDGCN-NEXT:    [[DOTATOMICTMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP2]] to ptr
+// AMDGCN-NEXT:    [[DOTATOMICTMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP3]] to ptr
+// AMDGCN-NEXT:    [[DOTATOMICTMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP4]] to ptr
+// AMDGCN-NEXT:    store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT:    store i32 1, ptr [[DOTATOMICTMP_ASCAST]], align 4
+/...
[truncated]

@JonChesterfield
Collaborator

I would say this change is obviously correct, but I can't see why it was introduced and vaguely fear tripping over abhorrent behaviour in the backend. Can you send this down the internal CI pipeline to pick up some more runtime testing (unless amd-stg-open is already defaulting to the right thing?)

Collaborator

@JonChesterfield JonChesterfield left a comment

Explicitly marking green; even if this commit upsets something else in the backend, having a concurrency primitive default to racy is clearly bad.

@JonChesterfield
Collaborator

JonChesterfield commented Dec 16, 2024

@b-sumner has useful context on this. I'll try not to speak for him but it sounds like the block deleted here has the right semantics for opencl, where "seqcst" has some special meaning and generally the semantics don't totally make sense to me. Suggest we amend this to "if opencl, do this thing, otherwise leave it alone"

(can't see how to remove the green tick from the review)

Contributor

@arsenm arsenm left a comment

This one-as business seems like it's cruft from before MMRAs. Can we rip them out and replace them with MMRAs for OpenCL?

@arsenm
Contributor

arsenm commented Dec 16, 2024

@b-sumner has useful context on this. I'll try not to speak for him but it sounds like the block deleted here has the right semantics for opencl, where "seqcst" has some special meaning and generally the semantics don't totally make sense to me. Suggest we amend this to "if opencl, do this thing, otherwise leave it alone"

I'd rather not infect this with poorly understood language based logic. The one-as should be more aggressive and safe to ignore.

(can't see how to remove the green tick from the review)

"Request changes"

@jhuber6
Contributor Author

jhuber6 commented Dec 16, 2024

If that's the case then we should be able to use the LangOpts to change it.

Collaborator

@JonChesterfield JonChesterfield left a comment

"You need to leave a comment indicating the requested changes."

@JonChesterfield
Collaborator

This one-as business seems like it's cruft from before MMRAs. Can we rip them out and replace them with MMRAs for OpenCL?

https://llvm.org/docs/MemoryModelRelaxationAnnotations.html calls out the opencl fence as a motivating example which suggests either yes, or we should amend the MMRA system to be able to express it. Sounds like a much cleaner fix to me, but also open to unblocking openmp with an "if (opencl) {}" clause in the meantime.

@@ -537,7 +537,8 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
break;
}

if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
if (LangOpts.OpenCL &&
Contributor

I am deeply opposed to making any __builtin_amdgcn intrinsic behavior language dependent. This is a target builtin with target defined behavior that should behave consistently regardless of language mode. I'd rather just take an OpenCL performance regression than this

Contributor

If OpenCL wants this funny behavior, they should need to use a different builtin. That said, is OpenCL even using the generic builtin?

Contributor Author

This is not an AMDGPU builtin; the AMDGPU builtin takes a string that corresponds directly to the syncscope string. This is for the 'generic' versions, which were reimplemented something like three times for OpenCL, HIP, and the GNU / scoped variants. I think a better solution here would be to only set it if the OpenCL-flavored scope arguments are used; then HIP and C/C++ get the non-one-as implementation, as those targets don't explicitly use the same address-space knowledge.

Contributor

My point is 1 builtin : 1 behavior, universally. No modality.

Contributor

The amdgcn fence builtin also looks like it already takes optional arguments to append MMRAs. If OpenCL is actually using this builtin anywhere (which I doubt it is), it can easily migrate to __builtin_amdgcn_fence

Contributor Author

That's the current behavior now that I updated it.

Contributor Author

I'm fairly sure it's used somewhere, since I think it's in the OpenCL spec. This currently miscompiles C/C++ with OpenMP, so I think it should be fixed. There may be performance regressions, but I don't think it's a valid optimization on non-OpenCL targets since we don't explicitly expose the address spaces as part of the language.
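
To make the concern concrete, here is a hypothetical sketch (not from this PR; the data layout, the value, and the __MEMORY_SCOPE_WRKGRP spelling of the workgroup scope are assumptions): a release on a global flag is meant to publish an earlier plain store, but if that store targets a different address space (say LDS), the old default syncscope("workgroup-one-as") only promises ordering for a single address space per the documentation quoted above, whereas the plain syncscope("workgroup") emitted after this patch covers both.

// Hypothetical illustration of the ordering concern, not code from the PR.
void publish(int *flag, int *data) {
  int one = 1;
  *data = 42; // assume this store lands in a different address space (e.g. LDS)
  // Release the flag at workgroup scope; with "-one-as" appended this would
  // not necessarily order the store above.
  __scoped_atomic_store(flag, &one, __ATOMIC_RELEASE, __MEMORY_SCOPE_WRKGRP);
}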

Contributor

If we don't have tests running __scoped_atomic* through OpenCL, we should have them.

if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
// OpenCL assumes by default that atomic scopes are per-address space for
// non-sequentially consistent operations.
if (Scope >= SyncScope::OpenCLWorkGroup &&
Contributor

Can you explain which scopes will generate the one-as versions with this change? I don't know which ones are in that range: agent, workgroup, wavefront, singlethread are the possible candidates.

I also don't understand why the one-as should be generated for a specific range of scopes. How do we know that will not lead to mis-compilation?

Contributor Author

@jhuber6 jhuber6 Dec 17, 2024

The OpenCL ones; this is the range used by the OpenCL builtins, i.e. the __opencl_atomic functions.

Contributor

@dhruvachak dhruvachak Dec 17, 2024

Ok, so here's the enum.

enum class SyncScope {
  SystemScope,
  DeviceScope,
  WorkgroupScope,
  WavefrontScope,
  SingleScope,
  HIPSingleThread,
  HIPWavefront,
  HIPWorkgroup,
  HIPAgent,
  HIPSystem,
  OpenCLWorkGroup,
  OpenCLDevice,
  OpenCLAllSVMDevices,
  OpenCLSubGroup,
  Last = OpenCLSubGroup
};

So like you said, the OpenCL ones. If Matt agrees, I am fine with this change though I don't see how it is better than explicitly checking for OpenCL. Just more cryptic. Either way, please add a comment.

Contributor Author

There's already a comment though?

Contributor Author

@jhuber6 jhuber6 Dec 17, 2024

It's better because it keys off the OpenCL builtin itself: you can still use the 'standard' builtins in OpenCL, which would otherwise give them different behavior depending on the language.

Collaborator

@t-tye t-tye Jan 3, 2025

What does seem questionable is that this function is checking the memory ordering. It feels like that should be done by the caller of this function. It is the caller that should be selecting the right value for the scope argument. For OpenCL the "one-as" sync-scopes should only be used for non-sequentially consistent memory orders. That is a rule of the OpenCL language.

Contributor Author

You can use the OpenCL builtins outside of OpenCL: https://godbolt.org/z/Ex4aPMx3v. This function is called by all the various reimplementations of this, but with different enum values for the scope. So, if we want the __opencl_ versions to maintain this behavior, we can do that.

Collaborator

I guess I am still feeling I am missing something. This PR is making a change to this function. Nothing else. So it seems the question is if the change is correct. This function should only act on the arguments it is given and faithfully do the right thing. It seems that is what it is doing, except for the check of the memory ordering. That check should be in the callers of the function.

Sounds like the callers of this function are the builtins that you describe. The builtins that are intended to implement OpenCL semantics should be sure to pass the OpenCL* scope values, and should check the memory ordering for being sequentially consistent. The non-OpenCL builtins should pass in the appropriate non-OpenCL values for scope. Is that what is happening?

Contributor Author

Yes, each non-OpenCL function goes through this common interface. The issue is that for non-OpenCL users they all get the one-as feature added, which is not correct in the general case. This patch is an attempt to maintain the OpenCL behavior while the other users of this function will no longer have one-as on them.

Collaborator

But with this change that is no longer the case. So is there any push back on this fix?

@b-sumner

OpenCL should only be using the amdgcn fence builtin. See, e.g. https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/opencl/src/misc/awif.cl#L73
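
For reference, a minimal sketch of that pattern (paraphrased, not copied from awif.cl; the exact arguments there may differ), assuming the __builtin_amdgcn_fence(order, scope-string) form referenced elsewhere in this thread:

// Sketch only: lower an OpenCL work_group release fence directly to the
// target builtin, passing the syncscope string through unchanged.
void fence_workgroup_release(void) {
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
}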

@JonChesterfield
Collaborator

JonChesterfield commented Dec 17, 2024

Stringing the pieces together, we may have been conflating opencl-the-language with opencl-the-implementation.

Let's go with the first line of attack: no language special casing here, no checking seq-cst and appending one-as.

Opencl the implementation won't care because it's using the amdgcn builtin anyway (and I think previously just used IR).

Opencl the language doesn't appear to document what the clang builtin is expected to do on it so it seems to me that it can do whatever we see fit. People writing opencl that happen to be using that intrinsic will see a minor performance degradation that they can work around by changing intrinsic. I anticipate that there are few to no such people, not least because this language-independent clang intrinsic was relatively recently introduced.

Simpler compiler, simpler semantics, unbreaks openmp, doesn't require reworking opencl to use the new metadata. I.e. we ship as initially proposed.

@b-sumner is that acceptable?

(opened #120131 to track this, though it hasn't attracted discussion)

@b-sumner

@JonChesterfield I think I agree with you. And OpenCL source should not be calling builtins to begin with since they're not officially part of the OpenCL language.

@jhuber6
Contributor Author

jhuber6 commented Jan 3, 2025

Ping

@jhuber6
Contributor Author

jhuber6 commented Jan 6, 2025

I discussed this with @t-tye; he said it's correct but wants @Pierre-vh to sign off on it.

@Pierre-vh
Contributor

I'm trying to understand this. Is the function being changed a generic util called by multiple builtins, and this change is just to make one-as exclusive to the OpenCL variant of those builtins?
Can an identical builtin have different behavior depending on the input language (e.g. OpenCL or C++ mode)?

@jhuber6
Contributor Author

jhuber6 commented Jan 6, 2025

I'm trying to understand this. Is the function being changed a generic util called by multiple builtins, and this change is just to make one-as exclusive to the OpenCL variant of those builtins? Can an identical builtin have different behavior depending on the input language (e.g. OpenCL or C++ mode)?

one-as is an OpenCL-specific weakening of the atomic memory model. We want to preserve this on OpenCL's builtins but remove it for other atomic codegen. This patch changes the behavior to only add one-as if the user called the __opencl_atomic builtin, regardless of language.
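
Roughly, the merged logic amounts to the following sketch (simplified from the snippets quoted in this review; the helper name, signature, and the exact bounds of the scope range are assumptions, not the verbatim upstream code):

// Hypothetical condensed form of the behavior described above: append
// "one-as" only for the OpenCL scope enumerators and only for orderings
// weaker than sequentially consistent.
static llvm::SyncScope::ID getScopeID(llvm::LLVMContext &Ctx, SyncScope Scope,
                                      llvm::AtomicOrdering Ordering,
                                      llvm::StringRef BaseName) {
  std::string Name = BaseName.str();
  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent &&
      Scope >= SyncScope::OpenCLWorkGroup) { // OpenCL* values sit at the end of the enum
    if (!Name.empty())
      Name += "-";
    Name += "one-as";
  }
  return Ctx.getOrInsertSyncScopeID(Name);
}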

Contributor

@Pierre-vh Pierre-vh left a comment

Ah, that makes sense. It's surprising it wasn't like that in the first place

@jhuber6 jhuber6 merged commit 81fae0d into llvm:main Jan 6, 2025
8 checks passed
@arsenm
Contributor

arsenm commented Jan 6, 2025

Ah, that makes sense. It's surprising it wasn't like that in the first place

But can we now get rid of the -one-as sync scopes and replace them with MMRAs?

@llvm-ci
Collaborator

llvm-ci commented Jan 6, 2025

LLVM Buildbot has detected a new failure on builder clang-debian-cpp20 running on clang-debian-cpp20 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/7783

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s' FAILED ********************
Exit Code: 134

Command Output (stderr):
--
RUN: at line 1: rm -rf /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp && mkdir -p /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp
+ rm -rf /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp
+ mkdir -p /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp
RUN: at line 2: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-mc --triple=loongarch64 --filetype=obj -o /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-mc --triple=loongarch64 --filetype=obj -o /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s
RUN: at line 3: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink --noexec               --abs external_data=0xdeadbeef               --abs external_func=0xcafef00d               --check /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink --noexec --abs external_data=0xdeadbeef --abs external_func=0xcafef00d --check /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o
llvm-jitlink: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:285: llvm::orc::SymbolStringPool::~SymbolStringPool(): Assertion `Pool.empty() && "Dangling references at pool destruction time"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink --noexec --abs external_data=0xdeadbeef --abs external_func=0xcafef00d --check /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o
 #0 0x00005b9d32f11bb8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0xeb5bb8)
 #1 0x00005b9d32f0f6ad llvm::sys::RunSignalHandlers() (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0xeb36ad)
 #2 0x00005b9d32f12158 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007e333801d510 (/lib/x86_64-linux-gnu/libc.so.6+0x3c510)
 #4 0x00007e333806b0fc (/lib/x86_64-linux-gnu/libc.so.6+0x8a0fc)
 #5 0x00007e333801d472 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3c472)
 #6 0x00007e33380074b2 abort (/lib/x86_64-linux-gnu/libc.so.6+0x264b2)
 #7 0x00007e33380073d5 (/lib/x86_64-linux-gnu/libc.so.6+0x263d5)
 #8 0x00007e33380163a2 (/lib/x86_64-linux-gnu/libc.so.6+0x353a2)
 #9 0x00005b9d327aaf0c (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0x74ef0c)
#10 0x00005b9d32df4b57 llvm::orc::ExecutorProcessControl::~ExecutorProcessControl() (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0xd98b57)
#11 0x00005b9d32df629f llvm::orc::SelfExecutorProcessControl::~SelfExecutorProcessControl() (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0xd9a29f)
#12 0x00005b9d32d27e88 llvm::orc::ExecutionSession::~ExecutionSession() (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0xccbe88)
#13 0x00005b9d32783edd llvm::Session::~Session() (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0x727edd)
#14 0x00005b9d3278e664 main (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0x732664)
#15 0x00007e33380086ca (/lib/x86_64-linux-gnu/libc.so.6+0x276ca)
#16 0x00007e3338008785 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27785)
#17 0x00005b9d3277c671 _start (/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink+0x720671)
/vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.script: line 3: 3099096 Aborted                 /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llvm-jitlink --noexec --abs external_data=0xdeadbeef --abs external_func=0xcafef00d --check /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch64_relocations.s /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/test/ExecutionEngine/JITLink/LoongArch/Output/ELF_loongarch64_relocations.s.tmp/elf_reloc.o

--

********************


@Pierre-vh
Contributor

Ah, that makes sense. It's surprising it wasn't like that in the first place

But can we now get rid of the -one-as sync scopes and replace them with MMRAs?

MMRAs are droppable so not currently suited to replace syncscopes like that, but it's a goal.

As part of the Vulkan memory model work I need to reevaluate that aspect of MMRAs and fix it (= promote them to an instruction operand), or use another system altogether (in which case MMRAs may just get removed). It's not yet decided which of those two options it'll be, but I plan to regularly follow up on this from now on until it's addressed, so hopefully something will happen in the medium term.

I'm not sure how much resistance there'd be upstream to making MMRAs an instruction operand instead of MD?

paulhuggett pushed a commit to paulhuggett/llvm-project that referenced this pull request Jan 7, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 18, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 18, 2025