[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes #120095
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The documentation at https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these 'one-as' modifiers are more specific versions of the scopes that only apply to a specific address space. This doesn't make sense for fences, which have no associated address space to use, and it's a more restrictive version of the normal scope. This should not be the default behavior, but it is currently emitted in all cases except for sequentially consistent.

Patch is 266.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120095.diff

8 Files Affected:
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 56ad0503a11ab2..ecf044436d8c86 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -537,13 +537,6 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
break;
}
- if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
- if (!Name.empty())
- Name = Twine(Twine(Name) + Twine("-")).str();
-
- Name = Twine(Twine(Name) + Twine("one-as")).str();
- }
-
return Ctx.getOrInsertSyncScopeID(Name);
}
diff --git a/clang/test/CodeGen/scoped-atomic-ops.c b/clang/test/CodeGen/scoped-atomic-ops.c
index cf98812a07e91d..c3162ebe66661f 100644
--- a/clang/test/CodeGen/scoped-atomic-ops.c
+++ b/clang/test/CodeGen/scoped-atomic-ops.c
@@ -5,17 +5,57 @@
// RUN: -fvisibility=hidden | FileCheck --check-prefix=SPIRV %s
// AMDGCN-LABEL: define hidden i32 @fi1a(
-// AMDGCN: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread-one-as") monotonic, align 4
-// SPIRV: define hidden spir_func i32 @fi1a(
-// SPIRV: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] monotonic, align 4
-// SPIRV: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("device") monotonic, align 4
-// SPIRV: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP1]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP3]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP5]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP7]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP9]], ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP10:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: ret i32 [[TMP10]]
+//
+// SPIRV-LABEL: define hidden spir_func i32 @fi1a(
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0:[0-9]+]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP1]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP3:%.*]] = load atomic i32, ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP3]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP5]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP7:%.*]] = load atomic i32, ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP7]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP9]], ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP10:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: ret i32 [[TMP10]]
+//
int fi1a(int *i) {
int v;
__scoped_atomic_load(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -27,17 +67,101 @@ int fi1a(int *i) {
}
// AMDGCN-LABEL: define hidden i32 @fi1b(
-// AMDGCN: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP1]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP2]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP3]] to ptr
+// AMDGCN-NEXT: [[ATOMIC_TEMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ATOMIC_TEMP4]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP1]], ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP3:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP2]], ptr [[TMP3]], align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP5]], ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP7:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP6]], ptr [[TMP7]], align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP9]], ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP11:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP10]], ptr [[TMP11]], align 4
+// AMDGCN-NEXT: [[TMP12:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP13]], ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP15:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP14]], ptr [[TMP15]], align 4
+// AMDGCN-NEXT: [[TMP16:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: store i32 [[TMP17]], ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP19:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 [[TMP18]], ptr [[TMP19]], align 4
+// AMDGCN-NEXT: [[TMP20:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// AMDGCN-NEXT: ret i32 [[TMP21]]
+//
// SPIRV-LABEL: define hidden spir_func i32 @fi1b(
-// SPIRV: [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV: [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV: [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP3:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: [[ATOMIC_TEMP4:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load atomic i32, ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP1]], ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load i32, ptr [[ATOMIC_TEMP]], align 4
+// SPIRV-NEXT: [[TMP3:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP2]], ptr [[TMP3]], align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load atomic i32, ptr [[TMP4]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load i32, ptr [[ATOMIC_TEMP1]], align 4
+// SPIRV-NEXT: [[TMP7:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP6]], ptr [[TMP7]], align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load atomic i32, ptr [[TMP8]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP9]], ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT: [[TMP10:%.*]] = load i32, ptr [[ATOMIC_TEMP2]], align 4
+// SPIRV-NEXT: [[TMP11:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP10]], ptr [[TMP11]], align 4
+// SPIRV-NEXT: [[TMP12:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP13:%.*]] = load atomic i32, ptr [[TMP12]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP13]], ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT: [[TMP14:%.*]] = load i32, ptr [[ATOMIC_TEMP3]], align 4
+// SPIRV-NEXT: [[TMP15:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP14]], ptr [[TMP15]], align 4
+// SPIRV-NEXT: [[TMP16:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP17:%.*]] = load atomic i32, ptr [[TMP16]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: store i32 [[TMP17]], ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT: [[TMP18:%.*]] = load i32, ptr [[ATOMIC_TEMP4]], align 4
+// SPIRV-NEXT: [[TMP19:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 [[TMP18]], ptr [[TMP19]], align 4
+// SPIRV-NEXT: [[TMP20:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
+// SPIRV-NEXT: ret i32 [[TMP21]]
+//
int fi1b(int *i) {
*i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
*i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -48,17 +172,55 @@ int fi1b(int *i) {
}
// AMDGCN-LABEL: define hidden void @fi2a(
-// AMDGCN: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[V:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[V_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[V]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 1, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP1:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// AMDGCN-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP3:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("agent") monotonic, align 4
+// AMDGCN-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP5:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// AMDGCN-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP7:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("wavefront") monotonic, align 4
+// AMDGCN-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP9:%.*]] = load i32, ptr [[V_ASCAST]], align 4
+// AMDGCN-NEXT: store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// AMDGCN-NEXT: ret void
+//
// SPIRV-LABEL: define hidden spir_func void @fi2a(
-// SPIRV: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] monotonic, align 4
-// SPIRV: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
-// SPIRV: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
+// SPIRV-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// SPIRV-NEXT: [[ENTRY:.*:]]
+// SPIRV-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8
+// SPIRV-NEXT: [[V:%.*]] = alloca i32, align 4
+// SPIRV-NEXT: store ptr [[I]], ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: store i32 1, ptr [[V]], align 4
+// SPIRV-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP1:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP1]], ptr [[TMP0]] monotonic, align 4
+// SPIRV-NEXT: [[TMP2:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP3:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP3]], ptr [[TMP2]] syncscope("device") monotonic, align 4
+// SPIRV-NEXT: [[TMP4:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP5:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP5]], ptr [[TMP4]] syncscope("workgroup") monotonic, align 4
+// SPIRV-NEXT: [[TMP6:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP7:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP7]], ptr [[TMP6]] syncscope("subgroup") monotonic, align 4
+// SPIRV-NEXT: [[TMP8:%.*]] = load ptr, ptr [[I_ADDR]], align 8
+// SPIRV-NEXT: [[TMP9:%.*]] = load i32, ptr [[V]], align 4
+// SPIRV-NEXT: store atomic i32 [[TMP9]], ptr [[TMP8]] syncscope("singlethread") monotonic, align 4
+// SPIRV-NEXT: ret void
+//
void fi2a(int *i) {
int v = 1;
__scoped_atomic_store(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -69,17 +231,75 @@ void fi2a(int *i) {
}
// AMDGCN-LABEL: define hidden void @fi2b(
-// AMDGCN: store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// AMDGCN: store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-SAME: ptr noundef [[I:%.*]]) #[[ATTR0]] {
+// AMDGCN-NEXT: [[ENTRY:.*:]]
+// AMDGCN-NEXT: [[I_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP1:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP2:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP3:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[DOTATOMICTMP4:%.*]] = alloca i32, align 4, addrspace(5)
+// AMDGCN-NEXT: [[I_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[I_ADDR]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP1]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP2]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP3_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP3]] to ptr
+// AMDGCN-NEXT: [[DOTATOMICTMP4_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[DOTATOMICTMP4]] to ptr
+// AMDGCN-NEXT: store ptr [[I]], ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: [[TMP0:%.*]] = load ptr, ptr [[I_ADDR_ASCAST]], align 8
+// AMDGCN-NEXT: store i32 1, ptr [[DOTATOMICTMP_ASCAST]], align 4
+/...
[truncated]
I would say this change is obviously correct, but I can't see why it was introduced and vaguely fear tripping over abhorrent behaviour in the backend. Can you send this down the internal CI pipeline to pick up some more runtime testing (unless amd-stg-open is already defaulting to the right thing?)
Explicitly marking green: even if this commit upsets something else in the backend, having a concurrency primitive default to racy is clearly bad.
@b-sumner has useful context on this. I'll try not to speak for him, but it sounds like the block deleted here has the right semantics for OpenCL, where "seqcst" has some special meaning and generally the semantics don't totally make sense to me. Suggest we amend this to "if opencl, do this thing, otherwise leave it alone" (can't see how to remove the green tick from the review)
This one-as business seems like it's cruft from before MMRAs. Can we rip them out and replace them with MMRAs for OpenCL?
I'd rather not infect this with poorly understood language-based logic. The one-as should be more aggressive and safe to ignore.
Request changes
If that's the case then we should be able to use the
https://llvm.org/docs/MemoryModelRelaxationAnnotations.html calls out the opencl fence as a motivating example which suggests either yes, or we should amend the MMRA system to be able to express it. Sounds like a much cleaner fix to me, but also open to unblocking openmp with an "if (opencl) {}" clause in the meantime.
clang/lib/CodeGen/Targets/AMDGPU.cpp (Outdated)
@@ -537,7 +537,8 @@ AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &LangOpts,
    break;
  }

-  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
+  if (LangOpts.OpenCL &&
I am deeply opposed to making any __builtin_amdgcn intrinsic behavior language-dependent. This is a target builtin with target-defined behavior that should behave consistently regardless of language mode. I'd rather just take an OpenCL performance regression than this
If OpenCL wants this funny behavior, they should need to use a different builtin. That said, is OpenCL even using the generic builtin?
This is not an AMDGPU builtin; the AMDGPU builtin takes a string that corresponds directly to the syncscope string. This is for the 'generic' versions, which were reimplemented like three times for OpenCL, HIP, and the GNU / scoped versions. I think a better solution here would be to only set it if it uses the OpenCL-flavored scope arguments; then HIP and C/C++ get the non-one-as implementation, as those targets don't explicitly use the same AS knowledge. A rough sketch of that check follows.
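A minimal sketch of that check (hypothetical placement inside getLLVMSyncScopeID, reusing the existing name-appending logic; it assumes the OpenCL scope values sit contiguously at the end of the SyncScope enum):

  // Sketch only: append "one-as" solely for the OpenCL-flavored scope
  // arguments, so HIP and the generic __scoped_atomic_* builtins keep the
  // plain scopes.
  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent &&
      Scope >= SyncScope::OpenCLWorkGroup &&
      Scope <= SyncScope::OpenCLSubGroup) {
    if (!Name.empty())
      Name = Twine(Twine(Name) + Twine("-")).str();
    Name = Twine(Twine(Name) + Twine("one-as")).str();
  }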
My point is 1 builtin : 1 behavior, universally. No modality.
The amdgcn fence builtin also looks like it already takes optional arguments to append MMRAs. If OpenCL is actually using this builtin anywhere (which I doubt it is), it can easily migrate to __builtin_amdgcn_fence.
That's the current behavior now that I updated it.
I'm sure it's using it somewhere since it's the OpenCL spec I think? This currently miscompiles on C/C++ w/ OpenMP so I think it should be fixed. Possible performance regressions but I don't think it's a valid optimization on non-OpenCL targets since we don't explicitly expose the address spaces as a part of the language.
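As a hypothetical reproducer of what that miscompile looks like from plain C/C++ (using the generic builtin exercised by the test above):

// Before this patch, compiling this for amdgcn emitted the load with
// syncscope("agent-one-as"), i.e. synchronization restricted to a single
// address space, a restriction C/C++/OpenMP never asked for. After the
// patch it is syncscope("agent").
int observe_flag(int *flag) {
  return __scoped_atomic_load_n(flag, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
}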
If we don't have tests running __scoped_atomic_* through OpenCL, we should have them
  if (Ordering != llvm::AtomicOrdering::SequentiallyConsistent) {
    // OpenCL assumes by default that atomic scopes are per-address space for
    // non-sequentially consistent operations.
    if (Scope >= SyncScope::OpenCLWorkGroup &&
Can you explain which scopes will generate the one-as versions with this change? I don't know which ones are in that range: agent, workgroup, wavefront, singlethread are the possible candidates.
I also don't understand why the one-as should be generated for a specific range of scopes. How do we know that will not lead to mis-compilation?
The OpenCL ones; this is the range used by the OpenCL builtins via the __opencl_atomic_* stuff.
Ok, so here's the enum.
enum class SyncScope {
SystemScope,
DeviceScope,
WorkgroupScope,
WavefrontScope,
SingleScope,
HIPSingleThread,
HIPWavefront,
HIPWorkgroup,
HIPAgent,
HIPSystem,
OpenCLWorkGroup,
OpenCLDevice,
OpenCLAllSVMDevices,
OpenCLSubGroup,
Last = OpenCLSubGroup
};
So like you said, the OpenCL ones. If Matt agrees, I am fine with this change though I don't see how it is better than explicitly checking for OpenCL. Just more cryptic. Either way, please add a comment.
There's already a comment though?
It's better because it's keyed off the OpenCL builtin itself: you can still use the 'standard' builtins in OpenCL, which would otherwise get different behavior depending on the language.
What does seem questionable is that this function is checking the memory ordering. It feels like that should be done by the caller of this function. It is the caller that should be selecting the right value for the scope argument. For OpenCL the "one-as" sync-scopes should only be used for non-sequentially consistent memory orders. That is a rule of the OpenCL language.
You can use the OpenCL builtins outside of OpenCL: https://godbolt.org/z/Ex4aPMx3v. This function is called by all the various reimplementations of this, but with different enum values for the scope. So, if we want the __opencl_ versions to maintain this behavior, we can do that.
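For instance (a hypothetical snippet along the lines of the godbolt link above; the __OPENCL_MEMORY_SCOPE_* macros are assumed to be predefined outside OpenCL mode):

// The OpenCL builtin is accepted in plain C/C++ too, so keying the one-as
// suffix off the OpenCL* SyncScope values (rather than LangOpts.OpenCL)
// keeps one builtin mapping to one behavior across language modes.
int load_wg(_Atomic int *p) {
  return __opencl_atomic_load(p, __ATOMIC_RELAXED,
                              __OPENCL_MEMORY_SCOPE_WORK_GROUP);
}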
I guess I am still feeling I am missing something. This PR is making a change to this function. Nothing else. So it seems the question is if the change is correct. This function should only act on the arguments it is given and faithfully do the right thing. It seems that is what it is doing, except for the check of the memory ordering. That check should be in the callers of the function.
Sounds like the callers of this function are the builtins that you describe. The builtins that are intended to implement OpenCL semantics should be sure to pass the OpenCL* scope values, and should check the memory ordering for being sequentially consistent. The non-OpenCL builtins should pass in the appropriate non-OpenCL values for scope. Is that what is happening?
Yes, each non-OpenCL function goes through this common interface. The issue is that for non-OpenCL users they all get the one-as feature added, which is not correct in the general case. This patch is an attempt to maintain the OpenCL behavior while the other users of this function will no longer have one-as on them.
But with this change that is no longer the case. So is there any push back on this fix?
OpenCL should only be using the amdgcn fence builtin. See, e.g. https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/opencl/src/misc/awif.cl#L73
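For reference, a minimal sketch of using that builtin directly (the scope string maps straight through to the IR syncscope; per the earlier comment, additional string arguments can attach MMRAs):

// Target builtin with target-defined behavior, independent of language mode.
void release_to_workgroup(void) {
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
}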
Stringing the pieces together, we may have been conflating opencl-the-language with opencl-the-implementation. Let's go with the first line of attack: no language special casing here, no checking seq-cst and appending one-as. Opencl the implementation won't care because it's using the amdgcn builtin anyway (and I think previously just used IR). Opencl the language doesn't appear to document what the clang builtin is expected to do on it so it seems to me that it can do whatever we see fit. People writing opencl that happen to be using that intrinsic will see a minor performance degradation that they can work around by changing intrinsic. I anticipate that there are few to no such people, not least because this language-independent clang intrinsic was relatively recently introduced. Simpler compiler, simpler semantics, unbreaks openmp, doesn't require reworking opencl to use the new metadata. I.e. we ship as initially proposed. @b-sumner is that acceptable? (opened #120131 to track this, though it hasn't attracted discussion)
Summary: The documentation at https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these 'one-as' modifiers are more specific versions of the scopes that only apply to a specific address space. This doesn't make sense for fences, which have no associated address space to use, and it's a more restrictive version of the normal scope. This should not be the default behavior, but it is currently emitted in all cases except for sequentially consistent.
@JonChesterfield I think I agree with you. And OpenCL source should not be calling builtins to begin with since they're not officially part of the OpenCL language.
Ping
I discussed this with @t-tye; he said it's correct but wants @Pierre-vh to sign off on it.
I'm trying to understand this. Is the function being changed a generic util called by multiple builtins, and this change is just to make
Ah, that makes sense. It's surprising it wasn't like that in the first place. But can we now get rid of the -one-as sync scopes and replace them with MMRAs?
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/7783
MMRAs are droppable, so they're not currently suited to replace syncscopes like that, but it's a goal. As part of the Vulkan memory model work I need to reevaluate that aspect of MMRAs and fix it (= promote them to an instruction operand), or use another system altogether (in which case MMRAs may just get removed). It's not yet decided which of those 2 options it'll be, but I plan to regularly follow up on this from now on until it's addressed, so hopefully something will happen medium term. I'm not sure how much resistance there'd be upstream to making MMRAs an instruction operand instead of MD?