Skip to content

[FunctionAttrs] Improve handling of alias-preserving intrinsic calls #68453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 24, 2023

Conversation

krzysz00
Copy link
Contributor

@krzysz00 krzysz00 commented Oct 6, 2023

Fixes #68270

The function attribute analysis handles many instructions, like addrspacecast, which do not themselves read or write memory but which transform pointers into other values in the same alias set.

There are intrinsic functions, such as ptrmask or the AMDGPU-specific make.buffer.rsrc, which also preserve membership in alias sets without capturing. This commit adds the addrspacecast-like behavior to these calls.

@llvmbot
Copy link
Member

llvmbot commented Oct 6, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Changes

Fixes #68270

The function attribute analysis handles many instructions, like addrspacecast, which do not themselves read or write memory but which transform pointers into other values in the same alias set and don't capture the pointer.

There are intrinsic functions, such as ptrmask or the AMDGPU-specific make.buffer.rsrc, which also preserve membership in alias sets without capturing. These were not being handled correctly, leading to pointers being incorrectly marked readnone.


Full diff: https://github.com/llvm/llvm-project/pull/68453.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/FunctionAttrs.cpp (+10)
  • (added) llvm/test/Transforms/FunctionAttrs/AMDGPU/lit.local.cfg (+2)
  • (added) llvm/test/Transforms/FunctionAttrs/AMDGPU/make-buffer-rsrc.ll (+47)
  • (modified) llvm/test/Transforms/FunctionAttrs/writeonly.ll (+31-10)
diff --git a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
index 95b3204d02beb81..ff344360cdf7634 100644
--- a/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionAttrs.cpp
@@ -654,6 +654,16 @@ determinePointerAccessAttrs(Argument *A,
       // must be a data operand (e.g. argument or operand bundle)
       const unsigned UseIndex = CB.getDataOperandNo(U);
 
+      // Some intrinsics (for instance ptrmask) do not capture their results,
+      // but return results thas alias their pointer argument, and thus should
+      // be handled like GEP or addrspacecast above.
+      if (isIntrinsicReturningPointerAliasingArgumentWithoutCapturing(
+              &CB, /*MustPreserveNullness=*/false)) {
+        for (Use &UU : CB.uses())
+          if (Visited.insert(&UU).second)
+            Worklist.push_back(&UU);
+      }
+
       if (!CB.doesNotCapture(UseIndex)) {
         if (!CB.onlyReadsMemory())
           // If the callee can save a copy into other memory, then simply
diff --git a/llvm/test/Transforms/FunctionAttrs/AMDGPU/lit.local.cfg b/llvm/test/Transforms/FunctionAttrs/AMDGPU/lit.local.cfg
new file mode 100644
index 000000000000000..7c492428aec7614
--- /dev/null
+++ b/llvm/test/Transforms/FunctionAttrs/AMDGPU/lit.local.cfg
@@ -0,0 +1,2 @@
+if not "AMDGPU" in config.root.targets:
+    config.unsupported = True
diff --git a/llvm/test/Transforms/FunctionAttrs/AMDGPU/make-buffer-rsrc.ll b/llvm/test/Transforms/FunctionAttrs/AMDGPU/make-buffer-rsrc.ll
new file mode 100644
index 000000000000000..78219970395627c
--- /dev/null
+++ b/llvm/test/Transforms/FunctionAttrs/AMDGPU/make-buffer-rsrc.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --check-attributes
+; RUN: opt -passes=function-attrs -S < %s | FileCheck --check-prefixes=COMMON,FNATTRS %s
+; RUN: opt -passes=attributor-light -S < %s | FileCheck --check-prefixes=COMMON,ATTRIBUTOR %s
+
+target triple = "amdgcn-amd-amdhsa"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7:8"
+
+define amdgpu_kernel void @test_make_buffer_rsrc(ptr %p, ptr %q) {
+; FNATTRS: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: readwrite)
+; FNATTRS-LABEL: define {{[^@]+}}@test_make_buffer_rsrc
+; FNATTRS-SAME: (ptr nocapture readonly [[P:%.*]], ptr nocapture writeonly [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+; FNATTRS-NEXT:    [[P_RSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr [[P]], i16 0, i32 4, i32 822243328)
+; FNATTRS-NEXT:    [[Q_RSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr [[Q]], i16 0, i32 4, i32 822243328)
+; FNATTRS-NEXT:    [[V:%.*]] = call i8 @llvm.amdgcn.raw.ptr.buffer.load.i8(ptr addrspace(8) [[P_RSRC]], i32 0, i32 0, i32 0)
+; FNATTRS-NEXT:    call void @llvm.amdgcn.raw.ptr.buffer.store.i8(i8 [[V]], ptr addrspace(8) [[Q_RSRC]], i32 0, i32 0, i32 0)
+; FNATTRS-NEXT:    ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
+; ATTRIBUTOR-LABEL: define {{[^@]+}}@test_make_buffer_rsrc
+; ATTRIBUTOR-SAME: (ptr nocapture nofree readnone [[P:%.*]], ptr nocapture nofree readnone [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+; ATTRIBUTOR-NEXT:    [[P_RSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr nocapture [[P]], i16 0, i32 4, i32 822243328) #[[ATTR4:[0-9]+]]
+; ATTRIBUTOR-NEXT:    [[Q_RSRC:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr nocapture [[Q]], i16 0, i32 4, i32 822243328) #[[ATTR4]]
+; ATTRIBUTOR-NEXT:    [[V:%.*]] = call i8 @llvm.amdgcn.raw.ptr.buffer.load.i8(ptr addrspace(8) readonly [[P_RSRC]], i32 0, i32 0, i32 0) #[[ATTR5:[0-9]+]]
+; ATTRIBUTOR-NEXT:    call void @llvm.amdgcn.raw.ptr.buffer.store.i8(i8 [[V]], ptr addrspace(8) writeonly [[Q_RSRC]], i32 0, i32 0, i32 0) #[[ATTR6:[0-9]+]]
+; ATTRIBUTOR-NEXT:    ret void
+;
+  %p.rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 0, i32 4, i32 822243328)
+  %q.rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %q, i16 0, i32 4, i32 822243328)
+  %v = call i8 @llvm.amdgcn.raw.ptr.buffer.load.i8(ptr addrspace(8) %p.rsrc, i32 0, i32 0, i32 0)
+  call void @llvm.amdgcn.raw.ptr.buffer.store.i8(i8 %v, ptr addrspace(8) %q.rsrc, i32 0, i32 0, i32 0)
+  ret void
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr nocapture readnone, i16, i32, i32) #0
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: read)
+declare i8 @llvm.amdgcn.raw.ptr.buffer.load.i8(ptr addrspace(8) nocapture readonly, i32, i32, i32 immarg) #1
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: write)
+declare void @llvm.amdgcn.raw.ptr.buffer.store.i8(i8, ptr addrspace(8) nocapture writeonly, i32, i32, i32 immarg) #2
+
+attributes #0 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: read) }
+attributes #2 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: write) }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; COMMON: {{.*}}
diff --git a/llvm/test/Transforms/FunctionAttrs/writeonly.ll b/llvm/test/Transforms/FunctionAttrs/writeonly.ll
index 633068129eac697..9d919b4892031aa 100644
--- a/llvm/test/Transforms/FunctionAttrs/writeonly.ll
+++ b/llvm/test/Transforms/FunctionAttrs/writeonly.ll
@@ -178,6 +178,27 @@ define void @test_atomicrmw(ptr %p) {
   ret void
 }
 
+define void @test_ptrmask(ptr %p) {
+; FNATTRS: Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: write)
+; FNATTRS-LABEL: define {{[^@]+}}@test_ptrmask
+; FNATTRS-SAME: (ptr writeonly [[P:%.*]]) #[[ATTR8:[0-9]+]] {
+; FNATTRS-NEXT:    [[MASK:%.*]] = call ptr @llvm.ptrmask.p0.i64(ptr [[P]], i64 -5)
+; FNATTRS-NEXT:    store i8 0, ptr [[MASK]], align 1
+; FNATTRS-NEXT:    ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; ATTRIBUTOR-LABEL: define {{[^@]+}}@test_ptrmask
+; ATTRIBUTOR-SAME: (ptr nofree writeonly [[P:%.*]]) #[[ATTR3]] {
+; ATTRIBUTOR-NEXT:    [[MASK:%.*]] = call ptr @llvm.ptrmask.p0.i64(ptr [[P]], i64 -5) #[[ATTR9:[0-9]+]]
+; ATTRIBUTOR-NEXT:    store i8 0, ptr [[MASK]], align 1
+; ATTRIBUTOR-NEXT:    ret void
+;
+  %mask = call ptr @llvm.ptrmask.p0.i64(ptr %p, i64 -5)
+  store i8 0, ptr %mask
+  ret void
+}
+
+declare ptr @llvm.ptrmask.p0.i64(ptr, i64)
 
 declare void @direct1_callee(ptr %p)
 
@@ -197,14 +218,14 @@ declare void @direct2_callee(ptr %p) writeonly
 define void @direct2(ptr %p) {
 ; FNATTRS: Function Attrs: memory(write)
 ; FNATTRS-LABEL: define {{[^@]+}}@direct2
-; FNATTRS-SAME: (ptr [[P:%.*]]) #[[ATTR8:[0-9]+]] {
+; FNATTRS-SAME: (ptr [[P:%.*]]) #[[ATTR10:[0-9]+]] {
 ; FNATTRS-NEXT:    call void @direct2_callee(ptr [[P]])
 ; FNATTRS-NEXT:    ret void
 ;
 ; ATTRIBUTOR: Function Attrs: memory(write)
 ; ATTRIBUTOR-LABEL: define {{[^@]+}}@direct2
-; ATTRIBUTOR-SAME: (ptr writeonly [[P:%.*]]) #[[ATTR7:[0-9]+]] {
-; ATTRIBUTOR-NEXT:    call void @direct2_callee(ptr [[P]]) #[[ATTR7]]
+; ATTRIBUTOR-SAME: (ptr writeonly [[P:%.*]]) #[[ATTR8:[0-9]+]] {
+; ATTRIBUTOR-NEXT:    call void @direct2_callee(ptr [[P]]) #[[ATTR8]]
 ; ATTRIBUTOR-NEXT:    ret void
 ;
   call void @direct2_callee(ptr %p)
@@ -215,14 +236,14 @@ define void @direct2(ptr %p) {
 define void @direct2b(ptr %p) {
 ; FNATTRS: Function Attrs: memory(write)
 ; FNATTRS-LABEL: define {{[^@]+}}@direct2b
-; FNATTRS-SAME: (ptr nocapture [[P:%.*]]) #[[ATTR8]] {
+; FNATTRS-SAME: (ptr nocapture [[P:%.*]]) #[[ATTR10]] {
 ; FNATTRS-NEXT:    call void @direct2_callee(ptr nocapture [[P]])
 ; FNATTRS-NEXT:    ret void
 ;
 ; ATTRIBUTOR: Function Attrs: memory(write)
 ; ATTRIBUTOR-LABEL: define {{[^@]+}}@direct2b
-; ATTRIBUTOR-SAME: (ptr nocapture writeonly [[P:%.*]]) #[[ATTR7]] {
-; ATTRIBUTOR-NEXT:    call void @direct2_callee(ptr nocapture writeonly [[P]]) #[[ATTR7]]
+; ATTRIBUTOR-SAME: (ptr nocapture writeonly [[P:%.*]]) #[[ATTR8]] {
+; ATTRIBUTOR-NEXT:    call void @direct2_callee(ptr nocapture writeonly [[P]]) #[[ATTR8]]
 ; ATTRIBUTOR-NEXT:    ret void
 ;
   call void @direct2_callee(ptr nocapture %p)
@@ -304,14 +325,14 @@ define void @fptr_test2(ptr %p, ptr %f) {
 define void @fptr_test3(ptr %p, ptr %f) {
 ; FNATTRS: Function Attrs: memory(write)
 ; FNATTRS-LABEL: define {{[^@]+}}@fptr_test3
-; FNATTRS-SAME: (ptr nocapture [[P:%.*]], ptr nocapture readonly [[F:%.*]]) #[[ATTR8]] {
-; FNATTRS-NEXT:    call void [[F]](ptr nocapture [[P]]) #[[ATTR8]]
+; FNATTRS-SAME: (ptr nocapture [[P:%.*]], ptr nocapture readonly [[F:%.*]]) #[[ATTR10]] {
+; FNATTRS-NEXT:    call void [[F]](ptr nocapture [[P]]) #[[ATTR10]]
 ; FNATTRS-NEXT:    ret void
 ;
 ; ATTRIBUTOR: Function Attrs: memory(write)
 ; ATTRIBUTOR-LABEL: define {{[^@]+}}@fptr_test3
-; ATTRIBUTOR-SAME: (ptr nocapture writeonly [[P:%.*]], ptr nocapture nofree nonnull writeonly [[F:%.*]]) #[[ATTR7]] {
-; ATTRIBUTOR-NEXT:    call void [[F]](ptr nocapture [[P]]) #[[ATTR7]]
+; ATTRIBUTOR-SAME: (ptr nocapture writeonly [[P:%.*]], ptr nocapture nofree nonnull writeonly [[F:%.*]]) #[[ATTR8]] {
+; ATTRIBUTOR-NEXT:    call void [[F]](ptr nocapture [[P]]) #[[ATTR8]]
 ; ATTRIBUTOR-NEXT:    ret void
 ;
   call void %f(ptr nocapture %p) writeonly

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a reasonable improvement to precision, but this is not a bug fix.

An argument that gets returned by a function must not be marked as nocapture. The return value counts as a capture. The current attributes on the llvm.amdgcn.make.buffer.rsrc intrinsic are incorrect.

@nikic
Copy link
Contributor

nikic commented Oct 6, 2023

This looks like a reasonable improvement to precision, but this is not a bug fix.

Reading more carefully, you still fall through to the usual behavior, so it's not an improvement to precision either. So this appears to only be a workaround for an incorrect intrinsic annotation.

@krzysz00
Copy link
Contributor Author

krzysz00 commented Oct 6, 2023

Huh

In which case, I'm confused by why the function in ValueTracking mentions WithoutCapturing.

And by what the semantics of nocapture are in general - my sense of it was that a capture involves storing a pointer somewhere such that it might be used after its SSA value is out of scope, and so nocapture indicates a lack of that behavior.

It's entirely possible that I'll want to go fix up make.buffer.rsrc and then also commit this as a precision improvement.

@nikic
Copy link
Contributor

nikic commented Oct 7, 2023

In which case, I'm confused by why the function in ValueTracking mentions WithoutCapturing.

The name is a bit confusing, it would be more accurate to say without other capturing. The point of the function is to handle functions that only capture via the return value, as that's not something that can currently be expressed with attributes.

And by what the semantics of nocapture are in general - my sense of it was that a capture involves storing a pointer somewhere such that it might be used after its SSA value is out of scope, and so nocapture indicates a lack of that behavior.

See https://llvm.org/docs/LangRef.html#pointercapture, especially the comments in the first example. Your description is basically right (at least for the provenance escape part of "capture"), but applies beyond stores to memory. What matters is that the pointer outlives the call in some way.

@krzysz00
Copy link
Contributor Author

krzysz00 commented Oct 9, 2023

Ah, in which case, I'll file a separate commit to fix up the annotations on the intrinsic and leave this as a precision enhancement

krzysz00 added a commit to krzysz00/llvm-project that referenced this pull request Oct 9, 2023
Per discussion on llvm#68453 ,
the make_buffer_rsrc intrinsic was incorrectly marked noclobber. This
commit fixes the issue.
krzysz00 added a commit that referenced this pull request Oct 10, 2023
Per discussion on #68453 , the
make_buffer_rsrc intrinsic was incorrectly marked noclobber. This commit
fixes the issue.
@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch from 19dcb06 to 321f758 Compare October 10, 2023 14:50
@krzysz00 krzysz00 changed the title [FunctionAttrs] Handle alias-preserving intrinsic calls correctly [FunctionAttrs] Improve handling of alias-preserving intrinsic calls Oct 10, 2023
@krzysz00 krzysz00 requested a review from nikic October 10, 2023 14:53
@krzysz00
Copy link
Contributor Author

@nikic I've updated the commit messaged and removed the fallthrough, thus making this a precision-improving PR

@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch from 321f758 to cc5aa21 Compare October 10, 2023 15:07
@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch from cc5aa21 to 04ca3b9 Compare October 16, 2023 15:50
@krzysz00
Copy link
Contributor Author

Also, any insight on how to make the equivalent change to the attributor?

; RUN: opt -passes=function-attrs -S < %s | FileCheck --check-prefixes=COMMON,FNATTRS %s
; RUN: opt -passes=attributor-light -S < %s | FileCheck --check-prefixes=COMMON,ATTRIBUTOR %s

target triple = "amdgcn-amd-amdhsa"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this (and the target-specific directory) needed? Intrinsics generally don't depend on availability of the target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, then yeah, we can probably make this not target-specific, but I vaguely remember running into issues with AMDGPU-specific tests before that made me cautious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as dropping the triple doesn't make the test fail, you should be good.

@nikic
Copy link
Contributor

nikic commented Oct 19, 2023

Also, any insight on how to make the equivalent change to the attributor?

Does it need any changes? I'd expect that Attributor uses CaptureTracking, which already handles this.

@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch 2 times, most recently from 649da84 to 6ada751 Compare October 23, 2023 15:15
@nikic
Copy link
Contributor

nikic commented Oct 23, 2023

Looks like this needs a rebase.

@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch from 6ada751 to df34967 Compare October 23, 2023 16:22
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -654,7 +654,15 @@ determinePointerAccessAttrs(Argument *A,
// must be a data operand (e.g. argument or operand bundle)
const unsigned UseIndex = CB.getDataOperandNo(U);

if (!CB.doesNotCapture(UseIndex)) {
// Some intrinsics (for instance ptrmask) do not capture their results,
// but return results thas alias their pointer argument, and thus should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// but return results thas alias their pointer argument, and thus should
// but return results that alias their pointer argument, and thus should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Apparently I forgot to git commit --amend in the typo fix. Should I make a new PR for that or?

Fixes llvm#68270

The function attribute analysis handles many instructions, like
addrspacecast, which do not themselves read or write memory but which
transform pointers into other values in the same alias set.

There are intrinsic functions, such as ptrmask or the AMDGPU-specific
make.buffer.rsrc, which also preserve membership in alias sets without
capturing. This commit adds the addrspacecast-like behavior to these
calls.
@krzysz00 krzysz00 force-pushed the fix-function-attrs-aliasing-intrinsics branch from df34967 to 836a421 Compare October 24, 2023 14:26
@krzysz00 krzysz00 merged commit 93f8e52 into llvm:main Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inferring function argument attributes incorrectly ignoes (sets readnone) intrinsics that return aliasing pointers
3 participants