-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AArch64] Don't tail call memset if it would convert to a bzero. #98969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Well, not quite that simple. We can tc memset since it returns the first argument but bzero doesn't do that and therefore we can end up miscompiling. rdar://131419786
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-backend-aarch64 Author: Amara Emerson (aemerson) ChangesWell, not quite that simple. We can tc memset since it returns the first rdar://131419786 Full diff: https://github.com/llvm/llvm-project/pull/98969.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/Analysis.cpp b/llvm/lib/CodeGen/Analysis.cpp
index 7fc18639e5852..2a3015866da5f 100644
--- a/llvm/lib/CodeGen/Analysis.cpp
+++ b/llvm/lib/CodeGen/Analysis.cpp
@@ -677,6 +677,8 @@ bool llvm::returnTypeIsEligibleForTailCall(const Function *F,
// will be expanded as memcpy in libc, which returns the first
// argument. On other platforms like arm-none-eabi, memcpy may be
// expanded as library call without return value, like __aeabi_memcpy.
+ // Similarly, llvm.memset can be expanded to bzero, which doesn't have a
+ // return value either.
const CallInst *Call = cast<CallInst>(I);
if (Function *F = Call->getCalledFunction()) {
Intrinsic::ID IID = F->getIntrinsicID();
@@ -685,7 +687,10 @@ bool llvm::returnTypeIsEligibleForTailCall(const Function *F,
(IID == Intrinsic::memmove &&
TLI.getLibcallName(RTLIB::MEMMOVE) == StringRef("memmove")) ||
(IID == Intrinsic::memset &&
- TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset"))) &&
+ TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset") &&
+ (!isa<ConstantInt>(Call->getOperand(1)) ||
+ !cast<ConstantInt>(Call->getOperand(1))->isZero() ||
+ !TLI.getLibcallName(RTLIB::BZERO)))) &&
(RetVal == Call->getArgOperand(0) ||
isPointerBitcastEqualTo(RetVal, Call->getArgOperand(0))))
return true;
diff --git a/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll b/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll
new file mode 100644
index 0000000000000..90e641cd4fe3d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll
@@ -0,0 +1,20 @@
+; RUN: llc -o - %s | FileCheck %s
+; RUN: llc -global-isel -o - %s | FileCheck %s
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx15.0.0"
+
+define ptr @test() {
+; CHECK-LABEL: test:
+; CHECK-NOT: b _bzero
+ %1 = tail call ptr @fn(i32 noundef 1) #3
+ tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(1000) %1, i8 noundef 0, i64 noundef 1000, i1 noundef false) #3
+ ret ptr %1
+}
+
+declare ptr @fn(i32 noundef)
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #2
+
+attributes #2 = { nocallback nofree nounwind willreturn memory(argmem: write) }
+attributes #3 = { nounwind optsize }
|
|
||
define ptr @test() { | ||
; CHECK-LABEL: test: | ||
; CHECK-NOT: b _bzero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a positive check for "bl _bzero",
llvm/lib/CodeGen/Analysis.cpp
Outdated
TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset") && | ||
(!isa<ConstantInt>(Call->getOperand(1)) || | ||
!cast<ConstantInt>(Call->getOperand(1))->isZero() || | ||
!TLI.getLibcallName(RTLIB::BZERO)))) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of ugly... if this code doesn't precisely match SelectionDAG::getMemset, we miscompile.
Can we rearrange the code so that the caller of isInTailCallPosition() does this computation? That could be directly adjacent to the code in getMemset that actually makes the decision of whether to call memset or bzero. We then just pass that into isInTailCallPosition() as a argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not understanding what you're suggesting. isInTailCallPosition
is called in multiple places so adding in an extra parameter doesn't make it any cleaner IMO. getMemset() is called later, by that point we've already decided to tail-call. We could downgrade the tail call in getMemset() but then we also lose the tail-call when the parent function's return type was void or it returned undef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, isTailCall is a parameter to SelectionDAG::getMemset(), but it doesn't really need to be; SelectionDAGBuilder::visitIntrinsicCall just calls isInTailCallPosition so it can pass the result into getMemset(). Instead, we can just call isInTailCallPosition from getMemset(), once we know whether we're calling bzero or memset, and we can pass that information down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement this. It didn't turn out as well as I'd hoped, memset is ok to do and we can move the logic for it. However for memmove we have a user in XCore which is computing a tail-call-position using a DAG node without any reference to a CallInst
so we can't do the same thing there. From what I could tell memcpy usages seemed ok to convert too but I didn't check all of them since there were so many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the way calls are lowered, we have two different tail-call-position analysis passes, yes: one for regular calls, which are lowered as part of building the SelectionDAG, and one for library calls which are created as part of legalization/optimization/etc.
We could teach getMemmove() etc. to call the correct isInTailCallPosition() based on the arguments: if the caller passed in an Instruction, use the IR version, if the caller passed in an SDNode, use the SelectionDAG version. Or we could try to use the SelectionDAG version exclusively: add an ISD::MEMMOVE node, and lower it after SelectionDAGBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only the IR version tries to do the returns-first-arg trickery so we can probably just go with the first option.
✅ With the latest revision this PR passed the C/C++ code formatter. |
llvm/lib/CodeGen/Analysis.cpp
Outdated
Value *RetVal = Ret ? Ret->getReturnValue() : nullptr; | ||
bool ReturnsFirstArg = false; | ||
if (RetVal && ((RetVal == CI.getArgOperand(0) || | ||
isPointerBitcastEqualTo(RetVal, CI.getArgOperand(0))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember you mentioned the ret ptr undef
case yesterday. Worth accounting for that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are covered by existing tests. The call to isInTailCallPosition()
will check that and early exit true if so.
llvm/include/llvm/CodeGen/Analysis.h
Outdated
|
||
/// Check whether B is a bitcast of a pointer type to another pointer type, | ||
/// which is equal to A. | ||
bool isPointerBitcastEqualTo(const Value *A, const Value *B); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is isPointerBitcastEqualTo
really necessary? With opaque pointer types, we should never see a bitcast like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess not.
I converted the other mem routines so we can completely remove the logic in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -148,10 +148,6 @@ bool returnTypeIsEligibleForTailCall(const Function *F, const Instruction *I, | |||
const TargetLoweringBase &TLI, | |||
bool ReturnsFirstArg = false); | |||
|
|||
/// Check whether B is a bitcast of a pointer type to another pointer type, | |||
/// which is equal to A. | |||
bool isPointerBitcastEqualTo(const Value *A, const Value *B); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this is an unrelated patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eli commented in another part of the patch that given we have opaque pointers now this isn't needed anymore.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/1398 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/176/builds/1177 Here is the relevant piece of the build log for the reference:
|
I had to push 3941f65 to unbreak the m86k backend |
and xtensa: 64f67a4 |
) Summary: Well, not quite that simple. We can tc memset since it returns the first argument but bzero doesn't do that and therefore we can end up miscompiling. This patch also refactors the logic out of isInTailCallPosition() into the callers. As a result memcpy and memmove are also modified to do the same thing for consistency. rdar://131419786 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250902
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250951
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up miscompiling.
rdar://131419786