Skip to content

[AArch64] Don't tail call memset if it would convert to a bzero. #98969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 17, 2024

Conversation

aemerson
Copy link
Contributor

Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up miscompiling.

rdar://131419786

Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up miscompiling.

rdar://131419786
@llvmbot
Copy link
Member

llvmbot commented Jul 15, 2024

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-msp430
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-hexagon
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Amara Emerson (aemerson)

Changes

Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up miscompiling.

rdar://131419786


Full diff: https://github.com/llvm/llvm-project/pull/98969.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/Analysis.cpp (+6-1)
  • (added) llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll (+20)
diff --git a/llvm/lib/CodeGen/Analysis.cpp b/llvm/lib/CodeGen/Analysis.cpp
index 7fc18639e5852..2a3015866da5f 100644
--- a/llvm/lib/CodeGen/Analysis.cpp
+++ b/llvm/lib/CodeGen/Analysis.cpp
@@ -677,6 +677,8 @@ bool llvm::returnTypeIsEligibleForTailCall(const Function *F,
   // will be expanded as memcpy in libc, which returns the first
   // argument. On other platforms like arm-none-eabi, memcpy may be
   // expanded as library call without return value, like __aeabi_memcpy.
+  // Similarly, llvm.memset can be expanded to bzero, which doesn't have a
+  // return value either.
   const CallInst *Call = cast<CallInst>(I);
   if (Function *F = Call->getCalledFunction()) {
     Intrinsic::ID IID = F->getIntrinsicID();
@@ -685,7 +687,10 @@ bool llvm::returnTypeIsEligibleForTailCall(const Function *F,
          (IID == Intrinsic::memmove &&
           TLI.getLibcallName(RTLIB::MEMMOVE) == StringRef("memmove")) ||
          (IID == Intrinsic::memset &&
-          TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset"))) &&
+          TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset") &&
+          (!isa<ConstantInt>(Call->getOperand(1)) ||
+           !cast<ConstantInt>(Call->getOperand(1))->isZero() ||
+           !TLI.getLibcallName(RTLIB::BZERO)))) &&
         (RetVal == Call->getArgOperand(0) ||
          isPointerBitcastEqualTo(RetVal, Call->getArgOperand(0))))
       return true;
diff --git a/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll b/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll
new file mode 100644
index 0000000000000..90e641cd4fe3d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/no-tail-call-bzero-from-memset.ll
@@ -0,0 +1,20 @@
+; RUN: llc -o - %s | FileCheck %s
+; RUN: llc -global-isel -o - %s | FileCheck %s
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx15.0.0"
+
+define ptr @test()  {
+; CHECK-LABEL: test:
+; CHECK-NOT: b _bzero
+  %1 = tail call ptr @fn(i32 noundef 1) #3
+  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(1000) %1, i8 noundef 0, i64 noundef 1000, i1 noundef false) #3
+  ret ptr %1
+}
+
+declare ptr @fn(i32 noundef)
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
+declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #2
+
+attributes #2 = { nocallback nofree nounwind willreturn memory(argmem: write) }
+attributes #3 = { nounwind optsize }


define ptr @test() {
; CHECK-LABEL: test:
; CHECK-NOT: b _bzero
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a positive check for "bl _bzero",

TLI.getLibcallName(RTLIB::MEMSET) == StringRef("memset") &&
(!isa<ConstantInt>(Call->getOperand(1)) ||
!cast<ConstantInt>(Call->getOperand(1))->isZero() ||
!TLI.getLibcallName(RTLIB::BZERO)))) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of ugly... if this code doesn't precisely match SelectionDAG::getMemset, we miscompile.

Can we rearrange the code so that the caller of isInTailCallPosition() does this computation? That could be directly adjacent to the code in getMemset that actually makes the decision of whether to call memset or bzero. We then just pass that into isInTailCallPosition() as a argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not understanding what you're suggesting. isInTailCallPosition is called in multiple places so adding in an extra parameter doesn't make it any cleaner IMO. getMemset() is called later, by that point we've already decided to tail-call. We could downgrade the tail call in getMemset() but then we also lose the tail-call when the parent function's return type was void or it returned undef.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, isTailCall is a parameter to SelectionDAG::getMemset(), but it doesn't really need to be; SelectionDAGBuilder::visitIntrinsicCall just calls isInTailCallPosition so it can pass the result into getMemset(). Instead, we can just call isInTailCallPosition from getMemset(), once we know whether we're calling bzero or memset, and we can pass that information down.

Copy link
Contributor Author

@aemerson aemerson Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this. It didn't turn out as well as I'd hoped, memset is ok to do and we can move the logic for it. However for memmove we have a user in XCore which is computing a tail-call-position using a DAG node without any reference to a CallInst so we can't do the same thing there. From what I could tell memcpy usages seemed ok to convert too but I didn't check all of them since there were so many.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the way calls are lowered, we have two different tail-call-position analysis passes, yes: one for regular calls, which are lowered as part of building the SelectionDAG, and one for library calls which are created as part of legalization/optimization/etc.

We could teach getMemmove() etc. to call the correct isInTailCallPosition() based on the arguments: if the caller passed in an Instruction, use the IR version, if the caller passed in an SDNode, use the SelectionDAG version. Or we could try to use the SelectionDAG version exclusively: add an ISD::MEMMOVE node, and lower it after SelectionDAGBuilder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only the IR version tries to do the returns-first-arg trickery so we can probably just go with the first option.

@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Jul 16, 2024
Copy link

github-actions bot commented Jul 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Value *RetVal = Ret ? Ret->getReturnValue() : nullptr;
bool ReturnsFirstArg = false;
if (RetVal && ((RetVal == CI.getArgOperand(0) ||
isPointerBitcastEqualTo(RetVal, CI.getArgOperand(0)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you mentioned the ret ptr undef case yesterday. Worth accounting for that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are covered by existing tests. The call to isInTailCallPosition() will check that and early exit true if so.


/// Check whether B is a bitcast of a pointer type to another pointer type,
/// which is equal to A.
bool isPointerBitcastEqualTo(const Value *A, const Value *B);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is isPointerBitcastEqualTo really necessary? With opaque pointer types, we should never see a bitcast like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess not.

@aemerson
Copy link
Contributor Author

I converted the other mem routines so we can completely remove the logic in returnTypeIsEligibleForTailCall()

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -148,10 +148,6 @@ bool returnTypeIsEligibleForTailCall(const Function *F, const Instruction *I,
const TargetLoweringBase &TLI,
bool ReturnsFirstArg = false);

/// Check whether B is a bitcast of a pointer type to another pointer type,
/// which is equal to A.
bool isPointerBitcastEqualTo(const Value *A, const Value *B);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is an unrelated patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eli commented in another part of the patch that given we have opaque pointers now this isn't needed anymore.

@aemerson aemerson merged commit f270a4d into llvm:main Jul 17, 2024
14 of 15 checks passed
@aemerson aemerson deleted the bzero-tailcall branch July 17, 2024 08:31
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 17, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-test-suite running on ppc64le-clang-test-suite while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/1398

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'SanitizerCommon-lsan-powerpc64le-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -fno-function-sections -funwind-tables  -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64le-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -fno-function-sections -funwind-tables -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64le-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64le-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64le-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:54:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:55:14: note: possible intended match here
==2666978==LeakSanitizer: soft rss limit unexhausted (220Mb vs 8Mb)
             ^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           49:  [256] 
           50:  [320] 
           51:  [384] 
           52:  [448] 
           53: Some of the malloc calls returned null: 256 
           54: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
           55: ==2666978==LeakSanitizer: soft rss limit unexhausted (220Mb vs 8Mb) 
check:68'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:68'1                  ?                                                       possible intended match
>>>>>>

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 17, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64be-linux-test-suite running on ppc64be-clang-test-suite while building llvm at step 7 "test-build-unified-tree-check-runtimes".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/176/builds/1177

Here is the relevant piece of the build log for the reference:

Step 7 (test-build-unified-tree-check-runtimes) failure: test (failure)
******************** TEST 'SanitizerCommon-lsan-powerpc64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -funwind-tables  -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -funwind-tables -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:54:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:55:13: note: possible intended match here
==153319==LeakSanitizer: soft rss limit unexhausted (220Mb vs 37Mb)
            ^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           49:  [256] 
           50:  [320] 
           51:  [384] 
           52:  [448] 
           53: Some of the malloc calls returned null: 256 
           54: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
           55: ==153319==LeakSanitizer: soft rss limit unexhausted (220Mb vs 37Mb) 
check:68'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:68'1                 ?                                                        possible intended match
>>>>>>

--

********************


@sylvestre
Copy link
Collaborator

I had to push 3941f65 to unbreak the m86k backend

@sylvestre
Copy link
Collaborator

and xtensa: 64f67a4

MaskRay added a commit that referenced this pull request Jul 17, 2024
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
)

Summary:
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.

This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.

rdar://131419786

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250902
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary: 

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250951
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants