Skip to content

[CVP][LVI] Add support for InsertElementInst in LVI #99368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 19, 2024

Conversation

rajatbajpai
Copy link
Contributor

Currently, the LVI analysis pass doesn't support InsertElementInst vector instruction. Due to this, some optimization opportunities are missed. For example, in the below example, ICMP instruction can be folded but it doesn't.

...
%ie1 = insertelement <2 x i32> poison, i32 10, i64 0
%ie2 = insertelement <2 x i32> %ie1, i32 20, i64 1
%icmp = icmp <2 x i1> %ie2, <i32 40, i32 40>
...

This change adds InsertElementInst support in the LVI analysis pass to fix the motivating example.

@rajatbajpai rajatbajpai requested a review from nikic as a code owner July 17, 2024 18:34
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Rajat Bajpai (rajatbajpai)

Changes

Currently, the LVI analysis pass doesn't support InsertElementInst vector instruction. Due to this, some optimization opportunities are missed. For example, in the below example, ICMP instruction can be folded but it doesn't.

...
%ie1 = insertelement &lt;2 x i32&gt; poison, i32 10, i64 0
%ie2 = insertelement &lt;2 x i32&gt; %ie1, i32 20, i64 1
%icmp = icmp &lt;2 x i1&gt; %ie2, &lt;i32 40, i32 40&gt;
...

This change adds InsertElementInst support in the LVI analysis pass to fix the motivating example.


Full diff: https://github.com/llvm/llvm-project/pull/99368.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/LazyValueInfo.cpp (+36)
  • (added) llvm/test/Transforms/CorrelatedValuePropagation/insertelement.ll (+76)
diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp
index 92389f2896b8e..d28d4fa47fdae 100644
--- a/llvm/lib/Analysis/LazyValueInfo.cpp
+++ b/llvm/lib/Analysis/LazyValueInfo.cpp
@@ -428,6 +428,8 @@ class LazyValueInfoImpl {
   std::optional<ValueLatticeElement> solveBlockValueIntrinsic(IntrinsicInst *II,
                                                               BasicBlock *BB);
   std::optional<ValueLatticeElement>
+  solveBlockValueInsertElement(InsertElementInst *IEI, BasicBlock *BB);
+  std::optional<ValueLatticeElement>
   solveBlockValueExtractValue(ExtractValueInst *EVI, BasicBlock *BB);
   bool isNonNullAtEndOfBlock(Value *Val, BasicBlock *BB);
   void intersectAssumeOrGuardBlockValueConstantRange(Value *Val,
@@ -657,6 +659,9 @@ LazyValueInfoImpl::solveBlockValueImpl(Value *Val, BasicBlock *BB) {
     if (BinaryOperator *BO = dyn_cast<BinaryOperator>(BBI))
       return solveBlockValueBinaryOp(BO, BB);
 
+    if (auto *IEI = dyn_cast<InsertElementInst>(BBI))
+      return solveBlockValueInsertElement(IEI, BB);
+
     if (auto *EVI = dyn_cast<ExtractValueInst>(BBI))
       return solveBlockValueExtractValue(EVI, BB);
 
@@ -1038,6 +1043,37 @@ LazyValueInfoImpl::solveBlockValueIntrinsic(IntrinsicInst *II, BasicBlock *BB) {
                    MetadataVal);
 }
 
+std::optional<ValueLatticeElement>
+LazyValueInfoImpl::solveBlockValueInsertElement(InsertElementInst *IEI,
+                                                    BasicBlock *BB) {
+  std::optional<ValueLatticeElement> OptEltVal =
+      getBlockValue(IEI->getOperand(1), BB, IEI);
+  if (!OptEltVal)
+    return std::nullopt;
+  ValueLatticeElement &EltVal = *OptEltVal;
+
+  if (auto *CV = dyn_cast<ConstantVector>(IEI->getOperand(0))) {
+    // Must be vector of integers. Merge these elements to create
+    // the range.
+    for (unsigned i = 0, e = CV->getNumOperands(); i != e; ++i) {
+      Constant *Elem = CV->getAggregateElement(i);
+      if (isa<PoisonValue>(Elem))
+        continue;
+      std::optional<ConstantRange> CR = getRangeFor(Elem, IEI, BB);
+      if (!CR)
+        return std::nullopt;
+      EltVal.mergeIn(ValueLatticeElement::getRange(*CR));
+    }
+  } else if (!isa<PoisonValue>(IEI->getOperand(0))) {
+    std::optional<ValueLatticeElement> OptVecResult =
+        solveBlockValueImpl(IEI->getOperand(0), BB);
+    if (!OptVecResult)
+      return std::nullopt;
+    EltVal.mergeIn(*OptVecResult);
+  }
+  return EltVal;
+}
+
 std::optional<ValueLatticeElement>
 LazyValueInfoImpl::solveBlockValueExtractValue(ExtractValueInst *EVI,
                                                BasicBlock *BB) {
diff --git a/llvm/test/Transforms/CorrelatedValuePropagation/insertelement.ll b/llvm/test/Transforms/CorrelatedValuePropagation/insertelement.ll
new file mode 100644
index 0000000000000..769f431738342
--- /dev/null
+++ b/llvm/test/Transforms/CorrelatedValuePropagation/insertelement.ll
@@ -0,0 +1,76 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=correlated-propagation -S | FileCheck %s
+
+;; Check if ICMP instruction is constant folded or not.
+
+define void @test1(ptr addrspace(1) %out) {
+; CHECK-LABEL: define void @test1(
+; CHECK-SAME: ptr addrspace(1) [[OUT:%.*]]) {
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[UDIV_LHS_TRUNC:%.*]] = trunc i32 [[CALL]] to i16
+; CHECK-NEXT:    [[UDIV1:%.*]] = udiv i16 [[UDIV_LHS_TRUNC]], 5
+; CHECK-NEXT:    [[UDIV_ZEXT:%.*]] = zext i16 [[UDIV1]] to i32
+; CHECK-NEXT:    [[ADD1:%.*]] = add nuw nsw i32 [[UDIV_ZEXT]], 768
+; CHECK-NEXT:    [[ADD2:%.*]] = add nuw nsw i32 [[UDIV_ZEXT]], 896
+; CHECK-NEXT:    [[IE1:%.*]] = insertelement <2 x i32> poison, i32 [[ADD1]], i64 0
+; CHECK-NEXT:    [[IE2:%.*]] = insertelement <2 x i32> [[IE1]], i32 [[ADD2]], i64 1
+; CHECK-NEXT:    [[EI1:%.*]] = extractelement <2 x i1> <i1 true, i1 true>, i64 0
+; CHECK-NEXT:    [[EI2:%.*]] = extractelement <2 x i1> <i1 true, i1 true>, i64 1
+; CHECK-NEXT:    [[ADDUP:%.*]] = add i1 [[EI1]], [[EI2]]
+; CHECK-NEXT:    [[ADDUP_UPCAST:%.*]] = zext i1 [[ADDUP]] to i32
+; CHECK-NEXT:    store i32 [[ADDUP_UPCAST]], ptr addrspace(1) [[OUT]], align 4
+; CHECK-NEXT:    ret void
+;
+  %call = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !1
+  %udiv = udiv i32 %call, 5
+  %add1 = add i32 %udiv, 768
+  %add2 = add i32 %udiv, 896
+  %ie1 = insertelement <2 x i32> poison, i32 %add1, i64 0
+  %ie2 = insertelement <2 x i32> %ie1, i32 %add2, i64 1
+  %icmp1 = icmp slt <2 x i32> %ie2, <i32 1024, i32 1024>
+  %ei1 = extractelement <2 x i1> %icmp1, i64 0
+  %ei2 = extractelement <2 x i1> %icmp1, i64 1
+  %addUp = add i1 %ei1, %ei2
+  %addUp.upcast = zext i1 %addUp to i32
+  store i32 %addUp.upcast, ptr addrspace(1) %out, align 4
+  ret void
+}
+
+
+;; Check if LVI is able to handle constant vector operands
+;; in InsertElementInst and CVP is able to fold ICMP instruction.
+
+define void @test2(ptr addrspace(1) %out) {
+; CHECK-LABEL: define void @test2(
+; CHECK-SAME: ptr addrspace(1) [[OUT:%.*]]) {
+; CHECK-NEXT:    [[CALL:%.*]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range [[RNG0]]
+; CHECK-NEXT:    [[UDIV_LHS_TRUNC:%.*]] = trunc i32 [[CALL]] to i16
+; CHECK-NEXT:    [[UDIV1:%.*]] = udiv i16 [[UDIV_LHS_TRUNC]], 5
+; CHECK-NEXT:    [[UDIV_ZEXT:%.*]] = zext i16 [[UDIV1]] to i32
+; CHECK-NEXT:    [[ADD2:%.*]] = add nuw nsw i32 [[UDIV_ZEXT]], 896
+; CHECK-NEXT:    [[IE1:%.*]] = insertelement <2 x i32> <i32 poison, i32 1>, i32 [[ADD2]], i64 0
+; CHECK-NEXT:    [[EI1:%.*]] = extractelement <2 x i1> <i1 true, i1 true>, i64 0
+; CHECK-NEXT:    [[EI2:%.*]] = extractelement <2 x i1> <i1 true, i1 true>, i64 1
+; CHECK-NEXT:    [[ADDUP:%.*]] = add i1 [[EI1]], [[EI2]]
+; CHECK-NEXT:    [[ADDUP_UPCAST:%.*]] = zext i1 [[ADDUP]] to i32
+; CHECK-NEXT:    store i32 [[ADDUP_UPCAST]], ptr addrspace(1) [[OUT]], align 4
+; CHECK-NEXT:    ret void
+;
+  %call = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !1
+  %udiv = udiv i32 %call, 5
+  %add2 = add i32 %udiv, 896
+  %ie1 = insertelement <2 x i32> <i32 poison, i32 1>, i32 %add2, i64 0
+  %icmp1 = icmp slt <2 x i32> %ie1, <i32 1024, i32 1024>
+  %ei1 = extractelement <2 x i1> %icmp1, i64 0
+  %ei2 = extractelement <2 x i1> %icmp1, i64 1
+  %addUp = add i1 %ei1, %ei2
+  %addUp.upcast = zext i1 %addUp to i32
+  store i32 %addUp.upcast, ptr addrspace(1) %out, align 4
+  ret void
+}
+
+
+!1 = !{i32 0, i32 640}
+;.
+; CHECK: [[RNG0]] = !{i32 0, i32 640}
+;.

Copy link

github-actions bot commented Jul 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@rajatbajpai rajatbajpai force-pushed the dev/rbajpai/lvi_insertelement branch 2 times, most recently from 4e223c3 to 5e2a073 Compare July 18, 2024 16:37
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rajatbajpai rajatbajpai force-pushed the dev/rbajpai/lvi_insertelement branch from 5e2a073 to 0dfc0f9 Compare July 18, 2024 17:04
@rajatbajpai
Copy link
Contributor Author

Will merge the changes after getting the clean pipeline. Thank you for reviewing this change.

@rajatbajpai rajatbajpai force-pushed the dev/rbajpai/lvi_insertelement branch from 0dfc0f9 to e6145b4 Compare July 19, 2024 04:57
Currently, the LVI analysis pass doesn't support InsertElementInst vector
instruction. Due to this, some optimization opportunities are missed.
For example, in the below example, ICMP instruction can be folded but it doesn't.

```
...
%ie1 = insertelement <2 x i32> poison, i32 10, i64 0
%ie2 = insertelement <2 x i32> %ie1, i32 20, i64 1
%icmp1 = icmp <2 x i1> %ie2, <i32 40, i32 40>
...
```

This change adds InsertElementInst support in the LVI analysis pass to fix the
motivating example.
@rajatbajpai rajatbajpai force-pushed the dev/rbajpai/lvi_insertelement branch from e6145b4 to 530f0c6 Compare July 19, 2024 05:48
@rajatbajpai
Copy link
Contributor Author

The Linux build is failing and the error doesn't seem to be related to my change. I'll wait some time for the setup to recover.

@nikic
Copy link
Contributor

nikic commented Jul 19, 2024

@rajatbajpai You can ignore the linux build.

@rajatbajpai
Copy link
Contributor Author

@nikic could you please merge this PR for me? I don't have permission to merge a PR. Thanks!

@nikic nikic merged commit 90668d2 into llvm:main Jul 19, 2024
5 of 7 checks passed
@rajatbajpai rajatbajpai deleted the dev/rbajpai/lvi_insertelement branch July 19, 2024 06:49
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 19, 2024

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/2815

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/AArch64/ELF_relocations.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: rm -rf /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp && mkdir -p /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
+ rm -rf /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
+ mkdir -p /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
RUN: at line 2: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llvm-mc -triple=aarch64-unknown-linux-gnu -x86-relax-relocations=false    -position-independent -filetype=obj -o /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llvm-mc -triple=aarch64-unknown-linux-gnu -x86-relax-relocations=false -position-independent -filetype=obj -o /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s
RUN: at line 4: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llvm-jitlink -noexec               -abs external_data=0xdeadbeef               -abs external_func=0xcafef00d               -check /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llvm-jitlink -noexec -abs external_data=0xdeadbeef -abs external_func=0xcafef00d -check /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s /b/1/clang-x86_64-debian-fast/llvm.obj/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o
Expression 'decode_operand(test_adr_gotpage_external, 1) =      (got_addr(elf_reloc.o, external_data)[32:12] -         test_adr_gotpage_external[32:12])' is false: 0x108 != 0xffffffffffe00108
/b/1/clang-x86_64-debian-fast/llvm.obj/bin/llvm-jitlink: Some checks in /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s failed

--

********************


yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Currently, the LVI analysis pass doesn't support InsertElementInst
vector instruction. Due to this, some optimization opportunities are
missed. For example, in the below example, ICMP instruction can be
folded but it doesn't.

```
...
%ie1 = insertelement <2 x i32> poison, i32 10, i64 0
%ie2 = insertelement <2 x i32> %ie1, i32 20, i64 1
%icmp = icmp <2 x i1> %ie2, <i32 40, i32 40>
...
```

This change adds InsertElementInst support in the LVI analysis pass to
fix the motivating example.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251386
@mikaelholmen
Copy link
Collaborator

mikaelholmen commented Dec 2, 2024

Hello,

I bisected a crash back to this patch:
opt -passes="correlated-propagation" bbi-101893.ll -o /dev/null -S
crashes with

opt: ../include/llvm/IR/Instructions.h:2647: void llvm::PHINode::setIncomingValue(unsigned int, Value *): Assertion `getType() == V->getType() && "All operands to PHI node must be the same type as the PHI node!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=correlated-propagation bbi-101893.ll -o /dev/null -S
1.	Running pass "function(correlated-propagation)" on module "bbi-101893.ll"
2.	Running pass "correlated-propagation" on function "func_18"
 #0 0x0000562b7e9082a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x444a2a6)
 #1 0x0000562b7e905cce llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x4447cce)
 #2 0x0000562b7e908b59 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fa6b79bbcf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007fa6b5574acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007fa6b5547ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007fa6b5547d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007fa6b556d426 (/lib64/libc.so.6+0x47426)
 #8 0x0000562b8007d234 runImpl(llvm::Function&, llvm::LazyValueInfo*, llvm::DominatorTree*, llvm::SimplifyQuery const&) CorrelatedValuePropagation.cpp:0:0
 #9 0x0000562b80077e54 llvm::CorrelatedValuePropagationPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5bb9e54)
#10 0x0000562b7fd25b5d llvm::detail::PassModel<llvm::Function, llvm::CorrelatedValuePropagationPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#11 0x0000562b7eb25057 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x4667057)
#12 0x0000562b7fd28a6d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#13 0x0000562b7eb29cbb llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x466bcbb)
#14 0x0000562b7fd21fdd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#15 0x0000562b7eb23ce7 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x4665ce7)
#16 0x0000562b7fcc211c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x580411c)
#17 0x0000562b7e8cb60a optMain (build-all/bin/opt+0x440d60a)
#18 0x00007fa6b5560d85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#19 0x0000562b7e8c91ee _start (build-all/bin/opt+0x440b1ee)
Abort (core dumped)

bbi-101893.ll.gz

@nikic
Copy link
Contributor

nikic commented Dec 2, 2024

Reduced test case:

@g = external global i32

define <2 x i16> @test() {
  %ins = insertelement <2 x i16> poison, i16 ptrtoint (ptr @g to i16), i32 0
  ret <2 x i16> %ins
}

@nikic
Copy link
Contributor

nikic commented Dec 2, 2024

@mikaelholmen Should be fixed by 7a7a426.

@mikaelholmen
Copy link
Collaborator

That was quick. Thanks!

@rajatbajpai
Copy link
Contributor Author

Thanks @mikaelholmen, for reporting the issue and @nikic for a quick fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants