[SLP] Remove -slp-optimize-identity-hor-reduction-ops option #106238

preames · 2024-08-27T15:49:55Z

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.

llvmbot · 2024-08-27T15:50:24Z

@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-llvm-transforms

Author: Philip Reames (preames)

Changes

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.

Full diff: https://github.com/llvm/llvm-project/pull/106238.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+8-18)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll (-12)
(removed) llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll (-16)

diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index ed47ed661ab946..1c57855b57149c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -136,13 +136,6 @@ static cl::opt<bool> ShouldStartVectorizeHorAtStore(
     cl::desc(
         "Attempt to vectorize horizontal reductions feeding into a store"));
 
-// NOTE: If AllowHorRdxIdenityOptimization is true, the optimization will run
-// even if we match a reduction but do not vectorize in the end.
-static cl::opt<bool> AllowHorRdxIdenityOptimization(
-    "slp-optimize-identity-hor-reduction-ops", cl::init(true), cl::Hidden,
-    cl::desc("Allow optimization of original scalar identity operations on "
-             "matched horizontal reductions."));
-
 static cl::opt<int>
 MaxVectorRegSizeOption("slp-max-reg-size", cl::init(128), cl::Hidden,
     cl::desc("Attempt to vectorize for this register size in bits"));
@@ -17565,10 +17558,9 @@ class HorizontalReduction {
                           return Num + Vals.size();
                         });
     if (NumReducedVals < ReductionLimit &&
-        (!AllowHorRdxIdenityOptimization ||
-         all_of(ReducedVals, [](ArrayRef<Value *> RedV) {
-           return RedV.size() < 2 || !allConstant(RedV) || !isSplat(RedV);
-         }))) {
+        all_of(ReducedVals, [](ArrayRef<Value *> RedV) {
+          return RedV.size() < 2 || !allConstant(RedV) || !isSplat(RedV);
+        })) {
       for (ReductionOpsType &RdxOps : ReductionOps)
         for (Value *RdxOp : RdxOps)
           V.analyzedReductionRoot(cast<Instruction>(RdxOp));
@@ -17698,8 +17690,7 @@ class HorizontalReduction {
       }
 
       // Emit code for constant values.
-      if (AllowHorRdxIdenityOptimization && Candidates.size() > 1 &&
-          allConstant(Candidates)) {
+      if (Candidates.size() > 1 && allConstant(Candidates)) {
         Value *Res = Candidates.front();
         ++VectorizedVals.try_emplace(Candidates.front(), 0).first->getSecond();
         for (Value *VC : ArrayRef(Candidates).drop_front()) {
@@ -17714,15 +17705,14 @@ class HorizontalReduction {
 
       unsigned NumReducedVals = Candidates.size();
       if (NumReducedVals < ReductionLimit &&
-          (NumReducedVals < 2 || !AllowHorRdxIdenityOptimization ||
-           !isSplat(Candidates)))
+          (NumReducedVals < 2 || !isSplat(Candidates)))
         continue;
 
       // Check if we support repeated scalar values processing (optimization of
       // original scalar identity operations on matched horizontal reductions).
-      IsSupportedHorRdxIdentityOp =
-          AllowHorRdxIdenityOptimization && RdxKind != RecurKind::Mul &&
-          RdxKind != RecurKind::FMul && RdxKind != RecurKind::FMulAdd;
+      IsSupportedHorRdxIdentityOp = RdxKind != RecurKind::Mul &&
+                                    RdxKind != RecurKind::FMul &&
+                                    RdxKind != RecurKind::FMulAdd;
       // Gather same values.
       MapVector<Value *, unsigned> SameValuesCounter;
       if (IsSupportedHorRdxIdentityOp)
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
index 2c417804c83e0d..bbc2bfdcb6c160 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
@@ -1,6 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=arm64-apple-macosx | FileCheck %s
-; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=arm64-apple-macosx -slp-optimize-identity-hor-reduction-ops=false | FileCheck %s --check-prefix=NO-IDENTITY
 
 define i8 @test() {
 ; CHECK-LABEL: @test(
@@ -12,17 +11,6 @@ define i8 @test() {
 ; CHECK-NEXT:    [[TMP0]] = mul i32 [[CALL278]], 8
 ; CHECK-NEXT:    br label [[FOR_BODY]]
 ;
-; NO-IDENTITY-LABEL: @test(
-; NO-IDENTITY-NEXT:  entry:
-; NO-IDENTITY-NEXT:    br label [[FOR_BODY:%.*]]
-; NO-IDENTITY:       for.body:
-; NO-IDENTITY-NEXT:    [[SUM:%.*]] = phi i32 [ [[TMP2:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
-; NO-IDENTITY-NEXT:    [[CALL278:%.*]] = call i32 @fn(i32 [[SUM]])
-; NO-IDENTITY-NEXT:    [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[CALL278]], i32 0
-; NO-IDENTITY-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
-; NO-IDENTITY-NEXT:    [[TMP2]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP1]])
-; NO-IDENTITY-NEXT:    br label [[FOR_BODY]]
-;
 entry:
   br label %for.body
 
diff --git a/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll b/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll
deleted file mode 100644
index 6524b378f3d8bb..00000000000000
--- a/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -passes=slp-vectorizer -S -slp-optimize-identity-hor-reduction-ops=false < %s -mtriple=s390x-ibm-linux -mcpu=arch13 | FileCheck %s
-
-define void @foo() {
-; CHECK-LABEL: define void @foo(
-; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
-; CHECK-NEXT:    store i32 [[TMP1]], ptr null, align 4
-; CHECK-NEXT:    ret void
-;
-  %1 = add i32 0, 0
-  %2 = add i32 %1, 0
-  %3 = add i32 %2, 0
-  store i32 %3, ptr null, align 4
-  ret void
-}
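
For readers unfamiliar with the "identity" optimization that this option used to gate, here is a minimal sketch of the idea. It is a simplified model, not the SLPVectorizer code: `Kind`, `reduceNaive`, `reduceSplat`, and `checkSplatMatchesNaive` are hypothetical names introduced only for illustration. When every reduced value is the same scalar, the reduction has a closed form, which is why the remaining CHECK lines in buildvector-reduce.ll expect `mul i32 [[CALL278]], 8` where the removed NO-IDENTITY lines expected a splat followed by `llvm.vector.reduce.add.v8i32`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a few reduction kinds; this is NOT LLVM's RecurKind.
enum class Kind { Add, Xor, And, Or };

// Reference behaviour: fold the values one at a time.
// Assumes Vals is non-empty.
uint32_t reduceNaive(Kind K, const std::vector<uint32_t> &Vals) {
  uint32_t Acc = Vals.front();
  for (std::size_t I = 1; I < Vals.size(); ++I) {
    switch (K) {
    case Kind::Add: Acc += Vals[I]; break;
    case Kind::Xor: Acc ^= Vals[I]; break;
    case Kind::And: Acc &= Vals[I]; break;
    case Kind::Or:  Acc |= Vals[I]; break;
    }
  }
  return Acc;
}

// Closed form when the input is Count copies of the same value V.
uint32_t reduceSplat(Kind K, uint32_t V, uint32_t Count) {
  switch (K) {
  case Kind::Add: return V * Count;           // x + x + ... + x == x * n (mod 2^32)
  case Kind::Xor: return (Count % 2) ? V : 0; // pairs of equal values cancel
  case Kind::And:
  case Kind::Or:  return V;                   // idempotent operations
  }
  return V;
}

// Checks that the closed form agrees with the naive fold for small counts.
void checkSplatMatchesNaive() {
  const Kind Kinds[] = {Kind::Add, Kind::Xor, Kind::And, Kind::Or};
  for (Kind K : Kinds)
    for (uint32_t Count = 1; Count <= 9; ++Count) {
      std::vector<uint32_t> Vals(Count, 0x1234u);
      assert(reduceNaive(K, Vals) == reduceSplat(K, 0x1234u, Count));
    }
}
```

The sketch leaves multiplication out on purpose: per the diff above, the pass does not treat `RecurKind::Mul`, `RecurKind::FMul`, or `RecurKind::FMulAdd` as supported identity operations.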

alexey-bataev · 2024-08-27T16:01:08Z

This option was requested by Intel engs before, not sure they are ready to remove it.

preames · 2024-08-27T16:08:27Z

> This option was requested by Intel engs before, not sure they are ready to remove it.

In general, as a project, we actively do not support downstream forks with otherwise unneeded upstream code. Unless there's a clear problem expressed, I'd still like to go ahead and remove the code. They can apply a downstream patch if they want, or report an issue against the upstream.

llvm-ci · 2024-08-27T22:37:38Z

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 8 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/5328

Here is the relevant piece of the build log for reference:

Step 8 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (9 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:10:12: error: CHECK-NOT: excluded string found in input
CHECK-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
           ^
<stdin>:27:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       22: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       23: PERF2BOLT: processing basic events (without LBR)... 
       24: PERF2BOLT: read 9 samples 
       25: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       26:  
       27:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
not:10      !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                   error: no match expected
       28:  
       29: PERF2BOLT: wrote 0 objects and 0 memory objects to /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 
       30: BOLT-INFO: 0 out of 13 functions in the binary (0.0%) have non-empty execution profile 
>>>>>>

--

********************

preames added 2 commits August 27, 2024 08:34

[SLP] Remove -slp-optimize-identity-hor-reduction-ops option

6746f6e

This code has been unchanged for two years; let's simplify the code and remove configurability which makes the code harder to follow.

clang-format

c387be4

preames requested a review from alexey-bataev August 27, 2024 15:49

llvmbot added backend:SystemZ vectorizers llvm:transforms labels Aug 27, 2024

alexey-bataev approved these changes Aug 27, 2024

Merge branch 'main' into pr-slp-remove-slp-optimize-identity-hor-reduction-ops

14ef49e

preames merged commit ee764a2 into llvm:main Aug 27, 2024
3 of 6 checks passed

preames deleted the pr-slp-remove-slp-optimize-identity-hor-reduction-ops branch August 27, 2024 20:22
