Skip to content

[SLP] Remove -slp-optimize-identity-hor-reduction-ops option #106238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

preames
Copy link
Collaborator

@preames preames commented Aug 27, 2024

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
@llvmbot
Copy link
Member

llvmbot commented Aug 27, 2024

@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-llvm-transforms

Author: Philip Reames (preames)

Changes

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.


Full diff: https://github.com/llvm/llvm-project/pull/106238.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+8-18)
  • (modified) llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll (-12)
  • (removed) llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll (-16)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index ed47ed661ab946..1c57855b57149c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -136,13 +136,6 @@ static cl::opt<bool> ShouldStartVectorizeHorAtStore(
     cl::desc(
         "Attempt to vectorize horizontal reductions feeding into a store"));
 
-// NOTE: If AllowHorRdxIdenityOptimization is true, the optimization will run
-// even if we match a reduction but do not vectorize in the end.
-static cl::opt<bool> AllowHorRdxIdenityOptimization(
-    "slp-optimize-identity-hor-reduction-ops", cl::init(true), cl::Hidden,
-    cl::desc("Allow optimization of original scalar identity operations on "
-             "matched horizontal reductions."));
-
 static cl::opt<int>
 MaxVectorRegSizeOption("slp-max-reg-size", cl::init(128), cl::Hidden,
     cl::desc("Attempt to vectorize for this register size in bits"));
@@ -17565,10 +17558,9 @@ class HorizontalReduction {
                           return Num + Vals.size();
                         });
     if (NumReducedVals < ReductionLimit &&
-        (!AllowHorRdxIdenityOptimization ||
-         all_of(ReducedVals, [](ArrayRef<Value *> RedV) {
-           return RedV.size() < 2 || !allConstant(RedV) || !isSplat(RedV);
-         }))) {
+        all_of(ReducedVals, [](ArrayRef<Value *> RedV) {
+          return RedV.size() < 2 || !allConstant(RedV) || !isSplat(RedV);
+        })) {
       for (ReductionOpsType &RdxOps : ReductionOps)
         for (Value *RdxOp : RdxOps)
           V.analyzedReductionRoot(cast<Instruction>(RdxOp));
@@ -17698,8 +17690,7 @@ class HorizontalReduction {
       }
 
       // Emit code for constant values.
-      if (AllowHorRdxIdenityOptimization && Candidates.size() > 1 &&
-          allConstant(Candidates)) {
+      if (Candidates.size() > 1 && allConstant(Candidates)) {
         Value *Res = Candidates.front();
         ++VectorizedVals.try_emplace(Candidates.front(), 0).first->getSecond();
         for (Value *VC : ArrayRef(Candidates).drop_front()) {
@@ -17714,15 +17705,14 @@ class HorizontalReduction {
 
       unsigned NumReducedVals = Candidates.size();
       if (NumReducedVals < ReductionLimit &&
-          (NumReducedVals < 2 || !AllowHorRdxIdenityOptimization ||
-           !isSplat(Candidates)))
+          (NumReducedVals < 2 || !isSplat(Candidates)))
         continue;
 
       // Check if we support repeated scalar values processing (optimization of
       // original scalar identity operations on matched horizontal reductions).
-      IsSupportedHorRdxIdentityOp =
-          AllowHorRdxIdenityOptimization && RdxKind != RecurKind::Mul &&
-          RdxKind != RecurKind::FMul && RdxKind != RecurKind::FMulAdd;
+      IsSupportedHorRdxIdentityOp = RdxKind != RecurKind::Mul &&
+                                    RdxKind != RecurKind::FMul &&
+                                    RdxKind != RecurKind::FMulAdd;
       // Gather same values.
       MapVector<Value *, unsigned> SameValuesCounter;
       if (IsSupportedHorRdxIdentityOp)
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
index 2c417804c83e0d..bbc2bfdcb6c160 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/buildvector-reduce.ll
@@ -1,6 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=arm64-apple-macosx | FileCheck %s
-; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=arm64-apple-macosx -slp-optimize-identity-hor-reduction-ops=false | FileCheck %s --check-prefix=NO-IDENTITY
 
 define i8 @test() {
 ; CHECK-LABEL: @test(
@@ -12,17 +11,6 @@ define i8 @test() {
 ; CHECK-NEXT:    [[TMP0]] = mul i32 [[CALL278]], 8
 ; CHECK-NEXT:    br label [[FOR_BODY]]
 ;
-; NO-IDENTITY-LABEL: @test(
-; NO-IDENTITY-NEXT:  entry:
-; NO-IDENTITY-NEXT:    br label [[FOR_BODY:%.*]]
-; NO-IDENTITY:       for.body:
-; NO-IDENTITY-NEXT:    [[SUM:%.*]] = phi i32 [ [[TMP2:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
-; NO-IDENTITY-NEXT:    [[CALL278:%.*]] = call i32 @fn(i32 [[SUM]])
-; NO-IDENTITY-NEXT:    [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[CALL278]], i32 0
-; NO-IDENTITY-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
-; NO-IDENTITY-NEXT:    [[TMP2]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP1]])
-; NO-IDENTITY-NEXT:    br label [[FOR_BODY]]
-;
 entry:
   br label %for.body
 
diff --git a/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll b/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll
deleted file mode 100644
index 6524b378f3d8bb..00000000000000
--- a/llvm/test/Transforms/SLPVectorizer/SystemZ/minbitwidth-non-vector-root.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -passes=slp-vectorizer -S -slp-optimize-identity-hor-reduction-ops=false < %s -mtriple=s390x-ibm-linux -mcpu=arch13 | FileCheck %s
-
-define void @foo() {
-; CHECK-LABEL: define void @foo(
-; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
-; CHECK-NEXT:    store i32 [[TMP1]], ptr null, align 4
-; CHECK-NEXT:    ret void
-;
-  %1 = add i32 0, 0
-  %2 = add i32 %1, 0
-  %3 = add i32 %2, 0
-  store i32 %3, ptr null, align 4
-  ret void
-}

@alexey-bataev
Copy link
Member

This option was requested by Intel engs before, not sure they are ready to remove it.

@preames
Copy link
Collaborator Author

preames commented Aug 27, 2024

This option was requested by Intel engs before, not sure they are ready to remove it.

In general, as a project, we actively do not support downstream forks with otherwise unneeded upstream code. Unless there's a clear problem expressed, I'd still like to go ahead and remove the code. They can apply a downstream patch if they want, or report an issue against the upstream.

@preames preames merged commit ee764a2 into llvm:main Aug 27, 2024
3 of 6 checks passed
@preames preames deleted the pr-slp-remove-slp-optimize-identity-hor-reduction-ops branch August 27, 2024 20:22
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 27, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 8 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/5328

Here is the relevant piece of the build log for the reference
Step 8 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (9 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:10:12: error: CHECK-NOT: excluded string found in input
CHECK-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
           ^
<stdin>:27:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       22: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       23: PERF2BOLT: processing basic events (without LBR)... 
       24: PERF2BOLT: read 9 samples 
       25: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       26:  
       27:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
not:10      !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                   error: no match expected
       28:  
       29: PERF2BOLT: wrote 0 objects and 0 memory objects to /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 
       30: BOLT-INFO: 0 out of 13 functions in the binary (0.0%) have non-empty execution profile 
>>>>>>

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants