Skip to content

[VPlan] Also print final VPlan directly before codegen/execute. #82269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 28, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Feb 19, 2024

Some optimizations are apply after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution.

In the future, there will be additional transforms that are applied later (interleaving for example).

Some optimizations are apply after UF and VF have been chosen. This
patch adds an extra print of the final VPlan just before
codegen/execution.

In the future, there will be additional transforms that are applied
later (interleaving for example).
@llvmbot
Copy link
Member

llvmbot commented Feb 19, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Some optimizations are apply after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution.

In the future, there will be additional transforms that are applied later (interleaving for example).


Full diff: https://github.com/llvm/llvm-project/pull/82269.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+2-2)
  • (added) llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll (+90)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index eca901fcdae4ce..9a4d031e776de1 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7455,6 +7455,9 @@ LoopVectorizationPlanner::executePlan(
   if (!IsEpilogueVectorization)
     VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
 
+  BestVPlan.setName("Final VPlan");
+  LLVM_DEBUG(BestVPlan.dump());
+
   // Perform the actual loop transformation.
   VPTransformState State(BestVF, BestUF, LI, DT, ILV.Builder, &ILV, &BestVPlan,
                          OrigLoop->getHeader()->getContext());
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index 1bcd7a2e009e0b..da6dc34e409684 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -120,7 +120,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in <stdin>
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
-; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
+; CHECK:       LV: Interleaving disabled by the pass manager
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ;
 entry:
@@ -260,7 +260,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in <stdin>
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
-; CHECK-NEXT:  LV: Interleaving disabled by the pass manager
+; CHECK:       LV: Interleaving disabled by the pass manager
 ; CHECK-NEXT:  LV: Vectorizing: innermost loop.
 ;
 entry:
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll
new file mode 100644
index 00000000000000..1dddbfe20a2ed7
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll
@@ -0,0 +1,90 @@
+; RUN: opt -passes=loop-vectorize -force-vector-width=8 -force-vector-interleave=2 -disable-output -debug -S %s 2>&1 | FileCheck --check-prefixes=CHECK %s
+
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+
+; REQUIRES: asserts
+
+; Check if the vector loop condition can be simplified to true for a given
+; VF/IC combination.
+define void @test_tc_less_than_16(ptr %A, i64 %N) {
+; CHECK:      LV: Scalarizing:  %cmp =
+; CHECK-NEXT: VPlan 'Initial VPlan for VF={8},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ph:
+; CHECK-NEXT:   EMIT vp<[[TC]]> = EXPAND SCEV (zext i4 (trunc i64 %N to i4) to i64)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[CAN_IV_NEXT:%.+]]>
+; CHECK-NEXT:     EMIT ir<%p.src> = WIDEN-POINTER-INDUCTION ir<%A>, 1
+; CHECK-NEXT:     vp<[[VPTR:%.]]> = vector-pointer ir<%p.src>
+; CHECK-NEXT:     WIDEN ir<%l> = load vp<[[VPTR]]>
+; CHECK-NEXT:     WIDEN ir<%add> = add nsw ir<%l>, ir<10>
+; CHECK-NEXT:     vp<[[VPTR2:%.+]]> = vector-pointer ir<%p.src>
+; CHECK-NEXT:     WIDEN store vp<[[VPTR2]]>, ir<%add>
+; CHECK-NEXT:     EMIT vp<[[CAN_IV_NEXT]]> = add nuw vp<[[CAN_IV:%.+]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-count vp<[[CAN_IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+;
+; CHECK: Executing best plan with VF=8, UF=2
+; CHECK-NEXT: VPlan 'Final VPlan for VF={8},UF={2}' {
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: ph:
+; CHECK-NEXT:   EMIT vp<[[TC]]> = EXPAND SCEV (zext i4 (trunc i64 %N to i4) to i64)
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[CAN_IV_NEXT:%.+]]>
+; CHECK-NEXT:     EMIT ir<%p.src> = WIDEN-POINTER-INDUCTION ir<%A>, 1
+; CHECK-NEXT:     vp<[[VPTR:%.]]> = vector-pointer ir<%p.src>
+; CHECK-NEXT:     WIDEN ir<%l> = load vp<[[VPTR]]>
+; CHECK-NEXT:     WIDEN ir<%add> = add nsw ir<%l>, ir<10>
+; CHECK-NEXT:     vp<[[VPTR2:%.+]]> = vector-pointer ir<%p.src>
+; CHECK-NEXT:     WIDEN store vp<[[VPTR2]]>, ir<%add>
+; CHECK-NEXT:     EMIT vp<[[CAN_IV_NEXT]]> = add nuw vp<[[CAN_IV:%.+]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-cond ir<true>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+;
+entry:
+  %and = and i64 %N, 15
+  br label %loop
+
+loop:
+  %iv = phi i64 [ %and, %entry ], [ %iv.next, %loop ]
+  %p.src = phi ptr [ %A, %entry ], [ %p.src.next, %loop ]
+  %p.src.next = getelementptr inbounds i8, ptr %p.src, i64 1
+  %l = load i8, ptr %p.src, align 1
+  %add = add nsw i8 %l, 10
+  store i8 %add, ptr %p.src
+  %iv.next = add nsw i64 %iv, -1
+  %cmp = icmp eq i64 %iv.next, 0
+  br i1 %cmp, label %exit, label %loop
+
+exit:
+  ret void
+}

@fhahn
Copy link
Contributor Author

fhahn commented Feb 27, 2024

ping :)

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition, raises some thoughts.

@@ -7456,6 +7456,9 @@ LoopVectorizationPlanner::executePlan(
if (!IsEpilogueVectorization)
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);

BestVPlan.setName("Final VPlan");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Final" VPlans should indeed be printed after the last optimizeForVFAndUF(), but preferably along with the "Executing best plan ..." dump above?

Regarding VPlan's name - "Initial" VPlans are currently dumped following buildVPlansWithVPRecipes() which also optimize()'s them, except for the last optimizeForVFAndUF(). Perhaps the processing stage a VPlan is currently in should be better maintained than in a name string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will move printing "Executing best plan..." down!

Regarding VPlan's name - "Initial" VPlans are currently dumped following buildVPlansWithVPRecipes() which also optimize()'s them, except for the last optimizeForVFAndUF(). Perhaps the processing stage a VPlan is currently in should be better maintained than in a name string.

Perhaps Initial = after initial construction, Optimized = during most VPlan transforms, Final = just before execution?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Initial = after initial construction, Optimized = during most VPlan transforms, Final = just before execution?

SGTM. Although the name may be used more persistently to associate a VPlan with the fixed original loop, complemented with optimization decisions (e.g., VF, UF), rather than keep track of which optimizations were processed.

Copy link

github-actions bot commented Feb 28, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn fhahn merged commit 15d9d0f into llvm:main Feb 28, 2024
@fhahn fhahn deleted the vplan-print-final-plan branch February 28, 2024 13:19
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Mar 21, 2024
…c746d9591

Local branch amd-gfx df9c746 Merged main:37daff028fcec27f2be1bb990df77e19c0244ccf into amd-gfx:2c4204e0a79d
Remote branch main 15d9d0f [VPlan] Also print final VPlan directly before codegen/execute. (llvm#82269)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants