[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage #108370

Merged
merged 3 commits into from
Sep 17, 2024

lukel97
Contributor

@lukel97 lukel97 commented Sep 12, 2024

A half with only zvfhmin, or a bfloat, will end up being promoted to f32 for most instructions.

Unless the loop consists only of memory ops and permutation instructions, which don't need to be promoted (is this common?), we'll end up using double the LMUL that getRegUsageForType currently returns.

Since this is used by the loop vectorizer, it seems better to be conservative and assume that any use of a zvfhmin half/bfloat will end up being widened to f32.
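
To make the doubling concrete, here is a minimal standalone sketch of the arithmetic, not the actual LLVM code. The names `ScalableVec`, `regUsage` and `kRVVBitsPerBlock` are hypothetical stand-ins; only the 64-bit block size (RISCV::RVVBitsPerBlock) and the divideCeil rounding mirror what the patch uses. It assumes a scalable vector whose known minimum size is the element width times the minimum element count.

```cpp
#include <cstdint>

// Bits per vector register block; mirrors RISCV::RVVBitsPerBlock (64).
constexpr uint64_t kRVVBitsPerBlock = 64;

// Hypothetical stand-in for a scalable type <vscale x MinElts x EltTy>.
struct ScalableVec {
  unsigned EltBits;    // 16 for half/bfloat, 32 for float
  unsigned MinElts;    // known minimum element count
  bool NeedsPromotion; // bfloat, or half without full f16 vector support
};

// Integer division rounding up.
constexpr uint64_t divideCeil(uint64_t N, uint64_t D) {
  return (N + D - 1) / D;
}

// Estimated register usage: if the element type is expected to be
// promoted, size the vector as if its elements were already f32.
constexpr uint64_t regUsage(ScalableVec V) {
  return divideCeil(uint64_t(V.NeedsPromotion ? 32u : V.EltBits) * V.MinElts,
                    kRVVBitsPerBlock);
}

// <vscale x 4 x half> with zvfh: 64 bits minimum, one register.
static_assert(regUsage(ScalableVec{16, 4, false}) == 1, "no promotion");
// Same type with only zvfhmin: treated as <vscale x 4 x float>, two registers.
static_assert(regUsage(ScalableVec{16, 4, true}) == 2, "promoted to f32");
```

The doubling in this sketch matches the ratio in the new tests, where the VRRC usage reported for the same loop goes from 2 registers with zvfh to 4 with zvfhmin.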

@llvmbot
Member

llvmbot commented Sep 12, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

A half with only zvfhmin, or a bfloat, will end up being promoted to f32 for most instructions.

Unless the loop consists only of memory ops and permutation instructions, which don't need to be promoted (is this common?), we'll end up using double the LMUL that getRegUsageForType currently returns.

Since this is used by the loop vectorizer, it seems better to be conservative and assume that any use of a zvfhmin half/bfloat will end up being widened to f32.


Full diff: https://github.com/llvm/llvm-project/pull/108370.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+8-1)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll (+31)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll (+37)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2b5e7c47279284..3303534ecb4968 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2030,8 +2030,15 @@ void RISCVTTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE,
 }
 
 unsigned RISCVTTIImpl::getRegUsageForType(Type *Ty) {
-  TypeSize Size = DL.getTypeSizeInBits(Ty);
   if (Ty->isVectorTy()) {
+    // f16 w/ zvfhmin and bf16 types will be promoted to f32
+    Type *EltTy = cast<VectorType>(Ty)->getElementType();
+    if ((EltTy->isHalfTy() && !ST->hasVInstructionsF16()) ||
+        EltTy->isBFloatTy())
+      Ty = VectorType::get(Type::getFloatTy(Ty->getContext()),
+                           cast<VectorType>(Ty));
+
+    TypeSize Size = DL.getTypeSizeInBits(Ty);
     if (Size.isScalable() && ST->hasVInstructions())
       return divideCeil(Size.getKnownMinValue(), RISCV::RVVBitsPerBlock);
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll
new file mode 100644
index 00000000000000..89514431278a74
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll
@@ -0,0 +1,31 @@
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s
+
+define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i32 signext %size, ptr noalias nocapture writeonly %result) {
+; CHECK-LABEL: add
+; CHECK:       LV(REG): Found max usage: 2 item
+; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 2 registers
+; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 4 registers
+; CHECK-NEXT:  LV(REG): Found invariant usage: 1 item
+; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
+
+entry:
+  %conv = zext i32 %size to i64
+  %cmp10.not = icmp eq i32 %size, 0
+  br i1 %cmp10.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i.011 = phi i64 [ %add4, %for.body ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds bfloat, ptr %src1, i64 %i.011
+  %0 = load bfloat, ptr %arrayidx, align 4
+  %arrayidx2 = getelementptr inbounds bfloat, ptr %src2, i64 %i.011
+  %1 = load bfloat, ptr %arrayidx2, align 4
+  %add = fadd bfloat %0, %1
+  %arrayidx3 = getelementptr inbounds bfloat, ptr %result, i64 %i.011
+  store bfloat %add, ptr %arrayidx3, align 4
+  %add4 = add nuw nsw i64 %i.011, 1
+  %exitcond.not = icmp eq i64 %add4, %conv
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll
new file mode 100644
index 00000000000000..ceedcfba4691e1
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll
@@ -0,0 +1,37 @@
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfh -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFH
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfhmin -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFHMIN
+
+define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i32 signext %size, ptr noalias nocapture writeonly %result) {
+; CHECK-LABEL: add
+; ZVFH:       LV(REG): Found max usage: 2 item
+; ZVFH-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 2 registers
+; ZVFH-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 2 registers
+; ZVFH-NEXT:  LV(REG): Found invariant usage: 1 item
+; ZVFH-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
+; ZVFHMIN:       LV(REG): Found max usage: 2 item
+; ZVFHMIN-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 2 registers
+; ZVFHMIN-NEXT:  LV(REG): RegisterClass: RISCV::VRRC, 4 registers
+; ZVFHMIN-NEXT:  LV(REG): Found invariant usage: 1 item
+; ZVFHMIN-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
+
+entry:
+  %conv = zext i32 %size to i64
+  %cmp10.not = icmp eq i32 %size, 0
+  br i1 %cmp10.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i.011 = phi i64 [ %add4, %for.body ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds half, ptr %src1, i64 %i.011
+  %0 = load half, ptr %arrayidx, align 4
+  %arrayidx2 = getelementptr inbounds half, ptr %src2, i64 %i.011
+  %1 = load half, ptr %arrayidx2, align 4
+  %add = fadd half %0, %1
+  %arrayidx3 = getelementptr inbounds half, ptr %result, i64 %i.011
+  store half %add, ptr %arrayidx3, align 4
+  %add4 = add nuw nsw i64 %i.011, 1
+  %exitcond.not = icmp eq i64 %add4, %conv
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}

Collaborator

@topperc topperc left a comment

Seems reasonable to me.

Contributor

@jacquesguan jacquesguan left a comment

LGTM

@lukel97 lukel97 merged commit 41f1b46 into llvm:main Sep 17, 2024
8 checks passed
@llvm-ci
Collaborator

llvm-ci commented Sep 17, 2024

LLVM Buildbot has detected a new failure on builder fuchsia-x86_64-linux running on fuchsia-debian-64-us-central1-a-1 while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/5124

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/fuchsia-linux.py ...' (failure)
...
[1322/1327] Building CXX object unittests/Transforms/Scalar/CMakeFiles/ScalarTests.dir/LoopPassManagerTest.cpp.o
clang++: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument]
[1323/1327] Linking CXX executable unittests/Transforms/Scalar/ScalarTests
[1323/1327] Running the LLVM regression tests
llvm-lit: /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/ld.lld
llvm-lit: /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/lld-link
llvm-lit: /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/ld64.lld
llvm-lit: /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/wasm-ld
-- Testing: 55717 tests, 60 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: LLVM :: Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll (44761 of 55717)
******************** TEST 'LLVM :: Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S < /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll 2>&1 | /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/FileCheck /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll
+ /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S
+ /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/FileCheck /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll
/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll:4:16: error: CHECK-LABEL: expected string not found in input
; CHECK-LABEL: add
               ^
<stdin>:1:1: note: scanning from here
opt: Unknown command line argument '-debug-only=loop-vectorize'. Try: '/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/opt --help'
^
<stdin>:1:18: note: possible intended match here
opt: Unknown command line argument '-debug-only=loop-vectorize'. Try: '/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/opt --help'
                 ^

Input file: <stdin>
Check file: /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: opt: Unknown command line argument '-debug-only=loop-vectorize'. Try: '/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-2f042fl1/bin/opt --help' 
label:4'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
label:4'1                      ?                                                                                                                                       possible intended match
           2: opt: Did you mean '--debug-pass=loop-vectorize'? 
label:4'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: LLVM :: Transforms/LoopVectorize/RISCV/reg-usage-f16.ll (44763 of 55717)
******************** TEST 'LLVM :: Transforms/LoopVectorize/RISCV/reg-usage-f16.ll' FAILED ********************
Step 7 (check) failure: check (failure)
...

@@ -0,0 +1,31 @@
; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -debug-only=loop-vectorize -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s
Collaborator

Needs REQUIRES: asserts, since -debug-only is only available in builds with assertions enabled.

Contributor Author

Done in 30d7dcc

tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…lvm#108370)

A half with only zvfhmin, or a bfloat, will end up being promoted to f32
for most instructions.

Unless the loop consists only of memory ops and permutation instructions,
which don't need to be promoted (is this common?), we'll end up using double
the LMUL that getRegUsageForType currently returns.

Since this is used by the loop vectorizer, it seems better to be
conservative and assume that any use of a zvfhmin half/bfloat will end
up being widened to f32.