Skip to content

[RISCV] Add a tune feature to disable stripping W suffix #86255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 25, 2024

Conversation

wangpc-pp
Copy link
Contributor

@wangpc-pp wangpc-pp commented Mar 22, 2024

We have a hidden option to disable it, but I'd like to make it a
tune feature.

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

We have a hidden option to disable it, but I'd like to make it a
tune feature.

For some implementations, instructions with W suffix would be less
costly as it only performs on 32 bits data. Though we may lose some
chances to compress.
@llvmbot
Copy link
Member

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Wang Pengcheng (wangpc-pp)

Changes

We have a hidden option to disable it, but I'd like to make it a
tune feature.

For some implementations, instructions with W suffix would be less
costly as it only performs on 32 bits data. Though we may lose some
chances to compress.


Full diff: https://github.com/llvm/llvm-project/pull/86255.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+4)
  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+1-1)
  • (added) llvm/test/CodeGen/RISCV/strip-w-suffix.ll (+74)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index f3e641e250182b..6ef2289bb4beeb 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1226,6 +1226,10 @@ def TuneNoSinkSplatOperands
                        "false", "Disable sink splat operands to enable .vx, .vf,"
                        ".wx, and .wf instructions">;
 
+def TuneNoStripWSuffix
+    : SubtargetFeature<"no-strip-w-suffix", "EnableStripWSuffix", "false",
+                       "Disable strip W suffix">;
+
 def TuneConditionalCompressedMoveFusion
     : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
                        "true", "Enable branch+c.mv fusion">;
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index dcf70e8cad6408..39d420c2fbf080 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -672,7 +672,7 @@ bool RISCVOptWInstrs::stripWSuffixes(MachineFunction &MF,
                                      const RISCVInstrInfo &TII,
                                      const RISCVSubtarget &ST,
                                      MachineRegisterInfo &MRI) {
-  if (DisableStripWSuffix)
+  if (DisableStripWSuffix || !ST.enableStripWSuffix())
     return false;
 
   bool MadeChange = false;
diff --git a/llvm/test/CodeGen/RISCV/strip-w-suffix.ll b/llvm/test/CodeGen/RISCV/strip-w-suffix.ll
new file mode 100644
index 00000000000000..b403a5ea3bd898
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/strip-w-suffix.ll
@@ -0,0 +1,74 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=STRIP %s
+; RUN: llc -mtriple=riscv64 -mattr=+m,+no-strip-w-suffix -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=NO-STRIP %s
+
+define i32 @addiw(i32 signext %a) nounwind {
+; STRIP-LABEL: addiw:
+; STRIP:       # %bb.0:
+; STRIP-NEXT:    lui a1, 1
+; STRIP-NEXT:    addi a1, a1, -1
+; STRIP-NEXT:    addw a0, a0, a1
+; STRIP-NEXT:    ret
+;
+; NO-STRIP-LABEL: addiw:
+; NO-STRIP:       # %bb.0:
+; NO-STRIP-NEXT:    lui a1, 1
+; NO-STRIP-NEXT:    addiw a1, a1, -1
+; NO-STRIP-NEXT:    addw a0, a0, a1
+; NO-STRIP-NEXT:    ret
+  %ret = add i32 %a, 4095
+  ret i32 %ret
+}
+
+define signext i32 @addw(i32 signext %a, i32 signext %b) nounwind {
+; STRIP-LABEL: addw:
+; STRIP:       # %bb.0:
+; STRIP-NEXT:    add a0, a0, a1
+; STRIP-NEXT:    addiw a0, a0, 1024
+; STRIP-NEXT:    ret
+;
+; NO-STRIP-LABEL: addw:
+; NO-STRIP:       # %bb.0:
+; NO-STRIP-NEXT:    addw a0, a0, a1
+; NO-STRIP-NEXT:    addiw a0, a0, 1024
+; NO-STRIP-NEXT:    ret
+  %add = add i32 %a, %b
+  %ret = add i32 %add, 1024
+  ret i32 %ret
+}
+
+define signext i32 @mulw(i32 signext %a, i32 signext %b) nounwind {
+; STRIP-LABEL: mulw:
+; STRIP:       # %bb.0:
+; STRIP-NEXT:    mul a0, a0, a1
+; STRIP-NEXT:    addiw a0, a0, 1024
+; STRIP-NEXT:    ret
+;
+; NO-STRIP-LABEL: mulw:
+; NO-STRIP:       # %bb.0:
+; NO-STRIP-NEXT:    mulw a0, a0, a1
+; NO-STRIP-NEXT:    addiw a0, a0, 1024
+; NO-STRIP-NEXT:    ret
+  %mul = mul i32 %a, %b
+  %ret = add i32 %mul, 1024
+  ret i32 %ret
+}
+
+define signext i32 @slliw(i32 signext %a) nounwind {
+; STRIP-LABEL: slliw:
+; STRIP:       # %bb.0:
+; STRIP-NEXT:    slli a0, a0, 1
+; STRIP-NEXT:    addiw a0, a0, 1024
+; STRIP-NEXT:    ret
+;
+; NO-STRIP-LABEL: slliw:
+; NO-STRIP:       # %bb.0:
+; NO-STRIP-NEXT:    slliw a0, a0, 1
+; NO-STRIP-NEXT:    addiw a0, a0, 1024
+; NO-STRIP-NEXT:    ret
+  %shl = shl i32 %a, 1
+  %ret = add i32 %shl, 1024
+  ret i32 %ret
+}

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@topperc
Copy link
Collaborator

topperc commented Mar 22, 2024

For some implementations, instructions with W suffix would be less

costly as they only perform on 32 bits data. Though we may lose some

chances to compress.

Do you know of a real implementation where this true?

@wangpc-pp
Copy link
Contributor Author

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

Do you know of a real implementation where this true?

Yes, this requirement is from hardware team actually and I didn't notice this stripping W suffix pass before.

@topperc
Copy link
Collaborator

topperc commented Mar 22, 2024

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

Do you know of a real implementation where this true?

Yes, this requirement is from hardware team actually and I didn't notice this stripping W suffix pass before.

Is that for all 4 instructions(ADDW, ADDIW, SLLIW, MULW) that are in that pass or just MULW? I know multiply complexity increases a lot when the size is doubled.

@wangpc-pp
Copy link
Contributor Author

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

Do you know of a real implementation where this true?

Yes, this requirement is from hardware team actually and I didn't notice this stripping W suffix pass before.

Is that for all 4 instructions(ADDW, ADDIW, SLLIW, MULW) that are in that pass or just MULW?

My intent is for all W instructions, not only multiply. Some low resources/low power consumption scenarios may need this too.

@topperc
Copy link
Collaborator

topperc commented Mar 22, 2024

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

Do you know of a real implementation where this true?

Yes, this requirement is from hardware team actually and I didn't notice this stripping W suffix pass before.

Is that for all 4 instructions(ADDW, ADDIW, SLLIW, MULW) that are in that pass or just MULW?

My intent is for all W instructions, not only multiply. Some low resources/low power consumption scenarios may need this too.

I was asking specifically about your hardware requirement. It's a bit questionable that MULW is in there. It's only useful with the Zcb extension and I think a cycle difference between MUL and MULW is more likely than ADD.

I don't know if we're optimal on using W instructions so the existence of such hardware implies we might also need an "add W suffix" pass.

@wangpc-pp
Copy link
Contributor Author

For some implementations, instructions with W suffix would be less
costly as they only perform on 32 bits data. Though we may lose some
chances to compress.

Do you know of a real implementation where this true?

Yes, this requirement is from hardware team actually and I didn't notice this stripping W suffix pass before.

Is that for all 4 instructions(ADDW, ADDIW, SLLIW, MULW) that are in that pass or just MULW?

My intent is for all W instructions, not only multiply. Some low resources/low power consumption scenarios may need this too.

I was asking specifically about your hardware requirement. It's a bit questionable that MULW is in there. It's only useful with the Zcb extension and I think a cycle difference between MUL and MULW is more likely than ADD.

I can't say too much details here, sorry for that. All I can say is that hardware can do some performance/power optimizations if there are more W instructions.

I don't know if we're optimal on using W instructions so the existence of such hardware implies we might also need an "add W suffix" pass.

Yeah, I was thinking about the same thing and it's on my TODO list now. :-)

@wangpc-pp
Copy link
Contributor Author

Is it OK to you? @topperc

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wangpc-pp wangpc-pp merged commit 6af6416 into llvm:main Mar 25, 2024
@wangpc-pp wangpc-pp deleted the main-riscv-split-reg-bank branch March 25, 2024 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants