Skip to content

[RISCV] Support constant hoisting of immediate store values #96073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 17, 2024

Conversation

asb
Copy link
Contributor

@asb asb commented Jun 19, 2024

Previously getIntImmInstCost only calculated the cost of materialising the argument of a store if it was the address. This means ConstantHoisting's transformation wouldn't kick in for cases like storing two values that require multiple instructions to materialise but where one can be cheaply generated from the other (e.g. by an addition).

Two key changes were needed to avoid regressions when enabling this:

  • Allowing constant materialisation cost to be calculated assuming zeroes are free (as might happen if you had a 2*XLEN constant and one half is zero).
  • Avoiding constant hoisting if we have a misaligned store that's going to be a legalised to a sequence of narrower stores. I'm seeing cases where hoisting the constant ends up with worse codegen in that case.

@llvmbot
Copy link
Member

llvmbot commented Jun 19, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

Previously getIntImmInstCost only calculated the cost of materialising the argument of a store if it was the address. This means ConstantHoisting's transformation wouldn't kick in for cases like storing two values that require multiple instructions to materialise but where one can be cheaply generated from the other (e.g. by an addition).

Two key changes were needed to avoid regressions when enabling this:

  • Allowing constant materialisation cost to be calculated assuming zeroes are free (as might happen if you had a 2*XLEN constant and one half is zero).
  • Avoiding constant hoisting if we have a misaligned store that's going to be a legalised to a sequence of narrower stores. I'm seeing cases where hoisting the constant ends up with worse codegen in that case.

Full diff: https://github.com/llvm/llvm-project/pull/96073.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp (+4-2)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h (+6-1)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+18-5)
  • (modified) llvm/test/CodeGen/RISCV/pr64935.ll (+1-4)
  • (modified) llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll (+4-3)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
index 0a857eb96935e..3c927b7413136 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
@@ -499,7 +499,7 @@ InstSeq generateTwoRegInstSeq(int64_t Val, const MCSubtargetInfo &STI,
 }
 
 int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
-                  bool CompressionCost) {
+                  bool CompressionCost, bool FreeZeroes) {
   bool IsRV64 = STI.hasFeature(RISCV::Feature64Bit);
   bool HasRVC = CompressionCost && (STI.hasFeature(RISCV::FeatureStdExtC) ||
                                     STI.hasFeature(RISCV::FeatureStdExtZca));
@@ -510,10 +510,12 @@ int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
   int Cost = 0;
   for (unsigned ShiftVal = 0; ShiftVal < Size; ShiftVal += PlatRegSize) {
     APInt Chunk = Val.ashr(ShiftVal).sextOrTrunc(PlatRegSize);
+    if (FreeZeroes && Chunk == 0)
+      continue;
     InstSeq MatSeq = generateInstSeq(Chunk.getSExtValue(), STI);
     Cost += getInstSeqCost(MatSeq, HasRVC);
   }
-  return std::max(1, Cost);
+  return std::max(FreeZeroes ? 0 : 1, Cost);
 }
 
 OpndKind Inst::getOpndKind() const {
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
index e87e0f3256470..ae94f3778b217 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
@@ -71,8 +71,13 @@ InstSeq generateTwoRegInstSeq(int64_t Val, const MCSubtargetInfo &STI,
 // If CompressionCost is true it will use a different cost calculation if RVC is
 // enabled. This should be used to compare two different sequences to determine
 // which is more compressible.
+//
+// If FreeZeroes is true, it will be assumed free to materialize any
+// XLen-sized chunks that are 0. This is appropriate to use in instances when
+// the zero register can be used, e.g. when estimating the cost of
+// materializing a value used by a particular operation.
 int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
-                  bool CompressionCost = false);
+                  bool CompressionCost = false, bool FreeZeroes = false);
 } // namespace RISCVMatInt
 } // namespace llvm
 #endif
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 176d0e79253ac..17b3ce01f3cc8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -120,7 +120,9 @@ InstructionCost RISCVTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
 
   // Otherwise, we check how many instructions it will take to materialise.
   const DataLayout &DL = getDataLayout();
-  return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST());
+  return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST(),
+                                    /*CompressionCost=*/false,
+                                    /*FreeZeroes=*/true);
 }
 
 // Look for patterns of shift followed by AND that can be turned into a pair of
@@ -172,11 +174,22 @@ InstructionCost RISCVTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
     // split up large offsets in GEP into better parts than ConstantHoisting
     // can.
     return TTI::TCC_Free;
-  case Instruction::Store:
-    // If the address is a constant, use the materialization cost.
-    if (Idx == 1)
+  case Instruction::Store: {
+    // Use the materialization cost regardless of if it's the address or the
+    // value that is constant, except for if the store is misaligned and
+    // misaligned accesses are not legal (experience shows constant hoisting
+    // can sometimes be harmful in such cases).
+    if (Idx == 1 || !Inst)
       return getIntImmCost(Imm, Ty, CostKind);
-    return TTI::TCC_Free;
+
+    StoreInst *ST = cast<StoreInst>(Inst);
+    if (!getTLI()->allowsMemoryAccessForAlignment(
+            Ty->getContext(), DL, getTLI()->getValueType(DL, Ty),
+            ST->getPointerAddressSpace(), ST->getAlign()))
+      return TTI::TCC_Free;
+
+    return getIntImmCost(Imm, Ty, CostKind);
+  }
   case Instruction::Load:
     // If the address is a constant, use the materialization cost.
     return getIntImmCost(Imm, Ty, CostKind);
diff --git a/llvm/test/CodeGen/RISCV/pr64935.ll b/llvm/test/CodeGen/RISCV/pr64935.ll
index 60be5fa6c994e..b712db0dc99d6 100644
--- a/llvm/test/CodeGen/RISCV/pr64935.ll
+++ b/llvm/test/CodeGen/RISCV/pr64935.ll
@@ -4,10 +4,7 @@
 define i1 @f() {
 ; CHECK-LABEL: f:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, 524288
-; CHECK-NEXT:    not a0, a0
-; CHECK-NEXT:    sltiu a0, a0, 2
-; CHECK-NEXT:    xori a0, a0, 1
+; CHECK-NEXT:    li a0, 1
 ; CHECK-NEXT:    ret
   %B25 = shl i64 4294967296, -9223372036854775808
   %B13 = sub i64 -1, -9223372036854775808
diff --git a/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll b/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
index 8f57df6edb2c0..29df749e10059 100644
--- a/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
+++ b/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
@@ -211,11 +211,12 @@ exit:
 
 ; Check that we use a common base for immediates needed by a store if the
 ; constants require more than 1 instruction.
-; TODO: This doesn't trigger currently.
 define void @test20(ptr %p1, ptr %p2) {
 ; CHECK-LABEL: test20
-; CHECK: store i32 15111111, ptr %p1
-; CHECK: store i32 15111112, ptr %p2
+; CHECK: %const = bitcast i32 15111111 to i32
+; CHECK: store i32 %const, ptr %p1, align 4
+; CHECK: %const_mat = add i32 %const, 1
+; CHECK: store i32 %const_mat, ptr %p2, align 4
   store i32 15111111, ptr %p1, align 4
   store i32 15111112, ptr %p2, align 4
   ret void

@llvmbot
Copy link
Member

llvmbot commented Jun 19, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Alex Bradbury (asb)

Changes

Previously getIntImmInstCost only calculated the cost of materialising the argument of a store if it was the address. This means ConstantHoisting's transformation wouldn't kick in for cases like storing two values that require multiple instructions to materialise but where one can be cheaply generated from the other (e.g. by an addition).

Two key changes were needed to avoid regressions when enabling this:

  • Allowing constant materialisation cost to be calculated assuming zeroes are free (as might happen if you had a 2*XLEN constant and one half is zero).
  • Avoiding constant hoisting if we have a misaligned store that's going to be a legalised to a sequence of narrower stores. I'm seeing cases where hoisting the constant ends up with worse codegen in that case.

Full diff: https://github.com/llvm/llvm-project/pull/96073.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp (+4-2)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h (+6-1)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+18-5)
  • (modified) llvm/test/CodeGen/RISCV/pr64935.ll (+1-4)
  • (modified) llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll (+4-3)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
index 0a857eb96935e..3c927b7413136 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
@@ -499,7 +499,7 @@ InstSeq generateTwoRegInstSeq(int64_t Val, const MCSubtargetInfo &STI,
 }
 
 int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
-                  bool CompressionCost) {
+                  bool CompressionCost, bool FreeZeroes) {
   bool IsRV64 = STI.hasFeature(RISCV::Feature64Bit);
   bool HasRVC = CompressionCost && (STI.hasFeature(RISCV::FeatureStdExtC) ||
                                     STI.hasFeature(RISCV::FeatureStdExtZca));
@@ -510,10 +510,12 @@ int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
   int Cost = 0;
   for (unsigned ShiftVal = 0; ShiftVal < Size; ShiftVal += PlatRegSize) {
     APInt Chunk = Val.ashr(ShiftVal).sextOrTrunc(PlatRegSize);
+    if (FreeZeroes && Chunk == 0)
+      continue;
     InstSeq MatSeq = generateInstSeq(Chunk.getSExtValue(), STI);
     Cost += getInstSeqCost(MatSeq, HasRVC);
   }
-  return std::max(1, Cost);
+  return std::max(FreeZeroes ? 0 : 1, Cost);
 }
 
 OpndKind Inst::getOpndKind() const {
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
index e87e0f3256470..ae94f3778b217 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.h
@@ -71,8 +71,13 @@ InstSeq generateTwoRegInstSeq(int64_t Val, const MCSubtargetInfo &STI,
 // If CompressionCost is true it will use a different cost calculation if RVC is
 // enabled. This should be used to compare two different sequences to determine
 // which is more compressible.
+//
+// If FreeZeroes is true, it will be assumed free to materialize any
+// XLen-sized chunks that are 0. This is appropriate to use in instances when
+// the zero register can be used, e.g. when estimating the cost of
+// materializing a value used by a particular operation.
 int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
-                  bool CompressionCost = false);
+                  bool CompressionCost = false, bool FreeZeroes = false);
 } // namespace RISCVMatInt
 } // namespace llvm
 #endif
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 176d0e79253ac..17b3ce01f3cc8 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -120,7 +120,9 @@ InstructionCost RISCVTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
 
   // Otherwise, we check how many instructions it will take to materialise.
   const DataLayout &DL = getDataLayout();
-  return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST());
+  return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST(),
+                                    /*CompressionCost=*/false,
+                                    /*FreeZeroes=*/true);
 }
 
 // Look for patterns of shift followed by AND that can be turned into a pair of
@@ -172,11 +174,22 @@ InstructionCost RISCVTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
     // split up large offsets in GEP into better parts than ConstantHoisting
     // can.
     return TTI::TCC_Free;
-  case Instruction::Store:
-    // If the address is a constant, use the materialization cost.
-    if (Idx == 1)
+  case Instruction::Store: {
+    // Use the materialization cost regardless of if it's the address or the
+    // value that is constant, except for if the store is misaligned and
+    // misaligned accesses are not legal (experience shows constant hoisting
+    // can sometimes be harmful in such cases).
+    if (Idx == 1 || !Inst)
       return getIntImmCost(Imm, Ty, CostKind);
-    return TTI::TCC_Free;
+
+    StoreInst *ST = cast<StoreInst>(Inst);
+    if (!getTLI()->allowsMemoryAccessForAlignment(
+            Ty->getContext(), DL, getTLI()->getValueType(DL, Ty),
+            ST->getPointerAddressSpace(), ST->getAlign()))
+      return TTI::TCC_Free;
+
+    return getIntImmCost(Imm, Ty, CostKind);
+  }
   case Instruction::Load:
     // If the address is a constant, use the materialization cost.
     return getIntImmCost(Imm, Ty, CostKind);
diff --git a/llvm/test/CodeGen/RISCV/pr64935.ll b/llvm/test/CodeGen/RISCV/pr64935.ll
index 60be5fa6c994e..b712db0dc99d6 100644
--- a/llvm/test/CodeGen/RISCV/pr64935.ll
+++ b/llvm/test/CodeGen/RISCV/pr64935.ll
@@ -4,10 +4,7 @@
 define i1 @f() {
 ; CHECK-LABEL: f:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, 524288
-; CHECK-NEXT:    not a0, a0
-; CHECK-NEXT:    sltiu a0, a0, 2
-; CHECK-NEXT:    xori a0, a0, 1
+; CHECK-NEXT:    li a0, 1
 ; CHECK-NEXT:    ret
   %B25 = shl i64 4294967296, -9223372036854775808
   %B13 = sub i64 -1, -9223372036854775808
diff --git a/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll b/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
index 8f57df6edb2c0..29df749e10059 100644
--- a/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
+++ b/llvm/test/Transforms/ConstantHoisting/RISCV/immediates.ll
@@ -211,11 +211,12 @@ exit:
 
 ; Check that we use a common base for immediates needed by a store if the
 ; constants require more than 1 instruction.
-; TODO: This doesn't trigger currently.
 define void @test20(ptr %p1, ptr %p2) {
 ; CHECK-LABEL: test20
-; CHECK: store i32 15111111, ptr %p1
-; CHECK: store i32 15111112, ptr %p2
+; CHECK: %const = bitcast i32 15111111 to i32
+; CHECK: store i32 %const, ptr %p1, align 4
+; CHECK: %const_mat = add i32 %const, 1
+; CHECK: store i32 %const_mat, ptr %p2, align 4
   store i32 15111111, ptr %p1, align 4
   store i32 15111112, ptr %p2, align 4
   ret void

; CHECK-NEXT: not a0, a0
; CHECK-NEXT: sltiu a0, a0, 2
; CHECK-NEXT: xori a0, a0, 1
; CHECK-NEXT: li a0, 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the avoidance of doubt, this is semantically equivalent:
https://alive2.llvm.org/ce/z/8gY545

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still generate an opaque constant needed for the original bug?

@@ -120,7 +120,9 @@ InstructionCost RISCVTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,

// Otherwise, we check how many instructions it will take to materialise.
const DataLayout &DL = getDataLayout();
return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST());
return RISCVMatInt::getIntMatCost(Imm, DL.getTypeSizeInBits(Ty), *getST(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that the special case at 118 is no longer required. Fine to leave for perf reasons, but no longer semantically relevant.

define void @test20(ptr %p1, ptr %p2) {
; CHECK-LABEL: test20
; CHECK: store i32 15111111, ptr %p1
; CHECK: store i32 15111112, ptr %p2
; CHECK: %const = bitcast i32 15111111 to i32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases for:

  • Imm=0 value with legal type
  • Imm=0 value with illegal type
  • Partially zero value with illegal type
  • Misaligned case

@@ -510,10 +510,12 @@ int getIntMatCost(const APInt &Val, unsigned Size, const MCSubtargetInfo &STI,
int Cost = 0;
for (unsigned ShiftVal = 0; ShiftVal < Size; ShiftVal += PlatRegSize) {
APInt Chunk = Val.ashr(ShiftVal).sextOrTrunc(PlatRegSize);
if (FreeZeroes && Chunk == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's smaller in code size to check for 0 using the int64_t returned from Chunk.getSExtValue() since there won't be code emitted for the non-small APInt case.

@asb asb force-pushed the 2024q2-riscv-getintimmcostinst-improvements-1 branch from e8715d2 to 513c783 Compare July 10, 2024 14:03
Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

asb added 5 commits July 17, 2024 12:39
Previously getIntImmInstCost only calculated the cost of materialising
the argument of a store if it was the address. This means
ConstantHoisting's transformation wouldn't kick in for cases like
storing two values that require multiple instructions to materialise but
where one can be cheaply generated from the other (e.g. by an addition).
@asb asb force-pushed the 2024q2-riscv-getintimmcostinst-improvements-1 branch from 513c783 to 74e05dc Compare July 17, 2024 14:07
@asb
Copy link
Contributor Author

asb commented Jul 17, 2024

After some more testing I wasn't happy with the risk that the FreeZeroes change might negatively impact hoisting on other instructions unexpectedly (as it did for e.g. test7 in immediates.ll), so I've made a minor modification so that codepath is used only for the new load costing logic I added.

@asb asb merged commit 8687f7c into llvm:main Jul 17, 2024
4 of 6 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Previously getIntImmInstCost only calculated the cost of materialising
the argument of a store if it was the address. This means
ConstantHoisting's transformation wouldn't kick in for cases like
storing two values that require multiple instructions to materialise but
where one can be cheaply generated from the other (e.g. by an addition).

Two key changes were needed to avoid regressions when enabling this:
* Allowing constant materialisation cost to be calculated assuming
zeroes are free (as might happen if you had a 2*XLEN constant and one
half is zero).
* Avoiding constant hoisting if we have a misaligned store that's going
to be a legalised to a sequence of narrower stores. I'm seeing cases
where hoisting the constant ends up with worse codegen in that case.

Out of caution and so as not to unexpectedly degrade other existing hoisting logic, FreeZeroes is used only for the new cost calculations for the load instruction. It would likely make sense to revisit this later.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251732
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants