Skip to content

[RISC-V][GISEL] Select G_BITCAST for scalable vectors #101486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jiahanxie353
Copy link
Contributor

Just got started to work on instruction selection after landing the regbankselect pass for scalable vector loads/stores.

Seems like selectImpl can capture loads/stores, but is not able to work for G_BITCAST out of the box.

Would a bitcast essentially being the same semantics as a copy? I was also looking at how AArch64 did it, seems like they are also just doing a copy? I'm not entirely sure.

Can I get some pointers? @topperc @michaelmaitland Thanks!

@michaelmaitland
Copy link
Contributor

According to RISCVDAGToADAGISel::Select, we should drop bitcasts between vectors if both are fixed or both are scalable.

@topperc
Copy link
Collaborator

topperc commented Aug 1, 2024

According to RISCVDAGToADAGISel::Select, we should drop bitcasts between vectors if both are fixed or both are scalable.

We should emit a COPY for GISel. Otherwise we have to change registers on other instructions to emit nothing.

@jiahanxie353
Copy link
Contributor Author

Is there a corresponding PseudosV* during instruction selection for load/storing a scalable vector of pointers?

@michaelmaitland
Copy link
Contributor

michaelmaitland commented Aug 5, 2024

Is there a corresponding PseudosV* during instruction selection for load/storing a scalable vector of pointers?

It may be useful to look at RISCVDAGToDAGISel::Select for ISD::LOAD to see what is done there.

Also in RISCVInstrInfoVSDPatterns.td there are some patterns for load/store.

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-memory might give you a better idea of what kind of vector loads we have. The goal is to pick the best kind of RVV load for a LLVM load instruciton.

Since pointers use GPR, I think that they use the normal patterns.

@michaelmaitland michaelmaitland self-requested a review September 10, 2024 21:04
@michaelmaitland
Copy link
Contributor

Need tests?

@llvmbot
Copy link
Member

llvmbot commented Sep 10, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Jiahan Xie (jiahanxie353)

Changes

Just got started to work on instruction selection after landing the regbankselect pass for scalable vector loads/stores.

Seems like selectImpl can capture loads/stores, but is not able to work for G_BITCAST out of the box.

Would a bitcast essentially being the same semantics as a copy? I was also looking at how AArch64 did it, seems like they are also just doing a copy? I'm not entirely sure.

Can I get some pointers? @topperc @michaelmaitland Thanks!


Full diff: https://github.com/llvm/llvm-project/pull/101486.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp (+1)
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
index fdb1ebace00107..2243a22a167cab 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -559,6 +559,7 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
   case TargetOpcode::G_INTTOPTR:
   case TargetOpcode::G_TRUNC:
   case TargetOpcode::G_FREEZE:
+  case TargetOpcode::G_BITCAST:
     return selectCopy(MI, MRI);
   case TargetOpcode::G_CONSTANT: {
     Register DstReg = MI.getOperand(0).getReg();

@jiahanxie353
Copy link
Contributor Author

jiahanxie353 commented Sep 16, 2024

Currently, we need to support load/store of a vector of pointers. For GISEL, we need to define some new patterns inRISCVInstrInfoVPseudos.td.

If I were to do it in a brute-force way, should we get something like the following (largely copied from AllIntegerVectors):

    def nxv1i32p0 = VTScalableVec<1, i32, 0>;
    def nxv2i32p0 = VTScalableVec<2, i32, 0>;
    def nxv4i32p0 = VTScalableVec<4, i32, 0>;
    def nxv8i32p0 = VTScalableVec<8, i32, 0>;
    def nxv1i64p0 = VTScalableVec<1, i64, 0>;
    def nxv2i64p0 = VTScalableVec<2, i64, 0>;
    def nxv4i64p0 = VTScalableVec<4, i64, 0>;
    def nxv8i64p0 = VTScalableVec<8, i64, 0>;

    defvar vptr32mf2_t = nxv1i32p0;
    defvar vptr32m1_t = nxv2i32p0;
    defvar vptr32m2_t = nxv4i32p0;
    defvar vptr32m4_t = nxv8i32p0;
    defvar vptr64m1_t = nxv1i64p0;
    defvar vptr64m2_t = nxv2i64p0;
    defvar vptr64m4_t = nxv4i64p0;
    defvar vptr64m8_t = nxv8i64p0;

    defset list<VTypeInfo> PtrVectors = {
      def VPTR32MF2: VTypeInfo<vptr32mf2_t, vbool64_t, 32, V_MF2>;
      def VPTR32M1: VTypeInfo<vptr32m1_t, vbool32_t, 32, V_M1>;
      def VPTR32M2: GroupVTypeInfo<vptr32m2_t, vptr32m1_t, vbool16_t, 32, V_M2>;
      def VPTR32M4: GroupVTypeInfo<vptr32m4_t, vptr32m1_t, vbool8_t, 32, V_M4>;

      def VPTR64M1: VTypeInfo<vptr64m1_t, vbool64_t, 64, V_M1>;
      def VPTR64M2: GroupVTypeInfo<vptr64m2_t, vptr64m1_t, vbool32_t, 64, V_M2>;
      def VPTR64M4: GroupVTypeInfo<vptr64m4_t, vptr64m1_t, vbool16_t, 64, V_M4>;
      def VPTR64M8: GroupVTypeInfo<vptr64m8_t, vptr64m1_t, vbool8_t, 64, V_M8>;
    }
  }

did I write the patterns correctly?

And then in RISCVGISEL.td:

let Predicates = [IsRV32] in {
foreach vti = PtrVectors in
  def VPatUSLoadStoreSDNode<vti.Vector, vti.Log2SEW, vti.LMul, vti.AVL, vti.RegClass>;
}

let Predicates = [IsRV64] in ...

Even if I wrote the patterns correctly, the issue is that we should really have XLen in place of explicit i32/i64:

    def p0XLen : PtrValueType<XLenVT, 0>;
    def  nxv1p0 = VTScalableVec<1, p0XLen, 0>;
    def  nxv2p0 = VTScalableVec<2, p0XLen, 0>;
    def  nxv4p0 = VTScalableVec<4, p0XLen, 0>;
    def  nxv8p0 = VTScalableVec<8, p0XLen, 0>;

And how should we use XLen uniformly instead of brute-force 32-bit and 64-bit machine separately?

@topperc
Copy link
Collaborator

topperc commented Sep 16, 2024

Currently, we need to support load/store of a vector of pointers. For GISEL, we need to define some new patterns inRISCVInstrInfoVPseudos.td.

If I were to do it in a brute-force way, should we get something like the following (largely copied from AllIntegerVectors):

    def nxv1i32p0 = VTScalableVec<1, i32, 0>;
    def nxv2i32p0 = VTScalableVec<2, i32, 0>;
    def nxv4i32p0 = VTScalableVec<4, i32, 0>;
    def nxv8i32p0 = VTScalableVec<8, i32, 0>;
    def nxv1i64p0 = VTScalableVec<1, i64, 0>;
    def nxv2i64p0 = VTScalableVec<2, i64, 0>;
    def nxv4i64p0 = VTScalableVec<4, i64, 0>;
    def nxv8i64p0 = VTScalableVec<8, i64, 0>;

    defvar vptr32mf2_t = nxv1i32p0;
    defvar vptr32m1_t = nxv2i32p0;
    defvar vptr32m2_t = nxv4i32p0;
    defvar vptr32m4_t = nxv8i32p0;
    defvar vptr64m1_t = nxv1i64p0;
    defvar vptr64m2_t = nxv2i64p0;
    defvar vptr64m4_t = nxv4i64p0;
    defvar vptr64m8_t = nxv8i64p0;

    defset list<VTypeInfo> PtrVectors = {
      def VPTR32MF2: VTypeInfo<vptr32mf2_t, vbool64_t, 32, V_MF2>;
      def VPTR32M1: VTypeInfo<vptr32m1_t, vbool32_t, 32, V_M1>;
      def VPTR32M2: GroupVTypeInfo<vptr32m2_t, vptr32m1_t, vbool16_t, 32, V_M2>;
      def VPTR32M4: GroupVTypeInfo<vptr32m4_t, vptr32m1_t, vbool8_t, 32, V_M4>;

      def VPTR64M1: VTypeInfo<vptr64m1_t, vbool64_t, 64, V_M1>;
      def VPTR64M2: GroupVTypeInfo<vptr64m2_t, vptr64m1_t, vbool32_t, 64, V_M2>;
      def VPTR64M4: GroupVTypeInfo<vptr64m4_t, vptr64m1_t, vbool16_t, 64, V_M4>;
      def VPTR64M8: GroupVTypeInfo<vptr64m8_t, vptr64m1_t, vbool8_t, 64, V_M8>;
    }
  }

did I write the patterns correctly?

And then in RISCVGISEL.td:

let Predicates = [IsRV32] in {
foreach vti = PtrVectors in
  def VPatUSLoadStoreSDNode<vti.Vector, vti.Log2SEW, vti.LMul, vti.AVL, vti.RegClass>;
}

let Predicates = [IsRV64] in ...

Even if I wrote the patterns correctly, the issue is that we should really have XLen in place of explicit i32/i64:

    def p0XLen : PtrValueType<XLenVT, 0>;
    def  nxv1p0 = VTScalableVec<1, p0XLen, 0>;
    def  nxv2p0 = VTScalableVec<2, p0XLen, 0>;
    def  nxv4p0 = VTScalableVec<4, p0XLen, 0>;
    def  nxv8p0 = VTScalableVec<8, p0XLen, 0>;

And how should we use XLen uniformly instead of brute-force 32-bit and 64-bit machine separately?

AArch64 handles scalar load/stores of pointers in AArch64InstructionSelector::preISelLower by forcing the type to integer. Can we do something similar?

@jiahanxie353
Copy link
Contributor Author

Seems like selectImpl can capture loads/stores,

I was wrong, selectImpl only supports rv32 for loading this example but not rv64.. it seems pretty weird to me since the table tells us it should work with rv64 also, right?

Copy link

github-actions bot commented Sep 23, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 81c3499531c3fe03827bd8bc890e3a16db9e4c3c 057c06a585b528be37b735feac47e73e8cda1389 --extensions cpp -- llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
index 78fffa5047..e484d2af15 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -814,7 +814,9 @@ void RISCVInstructionSelector::preISelLower(MachineInstr &MI,
         "through preISelLower")
       break;
     const LLT sXLen = LLT::scalar(STI.getXLen());
-    auto Copy = MIB.buildCopy(LLT::scalable_vector(SrcTy.getElementCount().getKnownMinValue(), sXLen), SrcOp);
+    auto Copy = MIB.buildCopy(
+        LLT::scalable_vector(SrcTy.getElementCount().getKnownMinValue(), sXLen),
+        SrcOp);
     Register NewSrc = Copy.getReg(0);
     SrcOp.setReg(NewSrc);
   }

@michaelmaitland
Copy link
Contributor

Could you update PR description and title please?

Copy link
Contributor

@michaelmaitland michaelmaitland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into why this is failing for rv64.

@@ -559,6 +559,7 @@ bool RISCVInstructionSelector::select(MachineInstr &MI) {
case TargetOpcode::G_INTTOPTR:
case TargetOpcode::G_TRUNC:
case TargetOpcode::G_FREEZE:
case TargetOpcode::G_BITCAST:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to need G_BITCAST instruction-select tests due to this line

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=riscv32 -mattr=+v -run-pass=instruction-select -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefix=RV32I %s
# RUN: llc -mtriple=riscv64 -mattr=+v -run-pass=instruction-select -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefix=RV64I %s
--- |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to drop the LLVM IR part of this file (i.e. what is between this ---| and ...)? You will probably need to rename bb.1 (%ir-block.0): to bb.1: in the corresponding MIR functions below since we removed the LLVM IR.

case TargetOpcode::G_LOAD: {
Register DstReg = MI.getOperand(0).getReg();
const LLT DstTy = MRI.getType(DstReg);
if (!(DstTy.isVector() && DstTy.getElementType().isPointer()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify to !DestTy.isVector() || !DstTy.isPointerVector().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would !DstTy.isPointerVector() suffice since that also checks if DestTy is a Vector or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better!

case TargetOpcode::G_STORE: {
MachineOperand &SrcOp = MI.getOperand(0);
const LLT SrcTy = MRI.getType(SrcOp.getReg());
if (!(SrcTy.isVector() && SrcTy.getElementType().isPointer()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify to !SrcTy.isVector() || !SrcTy.isPointerVector().

if (!(SrcTy.isVector() && SrcTy.getElementType().isPointer()))
break;
const LLT sXLen = LLT::scalar(STI.getXLen());
auto Copy = MIB.buildCopy(LLT::scalable_vector(SrcTy.getElementCount().getKnownMinValue(), sXLen), SrcOp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do a copy instead of changing the store's vector pointer element type to vector of sXLen type?

Copy link
Collaborator

@topperc topperc Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pointer has p0 type. we need to change the type of the data to store, but that type is owned by the instruction that produces the data. so you can't change it. if you were to change it that instruction would fail instruction selection when we visit it.

@@ -785,6 +786,26 @@ void RISCVInstructionSelector::preISelLower(MachineInstr &MI,
replacePtrWithInt(MI.getOperand(1), MIB, MRI);
MI.setDesc(TII.get(TargetOpcode::G_AND));
MRI.setType(DstReg, sXLen);
break;
}
case TargetOpcode::G_LOAD: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a helper function called replacePtrWithInt. Maybe you can expand that to support vectors instead of doing it here in the code manually?

@topperc
Copy link
Collaborator

topperc commented Sep 23, 2024

I am looking into why this is failing for rv64.

It's failing because only the RV32 pattern exists in lib/Target/RISCV/RISCVGenGlobalISel.inc

@michaelmaitland
Copy link
Contributor

michaelmaitland commented Sep 23, 2024

It looks like it is failing 64543: GIM_CheckFeatures(ExpectedBitsetID=47). Based on what I can tell, that occurs when we don't have the correct subtarget features. I thought you have specified the correct subtarget features though (via mtriple and mattr). I am continuing to look into it.

EDIT: Just saw what @topperc posted, I guess it has to do with that.

@topperc
Copy link
Collaborator

topperc commented Sep 23, 2024

I am looking into why this is failing for rv64.

It's failing because only the RV32 pattern exists in lib/Target/RISCV/RISCVGenGlobalISel.inc

Should be fixed after 9d3a576

"Operand must be a pointer of a vector of pointers");
assert(PtrTy.isScalableVector() &&
"Currently only working for scalable vector of pointers now");
MRI.setType(PtrReg, LLT::scalable_vector(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a helper function called replacePtrWithInt. Maybe you can expand that to support vectors instead of doing it here in the code manually?

Not sure if makes sense to inplace update and return true, instead of building a new instruction like G_PTRTOINT. I didn't do it because there is no generic opcode for it and we have to go through extract_elment and apply each element with G_PTRTOINT which feels very cumbersome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like AArch64 changes the type without building a PtrToInt. They say it works because all users have been selected already (bottom up selection) so the type does not matter for them. I think we can do the same thing, which is what you have done here.

I wonder if we really needed to build the PtrToInt above for the same reason. I can take a look at that in a follow up patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the vector and non-vector code do something wildly different they should not be in the same function.

"through preISelLower")
break;
const LLT sXLen = LLT::scalar(STI.getXLen());
auto Copy = MIB.buildCopy(LLT::scalable_vector(SrcTy.getElementCount().getKnownMinValue(), sXLen), SrcOp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: clang format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants