Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG #120030
Conversation
Force-pushed from f4ccd5d to 621fa58
@justinfargnoli this is still a work in progress, but would you be willing to look it over to make sure I'm going in the right direction?
This looks great! Check out this documentation for details on how to fix the test failures.
I did see this. Is it OK to just overwrite test cases like this? I'm worried I may break something I don't understand.
Yes, we will review the diff to ensure it's okay before merging the change!
Force-pushed from 621fa58 to 67ecdf0
@llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-backend-aarch64

Author: Ethan Kaji (Esan5)

Changes: Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG.

Full diff: https://github.com/llvm/llvm-project/pull/120030.diff

6 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index ca87168929f964..09025ae4d71ab9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -191,6 +191,7 @@ class SelectionDAGLegalize {
SDValue ExpandExtractFromVectorThroughStack(SDValue Op);
SDValue ExpandInsertToVectorThroughStack(SDValue Op);
SDValue ExpandVectorBuildThroughStack(SDNode* Node);
+ SDValue ExpandConcatVectors(SDNode* Node);
SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP);
SDValue ExpandConstant(ConstantSDNode *CP);
@@ -1517,10 +1518,27 @@ SDValue SelectionDAGLegalize::ExpandInsertToVectorThroughStack(SDValue Op) {
BaseVecAlignment);
}
+SDValue SelectionDAGLegalize::ExpandConcatVectors(SDNode *Node) {
+ assert(Node->getOpcode() == ISD::CONCAT_VECTORS && "Unexpected opcode!");
+ SDLoc Dl(Node);
+ SmallVector<SDValue, 0> Ops;
+ unsigned NumOperands = Node->getNumOperands();
+ for (unsigned I = 0; I < NumOperands; ++I) {
+ SDValue SubOp = Node->getOperand(I);
+ EVT VectorValueType =
+ SubOp->getValueType(0);
+ EVT ElementValueType = VectorValueType.getVectorElementType();
+ unsigned NumSubElem = VectorValueType.getVectorNumElements();
+ for (unsigned J = 0; J < NumSubElem; ++J) {
+ Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, Dl, ElementValueType,
+ SubOp, DAG.getIntPtrConstant(J, Dl)));
+ }
+ }
+ return DAG.getBuildVector(Node->getValueType(0), Dl, Ops);
+}
+
SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
- assert((Node->getOpcode() == ISD::BUILD_VECTOR ||
- Node->getOpcode() == ISD::CONCAT_VECTORS) &&
- "Unexpected opcode!");
+ assert(Node->getOpcode() == ISD::BUILD_VECTOR && "Unexpected opcode!");
// We can't handle this case efficiently. Allocate a sufficiently
// aligned object on the stack, store each operand into it, then load
@@ -3371,7 +3389,7 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
break;
case ISD::CONCAT_VECTORS:
- Results.push_back(ExpandVectorBuildThroughStack(Node));
+ Results.push_back(ExpandConcatVectors(Node));
break;
case ISD::SCALAR_TO_VECTOR:
Results.push_back(ExpandSCALAR_TO_VECTOR(Node));
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index a02607efb7fc28..de06763da4dbbe 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -2001,28 +2001,6 @@ SDValue NVPTXTargetLowering::LowerSTACKSAVE(SDValue Op,
return DAG.getMergeValues({ASC, SDValue(SS.getNode(), 1)}, DL);
}
-// By default CONCAT_VECTORS is lowered by ExpandVectorBuildThroughStack()
-// (see LegalizeDAG.cpp). This is slow and uses local memory.
-// We use extract/insert/build vector just as what LegalizeOp() does in llvm 2.5
-SDValue
-NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const {
- SDNode *Node = Op.getNode();
- SDLoc dl(Node);
- SmallVector<SDValue, 8> Ops;
- unsigned NumOperands = Node->getNumOperands();
- for (unsigned i = 0; i < NumOperands; ++i) {
- SDValue SubOp = Node->getOperand(i);
- EVT VVT = SubOp.getNode()->getValueType(0);
- EVT EltVT = VVT.getVectorElementType();
- unsigned NumSubElem = VVT.getVectorNumElements();
- for (unsigned j = 0; j < NumSubElem; ++j) {
- Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, EltVT, SubOp,
- DAG.getIntPtrConstant(j, dl)));
- }
- }
- return DAG.getBuildVector(Node->getValueType(0), dl, Ops);
-}
-
SDValue NVPTXTargetLowering::LowerBITCAST(SDValue Op, SelectionDAG &DAG) const {
// Handle bitcasting from v2i8 without hitting the default promotion
// strategy which goes through stack memory.
@@ -2565,8 +2543,6 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
return LowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::VECTOR_SHUFFLE:
return LowerVECTOR_SHUFFLE(Op, DAG);
- case ISD::CONCAT_VECTORS:
- return LowerCONCAT_VECTORS(Op, DAG);
case ISD::STORE:
return LowerSTORE(Op, DAG);
case ISD::LOAD:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 0244a0c5bec9d5..6523f22777693e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -266,7 +266,6 @@ class NVPTXTargetLowering : public TargetLowering {
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
- SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
index 619840fc6afb28..ff6e1dbb3baa02 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
@@ -70,9 +70,12 @@ define <16 x i8> @concat_v16i8(<8 x i8> %op1, <8 x i8> %op2) {
;
; NONEON-NOSVE-LABEL: concat_v16i8:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT: ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT: stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%res = shufflevector <8 x i8> %op1, <8 x i8> %op2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
@@ -181,9 +184,12 @@ define <8 x i16> @concat_v8i16(<4 x i16> %op1, <4 x i16> %op2) {
;
; NONEON-NOSVE-LABEL: concat_v8i16:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT: ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT: stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%res = shufflevector <4 x i16> %op1, <4 x i16> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i16> %res
@@ -279,9 +285,14 @@ define <4 x i32> @concat_v4i32(<2 x i32> %op1, <2 x i32> %op2) {
;
; NONEON-NOSVE-LABEL: concat_v4i32:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT: ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT: stp w8, w9, [sp, #24]
+; NONEON-NOSVE-NEXT: ldp w8, w9, [sp]
+; NONEON-NOSVE-NEXT: stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%res = shufflevector <2 x i32> %op1, <2 x i32> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i32> %res
@@ -441,9 +452,12 @@ define <8 x half> @concat_v8f16(<4 x half> %op1, <4 x half> %op2) {
;
; NONEON-NOSVE-LABEL: concat_v8f16:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT: ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT: stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%res = shufflevector <4 x half> %op1, <4 x half> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x half> %res
@@ -539,9 +553,14 @@ define <4 x float> @concat_v4f32(<2 x float> %op1, <2 x float> %op2) {
;
; NONEON-NOSVE-LABEL: concat_v4f32:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT: ldp s0, s1, [sp, #8]
+; NONEON-NOSVE-NEXT: stp s0, s1, [sp, #24]
+; NONEON-NOSVE-NEXT: ldp s0, s1, [sp]
+; NONEON-NOSVE-NEXT: stp s0, s1, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%res = shufflevector <2 x float> %op1, <2 x float> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x float> %res
@@ -754,12 +773,15 @@ define void @concat_v32i8_4op(ptr %a, ptr %b) {
;
; NONEON-NOSVE-LABEL: concat_v32i8_4op:
; NONEON-NOSVE: // %bb.0:
+; NONEON-NOSVE-NEXT: sub sp, sp, #32
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
; NONEON-NOSVE-NEXT: ldr d0, [x0]
-; NONEON-NOSVE-NEXT: str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp]
+; NONEON-NOSVE-NEXT: str d0, [sp, #8]
+; NONEON-NOSVE-NEXT: ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT: str x8, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
; NONEON-NOSVE-NEXT: str q0, [x1]
-; NONEON-NOSVE-NEXT: add sp, sp, #16
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%op1 = load <8 x i8>, ptr %a
%shuffle = shufflevector <8 x i8> %op1, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
@@ -781,12 +803,15 @@ define void @concat_v16i16_4op(ptr %a, ptr %b) {
;
; NONEON-NOSVE-LABEL: concat_v16i16_4op:
; NONEON-NOSVE: // %bb.0:
+; NONEON-NOSVE-NEXT: sub sp, sp, #32
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
; NONEON-NOSVE-NEXT: ldr d0, [x0]
-; NONEON-NOSVE-NEXT: str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp]
+; NONEON-NOSVE-NEXT: str d0, [sp, #8]
+; NONEON-NOSVE-NEXT: ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT: str x8, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
; NONEON-NOSVE-NEXT: str q0, [x1]
-; NONEON-NOSVE-NEXT: add sp, sp, #16
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%op1 = load <4 x i16>, ptr %a
%shuffle = shufflevector <4 x i16> %op1, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -805,12 +830,15 @@ define void @concat_v8i32_4op(ptr %a, ptr %b) {
;
; NONEON-NOSVE-LABEL: concat_v8i32_4op:
; NONEON-NOSVE: // %bb.0:
+; NONEON-NOSVE-NEXT: sub sp, sp, #32
+; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
; NONEON-NOSVE-NEXT: ldr d0, [x0]
-; NONEON-NOSVE-NEXT: str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT: ldr q0, [sp]
+; NONEON-NOSVE-NEXT: str d0, [sp, #8]
+; NONEON-NOSVE-NEXT: ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT: stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
; NONEON-NOSVE-NEXT: str q0, [x1]
-; NONEON-NOSVE-NEXT: add sp, sp, #16
+; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
%op1 = load <2 x i32>, ptr %a
%shuffle = shufflevector <2 x i32> %op1, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
index b0fdce9a93bd3b..fbdfc4b7d96cf3 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
@@ -1138,17 +1138,15 @@ define <2 x i64> @smulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
;
; NONEON-NOSVE-LABEL: smulh_v2i64:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT: sub sp, sp, #64
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT: ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT: stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #32]
; NONEON-NOSVE-NEXT: smulh x8, x8, x10
; NONEON-NOSVE-NEXT: smulh x9, x9, x11
-; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT: ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT: add sp, sp, #64
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #64
; NONEON-NOSVE-NEXT: ret
%1 = sext <2 x i64> %op1 to <2 x i128>
%2 = sext <2 x i64> %op2 to <2 x i128>
@@ -1185,23 +1183,19 @@ define void @smulh_v4i64(ptr %a, ptr %b) {
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 128
; NONEON-NOSVE-NEXT: ldp q1, q0, [x0]
; NONEON-NOSVE-NEXT: ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT: stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT: ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #64]
; NONEON-NOSVE-NEXT: smulh x10, x10, x12
-; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #80]
; NONEON-NOSVE-NEXT: smulh x11, x11, x13
; NONEON-NOSVE-NEXT: smulh x8, x8, x12
; NONEON-NOSVE-NEXT: smulh x9, x9, x14
-; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT: ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldp q1, q0, [sp]
; NONEON-NOSVE-NEXT: stp q0, q1, [x0]
; NONEON-NOSVE-NEXT: add sp, sp, #128
; NONEON-NOSVE-NEXT: ret
@@ -2339,17 +2333,15 @@ define <2 x i64> @umulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
;
; NONEON-NOSVE-LABEL: umulh_v2i64:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT: sub sp, sp, #64
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT: ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT: stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #32]
; NONEON-NOSVE-NEXT: umulh x8, x8, x10
; NONEON-NOSVE-NEXT: umulh x9, x9, x11
-; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT: ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT: add sp, sp, #64
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #64
; NONEON-NOSVE-NEXT: ret
%1 = zext <2 x i64> %op1 to <2 x i128>
%2 = zext <2 x i64> %op2 to <2 x i128>
@@ -2386,23 +2378,19 @@ define void @umulh_v4i64(ptr %a, ptr %b) {
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 128
; NONEON-NOSVE-NEXT: ldp q1, q0, [x0]
; NONEON-NOSVE-NEXT: ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT: stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT: ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #64]
; NONEON-NOSVE-NEXT: umulh x10, x10, x12
-; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #80]
; NONEON-NOSVE-NEXT: umulh x11, x11, x13
; NONEON-NOSVE-NEXT: umulh x8, x8, x12
; NONEON-NOSVE-NEXT: umulh x9, x9, x14
-; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT: ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldp q1, q0, [sp]
; NONEON-NOSVE-NEXT: stp q0, q1, [x0]
; NONEON-NOSVE-NEXT: add sp, sp, #128
; NONEON-NOSVE-NEXT: ret
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
index 13fcd94ea8a260..ae87128b5c3f9d 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
@@ -142,9 +142,7 @@ define void @store_trunc_v2i256i64(ptr %ap, ptr %dest) {
; NONEON-NOSVE-NEXT: ldr x9, [x0]
; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #-32]!
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #16]
-; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp]
; NONEON-NOSVE-NEXT: str q0, [x1]
; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
@llvm/pr-subscribers-llvm-selectiondag

Author: Ethan Kaji (Esan5)

Changes: Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG.

Full diff: https://github.com/llvm/llvm-project/pull/120030.diff (same diff as above)
-; NONEON-NOSVE-NEXT: ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT: add sp, sp, #64
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #64
; NONEON-NOSVE-NEXT: ret
%1 = zext <2 x i64> %op1 to <2 x i128>
%2 = zext <2 x i64> %op2 to <2 x i128>
@@ -2386,23 +2378,19 @@ define void @umulh_v4i64(ptr %a, ptr %b) {
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 128
; NONEON-NOSVE-NEXT: ldp q1, q0, [x0]
; NONEON-NOSVE-NEXT: ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT: stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT: ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT: ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT: stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT: ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x9, x8, [sp, #64]
; NONEON-NOSVE-NEXT: umulh x10, x10, x12
-; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT: ldp x14, x12, [sp, #80]
; NONEON-NOSVE-NEXT: umulh x11, x11, x13
; NONEON-NOSVE-NEXT: umulh x8, x8, x12
; NONEON-NOSVE-NEXT: umulh x9, x9, x14
-; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT: ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT: stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT: stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT: ldp q1, q0, [sp]
; NONEON-NOSVE-NEXT: stp q0, q1, [x0]
; NONEON-NOSVE-NEXT: add sp, sp, #128
; NONEON-NOSVE-NEXT: ret
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
index 13fcd94ea8a260..ae87128b5c3f9d 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
@@ -142,9 +142,7 @@ define void @store_trunc_v2i256i64(ptr %ap, ptr %dest) {
; NONEON-NOSVE-NEXT: ldr x9, [x0]
; NONEON-NOSVE-NEXT: stp x9, x8, [sp, #-32]!
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 32
-; NONEON-NOSVE-NEXT: ldp d0, d1, [sp]
-; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #16]
-; NONEON-NOSVE-NEXT: ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT: ldr q0, [sp]
; NONEON-NOSVE-NEXT: str q0, [x1]
; NONEON-NOSVE-NEXT: add sp, sp, #32
; NONEON-NOSVE-NEXT: ret
; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-16]!
; NONEON-NOSVE-NEXT: .cfi_def_cfa_offset 16
; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: stp d0, d1, [sp, #-32]!
Not sure if aarch64 would prefer to keep the old stack path.
This case does seem worse: it's traded 2 stack operations for 5.
Besides a few formatting changes, I think this is almost ready to go!
✅ With the latest revision this PR passed the C/C++ code formatter.
Actual change LGTM, but we probably need some kind of legality heuristic for choosing which legalization path to prefer. If we know the extract_vector_elt will expand to the stack, we should probably not let this directly lower.
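A rough sketch of that heuristic, with hypothetical names (`LegalizeAction` here is a stand-in enum, and `shouldLowerConcatDirectly` is not part of the real `TargetLowering` API): take the direct extract/build lowering only when `EXTRACT_VECTOR_ELT` for the operand type would not itself be expanded through the stack.

```cpp
#include <cassert>

// Illustrative model of the suggested legality heuristic; the names are
// stand-ins, not LLVM API.
enum class LegalizeAction { Legal, Custom, Expand };

bool shouldLowerConcatDirectly(LegalizeAction ExtractEltAction) {
  // If extract_vector_elt would hit the stack anyway, the direct
  // extract/build lowering turns one stack round-trip into one per
  // element; prefer the existing stack expansion in that case.
  return ExtractEltAction != LegalizeAction::Expand;
}
```

This mirrors the shape of the `isOperationExpand` guard that the patch ends up adding around the new lowering path.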
LGTM with nit. Also, I'm surprised you're getting away without updating the target legalize rules.
@@ -2001,28 +2001,6 @@ SDValue NVPTXTargetLowering::LowerSTACKSAVE(SDValue Op,
   return DAG.getMergeValues({ASC, SDValue(SS.getNode(), 1)}, DL);
 }

-// By default CONCAT_VECTORS is lowered by ExpandVectorBuildThroughStack()
Presumably you also need to adjust the setOperationAction for cases that were setting Custom?
This might be completely wrong, but I think they were removed in 2013 with be8dc64.
So this was previously dead code?
I think so.
@@ -3371,7 +3393,11 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
     Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
     break;
   case ISD::CONCAT_VECTORS:
-    Results.push_back(ExpandVectorBuildThroughStack(Node));
+    if (!TLI.isOperationExpand(ISD::EXTRACT_VECTOR_ELT,
+                               Node->getOperand(0).getValueType())) {
Might need a guard against scalable vectors.
@Esan5 please can you rebase/merge to trunk to check this patch hasn't rotted?
Force-pushed from 4243d5f to 1c9fa8e.
Given that the original code was dead code, are we still interested in merging this? be8dc64 mentions this code was for a time before good target-independent support for scalarization was available.
We'd still be interested in deleting the dead NVPTX code. However, since it doesn't look like there would be any users of … Apologies for including …
It should still merge to the generic code. AMDGPU has essentially the same code in custom lowering (and I'm sure many out-of-tree targets do the same).
Happy to hear that @Esan5's hard work won't go to waste! @Esan5, if you'd prefer, feel free to split off the NVPTX aspect of this PR into a separate PR. I'll promptly approve and merge it.
Force-pushed from 4c4083f to c318b91.
@Esan5 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
LLVM Buildbot has detected a new failure on builder …

Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/9046

Here is the relevant piece of the build log for reference:
Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to llvm/lib/CodeGen/SelectionDAG, as requested in #116695.