Port `NVPTXTargetLowering::LowerCONCAT_VECTORS` to SelectionDAG #120030

Esan5 · 2024-12-16T02:02:07Z

Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to llvm/lib/CodeGen/SelectionDAG as requested in #116695.

github-actions · 2024-12-16T02:02:28Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

Esan5 · 2024-12-16T02:42:31Z

@justinfargnoli this is still a work in progress, but would you be willing to look it over to make sure I'm going in the right direction?
The below tests are failing but I want some confirmation that this is in line with what you wanted before I dig deeper.

  LLVM :: CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
  LLVM :: CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
  LLVM :: CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll

justinfargnoli · 2024-12-16T23:39:26Z

This looks great!

Check out this documentation for details on how to fix the test failures.

Esan5 · 2024-12-16T23:41:33Z

I did see this, is it ok to just overwrite test cases like this? I'm worried I may break something I don't understand.

justinfargnoli · 2024-12-17T00:06:44Z

I did see this, is it ok to just overwrite test cases like this? I'm worried I may break something I don't understand.

Yes, we will review the diff to ensure it's okay before merging the change!

llvmbot · 2024-12-17T01:31:25Z

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-backend-aarch64

Author: Ethan Kaji (Esan5)

Changes

Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to llvm/lib/CodeGen/SelectionDAG as requested in #116695.

Full diff: https://github.com/llvm/llvm-project/pull/120030.diff

6 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+22-4)
(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (-24)
(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (-1)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll (+55-27)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll (+30-42)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll (+1-3)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index ca87168929f964..09025ae4d71ab9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -191,6 +191,7 @@ class SelectionDAGLegalize {
   SDValue ExpandExtractFromVectorThroughStack(SDValue Op);
   SDValue ExpandInsertToVectorThroughStack(SDValue Op);
   SDValue ExpandVectorBuildThroughStack(SDNode* Node);
+  SDValue ExpandConcatVectors(SDNode* Node);
 
   SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP);
   SDValue ExpandConstant(ConstantSDNode *CP);
@@ -1517,10 +1518,27 @@ SDValue SelectionDAGLegalize::ExpandInsertToVectorThroughStack(SDValue Op) {
                      BaseVecAlignment);
 }
 
+SDValue SelectionDAGLegalize::ExpandConcatVectors(SDNode *Node) {
+  assert(Node->getOpcode() == ISD::CONCAT_VECTORS && "Unexpected opcode!");
+  SDLoc Dl(Node);
+  SmallVector<SDValue, 0> Ops;
+  unsigned NumOperands = Node->getNumOperands();
+  for (unsigned I = 0; I < NumOperands; ++I) {
+    SDValue SubOp = Node->getOperand(I);
+    EVT VectorValueType =
+        SubOp->getValueType(0);
+    EVT ElementValueType = VectorValueType.getVectorElementType();
+    unsigned NumSubElem = VectorValueType.getVectorNumElements();
+    for (unsigned J = 0; J < NumSubElem; ++J) {
+      Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, Dl, ElementValueType,
+                                SubOp, DAG.getIntPtrConstant(J, Dl)));
+    }
+  }
+  return DAG.getBuildVector(Node->getValueType(0), Dl, Ops);
+}
+
 SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
-  assert((Node->getOpcode() == ISD::BUILD_VECTOR ||
-          Node->getOpcode() == ISD::CONCAT_VECTORS) &&
-         "Unexpected opcode!");
+  assert(Node->getOpcode() == ISD::BUILD_VECTOR && "Unexpected opcode!");
 
   // We can't handle this case efficiently.  Allocate a sufficiently
   // aligned object on the stack, store each operand into it, then load
@@ -3371,7 +3389,7 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
     Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
     break;
   case ISD::CONCAT_VECTORS:
-    Results.push_back(ExpandVectorBuildThroughStack(Node));
+    Results.push_back(ExpandConcatVectors(Node));
     break;
   case ISD::SCALAR_TO_VECTOR:
     Results.push_back(ExpandSCALAR_TO_VECTOR(Node));
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index a02607efb7fc28..de06763da4dbbe 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -2001,28 +2001,6 @@ SDValue NVPTXTargetLowering::LowerSTACKSAVE(SDValue Op,
   return DAG.getMergeValues({ASC, SDValue(SS.getNode(), 1)}, DL);
 }
 
-// By default CONCAT_VECTORS is lowered by ExpandVectorBuildThroughStack()
-// (see LegalizeDAG.cpp). This is slow and uses local memory.
-// We use extract/insert/build vector just as what LegalizeOp() does in llvm 2.5
-SDValue
-NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const {
-  SDNode *Node = Op.getNode();
-  SDLoc dl(Node);
-  SmallVector<SDValue, 8> Ops;
-  unsigned NumOperands = Node->getNumOperands();
-  for (unsigned i = 0; i < NumOperands; ++i) {
-    SDValue SubOp = Node->getOperand(i);
-    EVT VVT = SubOp.getNode()->getValueType(0);
-    EVT EltVT = VVT.getVectorElementType();
-    unsigned NumSubElem = VVT.getVectorNumElements();
-    for (unsigned j = 0; j < NumSubElem; ++j) {
-      Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, EltVT, SubOp,
-                                DAG.getIntPtrConstant(j, dl)));
-    }
-  }
-  return DAG.getBuildVector(Node->getValueType(0), dl, Ops);
-}
-
 SDValue NVPTXTargetLowering::LowerBITCAST(SDValue Op, SelectionDAG &DAG) const {
   // Handle bitcasting from v2i8 without hitting the default promotion
   // strategy which goes through stack memory.
@@ -2565,8 +2543,6 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
     return LowerINSERT_VECTOR_ELT(Op, DAG);
   case ISD::VECTOR_SHUFFLE:
     return LowerVECTOR_SHUFFLE(Op, DAG);
-  case ISD::CONCAT_VECTORS:
-    return LowerCONCAT_VECTORS(Op, DAG);
   case ISD::STORE:
     return LowerSTORE(Op, DAG);
   case ISD::LOAD:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 0244a0c5bec9d5..6523f22777693e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -266,7 +266,6 @@ class NVPTXTargetLowering : public TargetLowering {
   SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
 
   SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
index 619840fc6afb28..ff6e1dbb3baa02 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
@@ -70,9 +70,12 @@ define <16 x i8> @concat_v16i8(<8 x i8> %op1, <8 x i8> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v16i8:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <8 x i8> %op1, <8 x i8> %op2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
                                                                  i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
@@ -181,9 +184,12 @@ define <8 x i16> @concat_v8i16(<4 x i16> %op1, <4 x i16> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8i16:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <4 x i16> %op1, <4 x i16> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x i16> %res
@@ -279,9 +285,14 @@ define <4 x i32> @concat_v4i32(<2 x i32> %op1, <2 x i32> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v4i32:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #24]
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <2 x i32> %op1, <2 x i32> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x i32> %res
@@ -441,9 +452,12 @@ define <8 x half> @concat_v8f16(<4 x half> %op1, <4 x half> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8f16:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <4 x half> %op1, <4 x half> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x half> %res
@@ -539,9 +553,14 @@ define <4 x float> @concat_v4f32(<2 x float> %op1, <2 x float> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v4f32:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp s0, s1, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp s0, s1, [sp, #24]
+; NONEON-NOSVE-NEXT:    ldp s0, s1, [sp]
+; NONEON-NOSVE-NEXT:    stp s0, s1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <2 x float> %op1, <2 x float> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x float> %res
@@ -754,12 +773,15 @@ define void @concat_v32i8_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v32i8_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT:    str x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <8 x i8>, ptr %a
   %shuffle = shufflevector <8 x i8> %op1, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
@@ -781,12 +803,15 @@ define void @concat_v16i16_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v16i16_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT:    str x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <4 x i16>, ptr %a
   %shuffle = shufflevector <4 x i16> %op1, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -805,12 +830,15 @@ define void @concat_v8i32_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8i32_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <2 x i32>, ptr %a
   %shuffle = shufflevector <2 x i32> %op1, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
index b0fdce9a93bd3b..fbdfc4b7d96cf3 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
@@ -1138,17 +1138,15 @@ define <2 x i64> @smulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
 ;
 ; NONEON-NOSVE-LABEL: smulh_v2i64:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT:    sub sp, sp, #64
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
 ; NONEON-NOSVE-NEXT:    smulh x8, x8, x10
 ; NONEON-NOSVE-NEXT:    smulh x9, x9, x11
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT:    add sp, sp, #64
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #64
 ; NONEON-NOSVE-NEXT:    ret
   %1 = sext <2 x i64> %op1 to <2 x i128>
   %2 = sext <2 x i64> %op2 to <2 x i128>
@@ -1185,23 +1183,19 @@ define void @smulh_v4i64(ptr %a, ptr %b) {
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 128
 ; NONEON-NOSVE-NEXT:    ldp q1, q0, [x0]
 ; NONEON-NOSVE-NEXT:    ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT:    stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #64]
 ; NONEON-NOSVE-NEXT:    smulh x10, x10, x12
-; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #80]
 ; NONEON-NOSVE-NEXT:    smulh x11, x11, x13
 ; NONEON-NOSVE-NEXT:    smulh x8, x8, x12
 ; NONEON-NOSVE-NEXT:    smulh x9, x9, x14
-; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT:    ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldp q1, q0, [sp]
 ; NONEON-NOSVE-NEXT:    stp q0, q1, [x0]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #128
 ; NONEON-NOSVE-NEXT:    ret
@@ -2339,17 +2333,15 @@ define <2 x i64> @umulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
 ;
 ; NONEON-NOSVE-LABEL: umulh_v2i64:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT:    sub sp, sp, #64
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
 ; NONEON-NOSVE-NEXT:    umulh x8, x8, x10
 ; NONEON-NOSVE-NEXT:    umulh x9, x9, x11
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT:    add sp, sp, #64
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #64
 ; NONEON-NOSVE-NEXT:    ret
   %1 = zext <2 x i64> %op1 to <2 x i128>
   %2 = zext <2 x i64> %op2 to <2 x i128>
@@ -2386,23 +2378,19 @@ define void @umulh_v4i64(ptr %a, ptr %b) {
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 128
 ; NONEON-NOSVE-NEXT:    ldp q1, q0, [x0]
 ; NONEON-NOSVE-NEXT:    ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT:    stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #64]
 ; NONEON-NOSVE-NEXT:    umulh x10, x10, x12
-; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #80]
 ; NONEON-NOSVE-NEXT:    umulh x11, x11, x13
 ; NONEON-NOSVE-NEXT:    umulh x8, x8, x12
 ; NONEON-NOSVE-NEXT:    umulh x9, x9, x14
-; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT:    ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldp q1, q0, [sp]
 ; NONEON-NOSVE-NEXT:    stp q0, q1, [x0]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #128
 ; NONEON-NOSVE-NEXT:    ret
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
index 13fcd94ea8a260..ae87128b5c3f9d 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
@@ -142,9 +142,7 @@ define void @store_trunc_v2i256i64(ptr %ap, ptr %dest) {
 ; NONEON-NOSVE-NEXT:    ldr x9, [x0]
 ; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #-32]!
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret

llvmbot · 2024-12-17T01:31:25Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Ethan Kaji (Esan5)

Changes

Ports NVPTXTargetLowering::LowerCONCAT_VECTORS to llvm/lib/CodeGen/SelectionDAG as requested in #116695.

Full diff: https://github.com/llvm/llvm-project/pull/120030.diff

6 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+22-4)
(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (-24)
(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (-1)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll (+55-27)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll (+30-42)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll (+1-3)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index ca87168929f964..09025ae4d71ab9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -191,6 +191,7 @@ class SelectionDAGLegalize {
   SDValue ExpandExtractFromVectorThroughStack(SDValue Op);
   SDValue ExpandInsertToVectorThroughStack(SDValue Op);
   SDValue ExpandVectorBuildThroughStack(SDNode* Node);
+  SDValue ExpandConcatVectors(SDNode* Node);
 
   SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP);
   SDValue ExpandConstant(ConstantSDNode *CP);
@@ -1517,10 +1518,27 @@ SDValue SelectionDAGLegalize::ExpandInsertToVectorThroughStack(SDValue Op) {
                      BaseVecAlignment);
 }
 
+SDValue SelectionDAGLegalize::ExpandConcatVectors(SDNode *Node) {
+  assert(Node->getOpcode() == ISD::CONCAT_VECTORS && "Unexpected opcode!");
+  SDLoc Dl(Node);
+  SmallVector<SDValue, 0> Ops;
+  unsigned NumOperands = Node->getNumOperands();
+  for (unsigned I = 0; I < NumOperands; ++I) {
+    SDValue SubOp = Node->getOperand(I);
+    EVT VectorValueType =
+        SubOp->getValueType(0);
+    EVT ElementValueType = VectorValueType.getVectorElementType();
+    unsigned NumSubElem = VectorValueType.getVectorNumElements();
+    for (unsigned J = 0; J < NumSubElem; ++J) {
+      Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, Dl, ElementValueType,
+                                SubOp, DAG.getIntPtrConstant(J, Dl)));
+    }
+  }
+  return DAG.getBuildVector(Node->getValueType(0), Dl, Ops);
+}
+
 SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
-  assert((Node->getOpcode() == ISD::BUILD_VECTOR ||
-          Node->getOpcode() == ISD::CONCAT_VECTORS) &&
-         "Unexpected opcode!");
+  assert(Node->getOpcode() == ISD::BUILD_VECTOR && "Unexpected opcode!");
 
   // We can't handle this case efficiently.  Allocate a sufficiently
   // aligned object on the stack, store each operand into it, then load
@@ -3371,7 +3389,7 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
     Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
     break;
   case ISD::CONCAT_VECTORS:
-    Results.push_back(ExpandVectorBuildThroughStack(Node));
+    Results.push_back(ExpandConcatVectors(Node));
     break;
   case ISD::SCALAR_TO_VECTOR:
     Results.push_back(ExpandSCALAR_TO_VECTOR(Node));
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index a02607efb7fc28..de06763da4dbbe 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -2001,28 +2001,6 @@ SDValue NVPTXTargetLowering::LowerSTACKSAVE(SDValue Op,
   return DAG.getMergeValues({ASC, SDValue(SS.getNode(), 1)}, DL);
 }
 
-// By default CONCAT_VECTORS is lowered by ExpandVectorBuildThroughStack()
-// (see LegalizeDAG.cpp). This is slow and uses local memory.
-// We use extract/insert/build vector just as what LegalizeOp() does in llvm 2.5
-SDValue
-NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const {
-  SDNode *Node = Op.getNode();
-  SDLoc dl(Node);
-  SmallVector<SDValue, 8> Ops;
-  unsigned NumOperands = Node->getNumOperands();
-  for (unsigned i = 0; i < NumOperands; ++i) {
-    SDValue SubOp = Node->getOperand(i);
-    EVT VVT = SubOp.getNode()->getValueType(0);
-    EVT EltVT = VVT.getVectorElementType();
-    unsigned NumSubElem = VVT.getVectorNumElements();
-    for (unsigned j = 0; j < NumSubElem; ++j) {
-      Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, EltVT, SubOp,
-                                DAG.getIntPtrConstant(j, dl)));
-    }
-  }
-  return DAG.getBuildVector(Node->getValueType(0), dl, Ops);
-}
-
 SDValue NVPTXTargetLowering::LowerBITCAST(SDValue Op, SelectionDAG &DAG) const {
   // Handle bitcasting from v2i8 without hitting the default promotion
   // strategy which goes through stack memory.
@@ -2565,8 +2543,6 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
     return LowerINSERT_VECTOR_ELT(Op, DAG);
   case ISD::VECTOR_SHUFFLE:
     return LowerVECTOR_SHUFFLE(Op, DAG);
-  case ISD::CONCAT_VECTORS:
-    return LowerCONCAT_VECTORS(Op, DAG);
   case ISD::STORE:
     return LowerSTORE(Op, DAG);
   case ISD::LOAD:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 0244a0c5bec9d5..6523f22777693e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -266,7 +266,6 @@ class NVPTXTargetLowering : public TargetLowering {
   SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
 
   SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
-  SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
index 619840fc6afb28..ff6e1dbb3baa02 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
@@ -70,9 +70,12 @@ define <16 x i8> @concat_v16i8(<8 x i8> %op1, <8 x i8> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v16i8:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <8 x i8> %op1, <8 x i8> %op2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
                                                                  i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
@@ -181,9 +184,12 @@ define <8 x i16> @concat_v8i16(<4 x i16> %op1, <4 x i16> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8i16:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <4 x i16> %op1, <4 x i16> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x i16> %res
@@ -279,9 +285,14 @@ define <4 x i32> @concat_v4i32(<2 x i32> %op1, <2 x i32> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v4i32:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #24]
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <2 x i32> %op1, <2 x i32> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x i32> %res
@@ -441,9 +452,12 @@ define <8 x half> @concat_v8f16(<4 x half> %op1, <4 x half> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8f16:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp x8, x9, [sp]
+; NONEON-NOSVE-NEXT:    stp x8, x9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <4 x half> %op1, <4 x half> %op2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
   ret <8 x half> %res
@@ -539,9 +553,14 @@ define <4 x float> @concat_v4f32(<2 x float> %op1, <2 x float> %op2)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v4f32:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
+; NONEON-NOSVE-NEXT:    ldp s0, s1, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp s0, s1, [sp, #24]
+; NONEON-NOSVE-NEXT:    ldp s0, s1, [sp]
+; NONEON-NOSVE-NEXT:    stp s0, s1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %res = shufflevector <2 x float> %op1, <2 x float> %op2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   ret <4 x float> %res
@@ -754,12 +773,15 @@ define void @concat_v32i8_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v32i8_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT:    str x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <8 x i8>, ptr %a
   %shuffle = shufflevector <8 x i8> %op1, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
@@ -781,12 +803,15 @@ define void @concat_v16i16_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v16i16_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldr x8, [sp, #8]
+; NONEON-NOSVE-NEXT:    str x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <4 x i16>, ptr %a
   %shuffle = shufflevector <4 x i16> %op1, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -805,12 +830,15 @@ define void @concat_v8i32_4op(ptr %a, ptr %b)  {
 ;
 ; NONEON-NOSVE-LABEL: concat_v8i32_4op:
 ; NONEON-NOSVE:       // %bb.0:
+; NONEON-NOSVE-NEXT:    sub sp, sp, #32
+; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
 ; NONEON-NOSVE-NEXT:    ldr d0, [x0]
-; NONEON-NOSVE-NEXT:    str d0, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp]
+; NONEON-NOSVE-NEXT:    str d0, [sp, #8]
+; NONEON-NOSVE-NEXT:    ldp w8, w9, [sp, #8]
+; NONEON-NOSVE-NEXT:    stp w8, w9, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
-; NONEON-NOSVE-NEXT:    add sp, sp, #16
+; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret
   %op1 = load <2 x i32>, ptr %a
   %shuffle = shufflevector <2 x i32> %op1, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
index b0fdce9a93bd3b..fbdfc4b7d96cf3 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
@@ -1138,17 +1138,15 @@ define <2 x i64> @smulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
 ;
 ; NONEON-NOSVE-LABEL: smulh_v2i64:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT:    sub sp, sp, #64
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
 ; NONEON-NOSVE-NEXT:    smulh x8, x8, x10
 ; NONEON-NOSVE-NEXT:    smulh x9, x9, x11
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT:    add sp, sp, #64
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #64
 ; NONEON-NOSVE-NEXT:    ret
   %1 = sext <2 x i64> %op1 to <2 x i128>
   %2 = sext <2 x i64> %op2 to <2 x i128>
@@ -1185,23 +1183,19 @@ define void @smulh_v4i64(ptr %a, ptr %b) {
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 128
 ; NONEON-NOSVE-NEXT:    ldp q1, q0, [x0]
 ; NONEON-NOSVE-NEXT:    ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT:    stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #64]
 ; NONEON-NOSVE-NEXT:    smulh x10, x10, x12
-; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #80]
 ; NONEON-NOSVE-NEXT:    smulh x11, x11, x13
 ; NONEON-NOSVE-NEXT:    smulh x8, x8, x12
 ; NONEON-NOSVE-NEXT:    smulh x9, x9, x14
-; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT:    ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldp q1, q0, [sp]
 ; NONEON-NOSVE-NEXT:    stp q0, q1, [x0]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #128
 ; NONEON-NOSVE-NEXT:    ret
@@ -2339,17 +2333,15 @@ define <2 x i64> @umulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) {
 ;
 ; NONEON-NOSVE-LABEL: umulh_v2i64:
 ; NONEON-NOSVE:       // %bb.0:
-; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #-64]!
+; NONEON-NOSVE-NEXT:    sub sp, sp, #64
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 64
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp q0, q1, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
 ; NONEON-NOSVE-NEXT:    umulh x8, x8, x10
 ; NONEON-NOSVE-NEXT:    umulh x9, x9, x11
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #32]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #48]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #48]
-; NONEON-NOSVE-NEXT:    add sp, sp, #64
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp], #64
 ; NONEON-NOSVE-NEXT:    ret
   %1 = zext <2 x i64> %op1 to <2 x i128>
   %2 = zext <2 x i64> %op2 to <2 x i128>
@@ -2386,23 +2378,19 @@ define void @umulh_v4i64(ptr %a, ptr %b) {
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 128
 ; NONEON-NOSVE-NEXT:    ldp q1, q0, [x0]
 ; NONEON-NOSVE-NEXT:    ldp q2, q3, [x1]
-; NONEON-NOSVE-NEXT:    stp q1, q2, [sp]
-; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp]
-; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #32]
-; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q1, q2, [sp, #32]
+; NONEON-NOSVE-NEXT:    ldp x11, x10, [sp, #32]
+; NONEON-NOSVE-NEXT:    stp q0, q3, [sp, #64]
+; NONEON-NOSVE-NEXT:    ldp x13, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x9, x8, [sp, #64]
 ; NONEON-NOSVE-NEXT:    umulh x10, x10, x12
-; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #48]
+; NONEON-NOSVE-NEXT:    ldp x14, x12, [sp, #80]
 ; NONEON-NOSVE-NEXT:    umulh x11, x11, x13
 ; NONEON-NOSVE-NEXT:    umulh x8, x8, x12
 ; NONEON-NOSVE-NEXT:    umulh x9, x9, x14
-; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #80]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #80]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #112]
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp, #64]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #96]
-; NONEON-NOSVE-NEXT:    ldp q0, q1, [sp, #96]
+; NONEON-NOSVE-NEXT:    stp x11, x10, [sp, #16]
+; NONEON-NOSVE-NEXT:    stp x9, x8, [sp]
+; NONEON-NOSVE-NEXT:    ldp q1, q0, [sp]
 ; NONEON-NOSVE-NEXT:    stp q0, q1, [x0]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #128
 ; NONEON-NOSVE-NEXT:    ret
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
index 13fcd94ea8a260..ae87128b5c3f9d 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-trunc-stores.ll
@@ -142,9 +142,7 @@ define void @store_trunc_v2i256i64(ptr %ap, ptr %dest) {
 ; NONEON-NOSVE-NEXT:    ldr x9, [x0]
 ; NONEON-NOSVE-NEXT:    stp x9, x8, [sp, #-32]!
 ; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 32
-; NONEON-NOSVE-NEXT:    ldp d0, d1, [sp]
-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #16]
-; NONEON-NOSVE-NEXT:    ldr q0, [sp, #16]
+; NONEON-NOSVE-NEXT:    ldr q0, [sp]
 ; NONEON-NOSVE-NEXT:    str q0, [x1]
 ; NONEON-NOSVE-NEXT:    add sp, sp, #32
 ; NONEON-NOSVE-NEXT:    ret

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

arsenm · 2024-12-17T04:30:01Z

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll

-; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-16]!
-; NONEON-NOSVE-NEXT:    .cfi_def_cfa_offset 16
-; NONEON-NOSVE-NEXT:    ldr q0, [sp], #16
+; NONEON-NOSVE-NEXT:    stp d0, d1, [sp, #-32]!


Not sure if aarch64 would prefer to keep the old stack path.

This case does seem worse. It's traded 2 stack operations for 5

justinfargnoli

Besides a few formatting changes, I think this is almost ready to go!

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

github-actions · 2024-12-18T17:37:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm

Actual change LGTM, but we probably need some kind of legality heuristic for which legalization path to prefer. If we know the extract_vector_elt will expand to the stack, we should probably let this directly lower

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

arsenm

LGTM with nit. Also I'm surprised you're getting away without updating the target legalize rules

arsenm · 2025-01-22T00:52:30Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

@@ -2001,28 +2001,6 @@ SDValue NVPTXTargetLowering::LowerSTACKSAVE(SDValue Op,
  return DAG.getMergeValues({ASC, SDValue(SS.getNode(), 1)}, DL);
 }

-// By default CONCAT_VECTORS is lowered by ExpandVectorBuildThroughStack()


Presumably you also need to adjust the setOperationAction for cases that were setting Custom?

This might be completely wrong, but I think they were removed in 2013 with be8dc64

So this was previously dead code?

I think so.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

arsenm · 2025-01-22T00:53:47Z

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

@@ -3371,7 +3393,11 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
    Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
    break;
  case ISD::CONCAT_VECTORS:
-    Results.push_back(ExpandVectorBuildThroughStack(Node));
+    if (!TLI.isOperationExpand(ISD::EXTRACT_VECTOR_ELT, Node->getOperand(0).getValueType())) {


Might need a guard against scalable vectors

RKSimon · 2025-03-04T16:40:10Z

@Esan5 please can you rebase/merge to trunk to check this patch hasn't rotted?

Esan5 · 2025-03-04T18:28:34Z

Given that the original code was dead code, are we still interested in merging this?

be8dc64 mentions this code was for a time before good target-independent support for scalarization was available.

justinfargnoli · 2025-03-04T19:19:02Z

Given that the original code was dead code, are we still interested in merging this?

We'd still be interested in deleting the dead NVPTX code.

However, since it doesn't look like there would be any users of SelectionDAGLegalize::ExpandConcatVectors(), I doubt we'd want to merge since it would only amount to moving the dead code around within the codebase.

Apologies for including concat_vector in #116695. I wasn't aware that it was dead code. I should've verified that it was used beforehand.

arsenm · 2025-03-05T03:53:30Z

Given that the original code was dead code, are we still interested in merging this?

We'd still be interested in deleting the dead NVPTX code.

However, since it doesn't look like there would be any users of SelectionDAGLegalize::ExpandConcatVectors(), I doubt we'd want to merge since it would only amount to moving the dead code around within the codebase.

It should still merge to the generic code. AMDGPU has essentially the same code in custom lowering (and I'm sure many out of tree targets do the same)

justinfargnoli · 2025-03-05T22:12:42Z

It should still merge to the generic code. AMDGPU has essentially the same code in custom lowering (and I'm sure many out of tree targets do the same)

Happy to hear that @Esan5's hard work won't go to waste!

@Esan5, if you'd prefer, feel free to split off the NVPTX aspect of this PR into a separate PR. I'll promptly approve and merge it.

Co-authored-by: Matt Arsenault <[email protected]>

… legal or custom.

This reverts commit d1bab4d.

github-actions · 2025-03-27T00:40:52Z

@Esan5 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

llvm-ci · 2025-03-27T01:30:18Z

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-hwasan running on sanitizer-buildbot11 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/9046

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 84276 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.
FAIL: lit :: discovery.py (81190 of 84276)
******************** TEST 'lit :: discovery.py' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 3
env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery    --debug --show-tests --show-suites    -v > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery --debug --show-tests --show-suites -v
# note: command had no output on stdout or stderr
# RUN: at line 6
FileCheck --check-prefix=CHECK-BASIC-OUT < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-BASIC-OUT /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 7
FileCheck --check-prefix=CHECK-BASIC-ERR < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-BASIC-ERR /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 31
env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery    -v > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery -v
# note: command had no output on stdout or stderr
# RUN: at line 33
FileCheck --check-prefix=CHECK-PERCENTAGES-OUT < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-PERCENTAGES-OUT /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 39
"/usr/bin/python3" Inputs/config-map-discovery/driver.py            Inputs/config-map-discovery/main-config/lit.cfg            Inputs/config-map-discovery/lit.alt.cfg            --workers=1 --debug --show-tests --show-suites > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: /usr/bin/python3 Inputs/config-map-discovery/driver.py Inputs/config-map-discovery/main-config/lit.cfg Inputs/config-map-discovery/lit.alt.cfg --workers=1 --debug --show-tests --show-suites
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
23.51s: LLVM :: tools/llvm-reduce/parallel-workitem-kill.ll
Step 14 (stage3/hwasan check) failure: stage3/hwasan check (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 84276 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.
FAIL: lit :: discovery.py (81190 of 84276)
******************** TEST 'lit :: discovery.py' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 3
env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery    --debug --show-tests --show-suites    -v > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery --debug --show-tests --show-suites -v
# note: command had no output on stdout or stderr
# RUN: at line 6
FileCheck --check-prefix=CHECK-BASIC-OUT < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-BASIC-OUT /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 7
FileCheck --check-prefix=CHECK-BASIC-ERR < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-BASIC-ERR /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 31
env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery    -v > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/discovery -v
# note: command had no output on stdout or stderr
# RUN: at line 33
FileCheck --check-prefix=CHECK-PERCENTAGES-OUT < /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# executed command: FileCheck --check-prefix=CHECK-PERCENTAGES-OUT /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/discovery.py
# note: command had no output on stdout or stderr
# RUN: at line 39
"/usr/bin/python3" Inputs/config-map-discovery/driver.py            Inputs/config-map-discovery/main-config/lit.cfg            Inputs/config-map-discovery/lit.alt.cfg            --workers=1 --debug --show-tests --show-suites > /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.out 2> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build2_hwasan/utils/lit/tests/Output/discovery.py.tmp.err
# executed command: /usr/bin/python3 Inputs/config-map-discovery/driver.py Inputs/config-map-discovery/main-config/lit.cfg Inputs/config-map-discovery/lit.alt.cfg --workers=1 --debug --show-tests --show-suites
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
23.51s: LLVM :: tools/llvm-reduce/parallel-workitem-kill.ll

Esan5 force-pushed the dag/port-concat-vectors branch from f4ccd5d to 621fa58 Compare December 16, 2024 02:02

Esan5 force-pushed the dag/port-concat-vectors branch from 621fa58 to 67ecdf0 Compare December 17, 2024 01:27

Esan5 marked this pull request as ready for review December 17, 2024 01:30

llvmbot added backend:AArch64 backend:NVPTX llvm:SelectionDAG SelectionDAGISel as well labels Dec 17, 2024

arsenm reviewed Dec 17, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

justinfargnoli changed the title ~~Port NVPTXTargetLowering::LowerCONCAT_VECTORS to llvm/lib/CodeGen/SelectionDAG~~ Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG Dec 17, 2024

arsenm reviewed Dec 17, 2024

View reviewed changes

justinfargnoli self-requested a review December 17, 2024 22:28

justinfargnoli reviewed Dec 17, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

justinfargnoli requested a review from TNorthover December 17, 2024 22:30

Esan5 commented Dec 18, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

arsenm reviewed Dec 20, 2024

View reviewed changes

Esan5 commented Dec 20, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

justinfargnoli reviewed Dec 21, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

arsenm reviewed Dec 23, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved

arsenm approved these changes Jan 22, 2025

View reviewed changes

Esan5 force-pushed the dag/port-concat-vectors branch from 4243d5f to 1c9fa8e Compare March 4, 2025 18:00

Esan5 mentioned this pull request Mar 26, 2025

Remove NVPTXTargetLowering::LowerCONCAT_VECTORS #133145

Open

Esan5 and others added 16 commits March 26, 2025 15:41

first try

fbba746

fix tests

34be657

comments

22ecb84

prefer size 16 in small vector

2d1bfd7

Co-authored-by: Matt Arsenault <[email protected]>

trigger builds

4e72483

comments

ade2707

extract to legal type

5682483

use build through stack if type changes

1ac6bd5

just checking if the type changed is not sufficient

d184705

use ExpandVectorBuildThroughStack if extract vector elt is not is not…

cef586a

… legal or custom.

try ExpandConcatVectors if extract_vector_elt is not an Expand operation

cb1bfea

remove braces negate and swap order

3c37765

scalable vector guard with vector value type

f7c9f31

remove dead code

cb51c4e

Revert "remove dead code"

1effb91

This reverts commit d1bab4d.

handled in a seperate PR

c318b91

Esan5 force-pushed the dag/port-concat-vectors branch from 4c4083f to c318b91 Compare March 26, 2025 19:42

arsenm approved these changes Mar 27, 2025

View reviewed changes

arsenm merged commit a629b50 into llvm:main Mar 27, 2025
11 checks passed

Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG #120030

Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG #120030

Uh oh!

Conversation

Esan5 commented Dec 16, 2024

Uh oh!

github-actions bot commented Dec 16, 2024

Uh oh!

Esan5 commented Dec 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinfargnoli commented Dec 16, 2024

Uh oh!

Esan5 commented Dec 16, 2024

Uh oh!

justinfargnoli commented Dec 17, 2024

Uh oh!

llvmbot commented Dec 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 17, 2024

Uh oh!

Uh oh!

Uh oh!

arsenm Dec 17, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Dec 20, 2024

Choose a reason for hiding this comment

Uh oh!

justinfargnoli left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm Jan 22, 2025

Choose a reason for hiding this comment

Uh oh!

Esan5 Jan 27, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Jan 28, 2025

Choose a reason for hiding this comment

Uh oh!

Esan5 Jan 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arsenm Jan 22, 2025

Choose a reason for hiding this comment

Uh oh!

RKSimon commented Mar 4, 2025

Uh oh!

Esan5 commented Mar 4, 2025

Uh oh!

justinfargnoli commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Mar 5, 2025

Uh oh!

justinfargnoli commented Mar 5, 2025

Uh oh!

Uh oh!

github-actions bot commented Mar 27, 2025

Uh oh!

llvm-ci commented Mar 27, 2025

Port `NVPTXTargetLowering::LowerCONCAT_VECTORS` to SelectionDAG #120030

Port `NVPTXTargetLowering::LowerCONCAT_VECTORS` to SelectionDAG #120030

Esan5 commented Dec 16, 2024 •

edited

Loading

llvmbot commented Dec 17, 2024 •

edited

Loading

github-actions bot commented Dec 18, 2024 •

edited

Loading

arsenm left a comment •

edited

Loading

justinfargnoli commented Mar 4, 2025 •

edited

Loading