
[SelectionDAG][RISCV] Preserve nneg flag when folding (trunc (zext X))->(zext X). #144807


Merged
merged 2 commits into llvm:main from pr/zext-nneg
Jun 19, 2025

Conversation

topperc
Collaborator

@topperc topperc commented Jun 18, 2025

If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.

In the i128 tests, SelectionDAGBuilder aggressively truncates the
zext nneg to i64 to match getShiftAmountTy. If we don't preserve
the nneg, we can't see that the shift amount argument being signext
means we don't need to do any extension.

topperc added 2 commits June 18, 2025 15:06
…)->(zext X).

If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.

In the i128 tests, SelectionDAGBuilder aggressively truncates the
zext nneg to i64 to match getShiftAmountTy. If we don't preserve
the nneg, we can't see that the shift amount argument being signext
means we don't need to do any extension.
@topperc topperc requested review from nikic, preames and RKSimon June 18, 2025 22:18
@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Jun 18, 2025
@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Craig Topper (topperc)

Changes

If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.

In the i128 tests, SelectionDAGBuilder aggressively truncates the
zext nneg to i64 to match getShiftAmountTy. If we don't preserve
the nneg, we can't see that the shift amount argument being signext
means we don't need to do any extension.


Full diff: https://github.com/llvm/llvm-project/pull/144807.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+6-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+6-2)
  • (modified) llvm/test/CodeGen/RISCV/shifts.ll (+295)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0e078f9dd88b4..a6b9cc81edde6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15740,8 +15740,12 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
       N0.getOpcode() == ISD::SIGN_EXTEND ||
       N0.getOpcode() == ISD::ANY_EXTEND) {
     // if the source is smaller than the dest, we still need an extend.
-    if (N0.getOperand(0).getValueType().bitsLT(VT))
-      return DAG.getNode(N0.getOpcode(), DL, VT, N0.getOperand(0));
+    if (N0.getOperand(0).getValueType().bitsLT(VT)) {
+      SDNodeFlags Flags;
+      if (N0.getOpcode() == ISD::ZERO_EXTEND)
+        Flags.setNonNeg(N0->getFlags().hasNonNeg());
+      return DAG.getNode(N0.getOpcode(), DL, VT, N0.getOperand(0), Flags);
+    }
     // if the source is larger than the dest, than we just need the truncate.
     if (N0.getOperand(0).getValueType().bitsGT(VT))
       return DAG.getNode(ISD::TRUNCATE, DL, VT, N0.getOperand(0));
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index b0e3f534e2aaa..5d8db8be9731f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -6474,8 +6474,12 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
         OpOpcode == ISD::ANY_EXTEND) {
       // If the source is smaller than the dest, we still need an extend.
       if (N1.getOperand(0).getValueType().getScalarType().bitsLT(
-              VT.getScalarType()))
-        return getNode(OpOpcode, DL, VT, N1.getOperand(0));
+              VT.getScalarType())) {
+        SDNodeFlags Flags;
+        if (OpOpcode == ISD::ZERO_EXTEND)
+          Flags.setNonNeg(N1->getFlags().hasNonNeg());
+        return getNode(OpOpcode, DL, VT, N1.getOperand(0), Flags);
+      }
       if (N1.getOperand(0).getValueType().bitsGT(VT))
         return getNode(ISD::TRUNCATE, DL, VT, N1.getOperand(0));
       return N1.getOperand(0);
diff --git a/llvm/test/CodeGen/RISCV/shifts.ll b/llvm/test/CodeGen/RISCV/shifts.ll
index 249dabba0cc28..32a037918a5a7 100644
--- a/llvm/test/CodeGen/RISCV/shifts.ll
+++ b/llvm/test/CodeGen/RISCV/shifts.ll
@@ -484,3 +484,298 @@ define i128 @fshr128_minsize(i128 %a, i128 %b) minsize nounwind {
   %res = tail call i128 @llvm.fshr.i128(i128 %a, i128 %a, i128 %b)
   ret i128 %res
 }
+
+define i64 @lshr64_shamt32(i64 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: lshr64_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a4, a2, -32
+; RV32I-NEXT:    srl a3, a1, a2
+; RV32I-NEXT:    bltz a4, .LBB11_2
+; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    mv a0, a3
+; RV32I-NEXT:    j .LBB11_3
+; RV32I-NEXT:  .LBB11_2:
+; RV32I-NEXT:    srl a0, a0, a2
+; RV32I-NEXT:    not a2, a2
+; RV32I-NEXT:    slli a1, a1, 1
+; RV32I-NEXT:    sll a1, a1, a2
+; RV32I-NEXT:    or a0, a0, a1
+; RV32I-NEXT:  .LBB11_3:
+; RV32I-NEXT:    srai a1, a4, 31
+; RV32I-NEXT:    and a1, a1, a3
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: lshr64_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    srl a0, a0, a1
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i64
+  %1 = lshr i64 %a, %zext
+  ret i64 %1
+}
+
+define i64 @ashr64_shamt32(i64 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: ashr64_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    mv a3, a1
+; RV32I-NEXT:    addi a4, a2, -32
+; RV32I-NEXT:    sra a1, a1, a2
+; RV32I-NEXT:    bltz a4, .LBB12_2
+; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    srai a3, a3, 31
+; RV32I-NEXT:    mv a0, a1
+; RV32I-NEXT:    mv a1, a3
+; RV32I-NEXT:    ret
+; RV32I-NEXT:  .LBB12_2:
+; RV32I-NEXT:    srl a0, a0, a2
+; RV32I-NEXT:    not a2, a2
+; RV32I-NEXT:    slli a3, a3, 1
+; RV32I-NEXT:    sll a2, a3, a2
+; RV32I-NEXT:    or a0, a0, a2
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: ashr64_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    sra a0, a0, a1
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i64
+  %1 = ashr i64 %a, %zext
+  ret i64 %1
+}
+
+define i64 @shl64_shamt32(i64 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: shl64_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi a4, a2, -32
+; RV32I-NEXT:    sll a3, a0, a2
+; RV32I-NEXT:    bltz a4, .LBB13_2
+; RV32I-NEXT:  # %bb.1:
+; RV32I-NEXT:    mv a1, a3
+; RV32I-NEXT:    j .LBB13_3
+; RV32I-NEXT:  .LBB13_2:
+; RV32I-NEXT:    sll a1, a1, a2
+; RV32I-NEXT:    not a2, a2
+; RV32I-NEXT:    srli a0, a0, 1
+; RV32I-NEXT:    srl a0, a0, a2
+; RV32I-NEXT:    or a1, a1, a0
+; RV32I-NEXT:  .LBB13_3:
+; RV32I-NEXT:    srai a0, a4, 31
+; RV32I-NEXT:    and a0, a0, a3
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: shl64_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    sll a0, a0, a1
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i64
+  %1 = shl i64 %a, %zext
+  ret i64 %1
+}
+
+define i128 @lshr128_shamt32(i128 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: lshr128_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi sp, sp, -32
+; RV32I-NEXT:    lw a3, 0(a1)
+; RV32I-NEXT:    lw a4, 4(a1)
+; RV32I-NEXT:    lw a5, 8(a1)
+; RV32I-NEXT:    lw a1, 12(a1)
+; RV32I-NEXT:    sw zero, 16(sp)
+; RV32I-NEXT:    sw zero, 20(sp)
+; RV32I-NEXT:    sw zero, 24(sp)
+; RV32I-NEXT:    sw zero, 28(sp)
+; RV32I-NEXT:    srli a6, a2, 3
+; RV32I-NEXT:    mv a7, sp
+; RV32I-NEXT:    andi t0, a2, 31
+; RV32I-NEXT:    andi a6, a6, 12
+; RV32I-NEXT:    xori t0, t0, 31
+; RV32I-NEXT:    add a6, a7, a6
+; RV32I-NEXT:    sw a3, 0(sp)
+; RV32I-NEXT:    sw a4, 4(sp)
+; RV32I-NEXT:    sw a5, 8(sp)
+; RV32I-NEXT:    sw a1, 12(sp)
+; RV32I-NEXT:    lw a1, 0(a6)
+; RV32I-NEXT:    lw a3, 4(a6)
+; RV32I-NEXT:    lw a4, 8(a6)
+; RV32I-NEXT:    lw a5, 12(a6)
+; RV32I-NEXT:    srl a1, a1, a2
+; RV32I-NEXT:    slli a6, a3, 1
+; RV32I-NEXT:    srl a3, a3, a2
+; RV32I-NEXT:    slli a7, a4, 1
+; RV32I-NEXT:    srl a4, a4, a2
+; RV32I-NEXT:    srl a2, a5, a2
+; RV32I-NEXT:    slli a5, a5, 1
+; RV32I-NEXT:    sll a6, a6, t0
+; RV32I-NEXT:    sll a7, a7, t0
+; RV32I-NEXT:    sll a5, a5, t0
+; RV32I-NEXT:    or a1, a1, a6
+; RV32I-NEXT:    or a3, a3, a7
+; RV32I-NEXT:    or a4, a4, a5
+; RV32I-NEXT:    sw a1, 0(a0)
+; RV32I-NEXT:    sw a3, 4(a0)
+; RV32I-NEXT:    sw a4, 8(a0)
+; RV32I-NEXT:    sw a2, 12(a0)
+; RV32I-NEXT:    addi sp, sp, 32
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: lshr128_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a4, a2, -64
+; RV64I-NEXT:    srl a3, a1, a2
+; RV64I-NEXT:    bltz a4, .LBB14_2
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    mv a0, a3
+; RV64I-NEXT:    j .LBB14_3
+; RV64I-NEXT:  .LBB14_2:
+; RV64I-NEXT:    srl a0, a0, a2
+; RV64I-NEXT:    not a2, a2
+; RV64I-NEXT:    slli a1, a1, 1
+; RV64I-NEXT:    sll a1, a1, a2
+; RV64I-NEXT:    or a0, a0, a1
+; RV64I-NEXT:  .LBB14_3:
+; RV64I-NEXT:    srai a1, a4, 63
+; RV64I-NEXT:    and a1, a1, a3
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i128
+  %1 = lshr i128 %a, %zext
+  ret i128 %1
+}
+
+define i128 @ashr128_shamt32(i128 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: ashr128_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi sp, sp, -32
+; RV32I-NEXT:    lw a3, 0(a1)
+; RV32I-NEXT:    lw a4, 4(a1)
+; RV32I-NEXT:    lw a5, 8(a1)
+; RV32I-NEXT:    lw a1, 12(a1)
+; RV32I-NEXT:    srli a6, a2, 3
+; RV32I-NEXT:    mv a7, sp
+; RV32I-NEXT:    andi t0, a2, 31
+; RV32I-NEXT:    andi a6, a6, 12
+; RV32I-NEXT:    xori t0, t0, 31
+; RV32I-NEXT:    add a6, a7, a6
+; RV32I-NEXT:    sw a3, 0(sp)
+; RV32I-NEXT:    sw a4, 4(sp)
+; RV32I-NEXT:    sw a5, 8(sp)
+; RV32I-NEXT:    sw a1, 12(sp)
+; RV32I-NEXT:    srai a1, a1, 31
+; RV32I-NEXT:    sw a1, 16(sp)
+; RV32I-NEXT:    sw a1, 20(sp)
+; RV32I-NEXT:    sw a1, 24(sp)
+; RV32I-NEXT:    sw a1, 28(sp)
+; RV32I-NEXT:    lw a1, 0(a6)
+; RV32I-NEXT:    lw a3, 4(a6)
+; RV32I-NEXT:    lw a4, 8(a6)
+; RV32I-NEXT:    lw a5, 12(a6)
+; RV32I-NEXT:    srl a1, a1, a2
+; RV32I-NEXT:    slli a6, a3, 1
+; RV32I-NEXT:    srl a3, a3, a2
+; RV32I-NEXT:    slli a7, a4, 1
+; RV32I-NEXT:    srl a4, a4, a2
+; RV32I-NEXT:    sra a2, a5, a2
+; RV32I-NEXT:    slli a5, a5, 1
+; RV32I-NEXT:    sll a6, a6, t0
+; RV32I-NEXT:    sll a7, a7, t0
+; RV32I-NEXT:    sll a5, a5, t0
+; RV32I-NEXT:    or a1, a1, a6
+; RV32I-NEXT:    or a3, a3, a7
+; RV32I-NEXT:    or a4, a4, a5
+; RV32I-NEXT:    sw a1, 0(a0)
+; RV32I-NEXT:    sw a3, 4(a0)
+; RV32I-NEXT:    sw a4, 8(a0)
+; RV32I-NEXT:    sw a2, 12(a0)
+; RV32I-NEXT:    addi sp, sp, 32
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: ashr128_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    mv a3, a1
+; RV64I-NEXT:    addi a4, a2, -64
+; RV64I-NEXT:    sra a1, a1, a2
+; RV64I-NEXT:    bltz a4, .LBB15_2
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    srai a3, a3, 63
+; RV64I-NEXT:    mv a0, a1
+; RV64I-NEXT:    mv a1, a3
+; RV64I-NEXT:    ret
+; RV64I-NEXT:  .LBB15_2:
+; RV64I-NEXT:    srl a0, a0, a2
+; RV64I-NEXT:    not a2, a2
+; RV64I-NEXT:    slli a3, a3, 1
+; RV64I-NEXT:    sll a2, a3, a2
+; RV64I-NEXT:    or a0, a0, a2
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i128
+  %1 = ashr i128 %a, %zext
+  ret i128 %1
+}
+
+define i128 @shl128_shamt32(i128 %a, i32 signext %b) nounwind {
+; RV32I-LABEL: shl128_shamt32:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi sp, sp, -32
+; RV32I-NEXT:    lw a3, 0(a1)
+; RV32I-NEXT:    lw a4, 4(a1)
+; RV32I-NEXT:    lw a5, 8(a1)
+; RV32I-NEXT:    lw a1, 12(a1)
+; RV32I-NEXT:    sw zero, 0(sp)
+; RV32I-NEXT:    sw zero, 4(sp)
+; RV32I-NEXT:    sw zero, 8(sp)
+; RV32I-NEXT:    sw zero, 12(sp)
+; RV32I-NEXT:    srli a6, a2, 3
+; RV32I-NEXT:    addi a7, sp, 16
+; RV32I-NEXT:    andi t0, a2, 31
+; RV32I-NEXT:    andi a6, a6, 12
+; RV32I-NEXT:    sub a6, a7, a6
+; RV32I-NEXT:    sw a3, 16(sp)
+; RV32I-NEXT:    sw a4, 20(sp)
+; RV32I-NEXT:    sw a5, 24(sp)
+; RV32I-NEXT:    sw a1, 28(sp)
+; RV32I-NEXT:    lw a1, 0(a6)
+; RV32I-NEXT:    lw a3, 4(a6)
+; RV32I-NEXT:    lw a4, 8(a6)
+; RV32I-NEXT:    lw a5, 12(a6)
+; RV32I-NEXT:    xori a6, t0, 31
+; RV32I-NEXT:    sll a7, a3, a2
+; RV32I-NEXT:    srli t0, a1, 1
+; RV32I-NEXT:    sll a5, a5, a2
+; RV32I-NEXT:    sll a1, a1, a2
+; RV32I-NEXT:    sll a2, a4, a2
+; RV32I-NEXT:    srli a3, a3, 1
+; RV32I-NEXT:    srli a4, a4, 1
+; RV32I-NEXT:    srl t0, t0, a6
+; RV32I-NEXT:    srl a3, a3, a6
+; RV32I-NEXT:    srl a4, a4, a6
+; RV32I-NEXT:    or a6, a7, t0
+; RV32I-NEXT:    or a2, a2, a3
+; RV32I-NEXT:    or a4, a5, a4
+; RV32I-NEXT:    sw a1, 0(a0)
+; RV32I-NEXT:    sw a6, 4(a0)
+; RV32I-NEXT:    sw a2, 8(a0)
+; RV32I-NEXT:    sw a4, 12(a0)
+; RV32I-NEXT:    addi sp, sp, 32
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: shl128_shamt32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi a4, a2, -64
+; RV64I-NEXT:    sll a3, a0, a2
+; RV64I-NEXT:    bltz a4, .LBB16_2
+; RV64I-NEXT:  # %bb.1:
+; RV64I-NEXT:    mv a1, a3
+; RV64I-NEXT:    j .LBB16_3
+; RV64I-NEXT:  .LBB16_2:
+; RV64I-NEXT:    sll a1, a1, a2
+; RV64I-NEXT:    not a2, a2
+; RV64I-NEXT:    srli a0, a0, 1
+; RV64I-NEXT:    srl a0, a0, a2
+; RV64I-NEXT:    or a1, a1, a0
+; RV64I-NEXT:  .LBB16_3:
+; RV64I-NEXT:    srai a0, a4, 63
+; RV64I-NEXT:    and a0, a0, a3
+; RV64I-NEXT:    ret
+  %zext = zext nneg i32 %b to i128
+  %1 = shl i128 %a, %zext
+  ret i128 %1
+}

Collaborator

@RKSimon RKSimon left a comment


LGTM

@topperc topperc merged commit 5eb24fd into llvm:main Jun 19, 2025
9 checks passed
@topperc topperc deleted the pr/zext-nneg branch June 19, 2025 15:06
@llvm-ci
Collaborator

llvm-ci commented Jun 19, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/14825

Here is the relevant piece of the build log for reference:
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x59153dd66fa0 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x59153dd66fa0 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

Labels
llvm:SelectionDAG SelectionDAGISel as well
5 participants