[LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromoteHalfRes_FP_ROUND if the input type is softened. #119481

topperc · 2024-12-11T01:33:06Z

Previously we created an FP_TO_FP16 node and legalized it in SoftenFloatOp_FP_ROUND. As a result, call lowering saw i16 instead of f16 for the libcall result, so the ABI was not followed on targets where f16 is passed in a different register than i16.

Looking at the libgcc code for the library function, it appears the value is returned in xmm0, so the X86 test was being miscompiled before.

Fixes #107607.
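
To illustrate why the type seen by call lowering matters, here is a toy model. It is not LLVM code: the names `ToyABI`, `ValueType`, `Reg`, and `returnRegister` are hypothetical, and the register classes are a simplified assumption loosely based on the RISC-V checks in this patch (`fsh fa0` after the call with Zfh, `sh a0` with Zhinx). It sketches how a lowering step that is only handed i16 would pick an integer return register even when the ABI returns f16 in an FP register.

```cpp
// Toy model only; none of these names exist in LLVM.
#include <cassert>

enum class ValueType { i16, f16 };
enum class Reg { GPR, FPR };

// Hypothetical ABI description: does this ABI return f16 in an FP register?
struct ToyABI {
  bool HalfInFPReg;
};

// Pick the return register class from the type that call lowering sees.
inline Reg returnRegister(ToyABI ABI, ValueType VT) {
  // In this model, integers always come back in a general-purpose register.
  if (VT == ValueType::i16)
    return Reg::GPR;
  // f16 follows the ABI's choice.
  return ABI.HalfInFPReg ? Reg::FPR : Reg::GPR;
}

// Small self-check of the model.
inline void checkToyModel() {
  ToyABI HardHalf{true};   // f16 returned in an FP register
  ToyABI SoftHalf{false};  // f16 returned in a general-purpose register
  // Seeing f16 vs. i16 changes the answer only when the ABI uses FP registers.
  assert(returnRegister(HardHalf, ValueType::f16) == Reg::FPR);
  assert(returnRegister(HardHalf, ValueType::i16) == Reg::GPR);
  assert(returnRegister(SoftHalf, ValueType::f16) ==
         returnRegister(SoftHalf, ValueType::i16));
}
```

In terms of this model, the pre-patch flow effectively handed `ValueType::i16` to call lowering, so on an ABI like `HardHalf` the result would be read from the wrong register class.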

llvmbot · 2024-12-11T01:33:42Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Craig Topper (topperc)

Patch is 34.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119481.diff

3 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp (+17)
(modified) llvm/test/CodeGen/RISCV/half-convert.ll (+808)
(modified) llvm/test/CodeGen/X86/bfloat.ll (-2)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index b52c2c07a7fba0..71f100bfa03434 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -3415,6 +3415,23 @@ SDValue DAGTypeLegalizer::SoftPromoteHalfRes_FP_ROUND(SDNode *N) {
   SDValue Op = N->getOperand(IsStrict ? 1 : 0);
   EVT SVT = Op.getValueType();
 
+  // If the input type needs to be softened, do that now so that call lowering
+  // will see the f16 type.
+  if (getTypeAction(SVT) == TargetLowering::TypeSoftenFloat) {
+    RTLIB::Libcall LC = RTLIB::getFPROUND(SVT, RVT);
+    assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported FP_ROUND libcall");
+
+    SDValue Chain = IsStrict ? N->getOperand(0) : SDValue();
+    Op = GetSoftenedFloat(Op);
+    TargetLowering::MakeLibCallOptions CallOptions;
+    CallOptions.setTypeListBeforeSoften(SVT, RVT, true);
+    std::pair<SDValue, SDValue> Tmp =
+        TLI.makeLibCall(DAG, LC, RVT, Op, CallOptions, SDLoc(N), Chain);
+    if (IsStrict)
+      ReplaceValueWith(SDValue(N, 1), Tmp.second);
+    return DAG.getNode(ISD::BITCAST, SDLoc(N), MVT::i16, Tmp.first);
+  }
+
   if (IsStrict) {
     SDValue Res = DAG.getNode(GetPromotionOpcodeStrict(SVT, RVT), SDLoc(N),
                               {MVT::i16, MVT::Other}, {N->getOperand(0), Op});
diff --git a/llvm/test/CodeGen/RISCV/half-convert.ll b/llvm/test/CodeGen/RISCV/half-convert.ll
index 01ffcab1a6556f..cf57ecd6cd1e43 100644
--- a/llvm/test/CodeGen/RISCV/half-convert.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert.ll
@@ -8587,3 +8587,811 @@ start:
   %0 = tail call i32 @llvm.fptosi.sat.i32.f16(half %a)
   ret i32 %0
 }
+
+; Test lib call lowering for fp128->half to make sure it matches the ABI.
+define void @fcvt_h_q(fp128 %x, ptr %y) nounwind {
+; RV32IZFH-LABEL: fcvt_h_q:
+; RV32IZFH:       # %bb.0:
+; RV32IZFH-NEXT:    addi sp, sp, -32
+; RV32IZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    lw a2, 0(a0)
+; RV32IZFH-NEXT:    lw a3, 4(a0)
+; RV32IZFH-NEXT:    lw a4, 8(a0)
+; RV32IZFH-NEXT:    lw a5, 12(a0)
+; RV32IZFH-NEXT:    mv s0, a1
+; RV32IZFH-NEXT:    addi a0, sp, 8
+; RV32IZFH-NEXT:    sw a2, 8(sp)
+; RV32IZFH-NEXT:    sw a3, 12(sp)
+; RV32IZFH-NEXT:    sw a4, 16(sp)
+; RV32IZFH-NEXT:    sw a5, 20(sp)
+; RV32IZFH-NEXT:    call __trunctfhf2
+; RV32IZFH-NEXT:    fsh fa0, 0(s0)
+; RV32IZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    addi sp, sp, 32
+; RV32IZFH-NEXT:    ret
+;
+; RV64IZFH-LABEL: fcvt_h_q:
+; RV64IZFH:       # %bb.0:
+; RV64IZFH-NEXT:    addi sp, sp, -16
+; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    mv s0, a2
+; RV64IZFH-NEXT:    call __trunctfhf2
+; RV64IZFH-NEXT:    fsh fa0, 0(s0)
+; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    ret
+;
+; RV32IDZFH-LABEL: fcvt_h_q:
+; RV32IDZFH:       # %bb.0:
+; RV32IDZFH-NEXT:    addi sp, sp, -32
+; RV32IDZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    lw a2, 0(a0)
+; RV32IDZFH-NEXT:    lw a3, 4(a0)
+; RV32IDZFH-NEXT:    lw a4, 8(a0)
+; RV32IDZFH-NEXT:    lw a5, 12(a0)
+; RV32IDZFH-NEXT:    mv s0, a1
+; RV32IDZFH-NEXT:    addi a0, sp, 8
+; RV32IDZFH-NEXT:    sw a2, 8(sp)
+; RV32IDZFH-NEXT:    sw a3, 12(sp)
+; RV32IDZFH-NEXT:    sw a4, 16(sp)
+; RV32IDZFH-NEXT:    sw a5, 20(sp)
+; RV32IDZFH-NEXT:    call __trunctfhf2
+; RV32IDZFH-NEXT:    fsh fa0, 0(s0)
+; RV32IDZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    addi sp, sp, 32
+; RV32IDZFH-NEXT:    ret
+;
+; RV64IDZFH-LABEL: fcvt_h_q:
+; RV64IDZFH:       # %bb.0:
+; RV64IDZFH-NEXT:    addi sp, sp, -16
+; RV64IDZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    mv s0, a2
+; RV64IDZFH-NEXT:    call __trunctfhf2
+; RV64IDZFH-NEXT:    fsh fa0, 0(s0)
+; RV64IDZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    addi sp, sp, 16
+; RV64IDZFH-NEXT:    ret
+;
+; RV32IZHINX-LABEL: fcvt_h_q:
+; RV32IZHINX:       # %bb.0:
+; RV32IZHINX-NEXT:    addi sp, sp, -32
+; RV32IZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    lw a2, 0(a0)
+; RV32IZHINX-NEXT:    lw a3, 4(a0)
+; RV32IZHINX-NEXT:    lw a4, 8(a0)
+; RV32IZHINX-NEXT:    lw a5, 12(a0)
+; RV32IZHINX-NEXT:    mv s0, a1
+; RV32IZHINX-NEXT:    addi a0, sp, 8
+; RV32IZHINX-NEXT:    sw a2, 8(sp)
+; RV32IZHINX-NEXT:    sw a3, 12(sp)
+; RV32IZHINX-NEXT:    sw a4, 16(sp)
+; RV32IZHINX-NEXT:    sw a5, 20(sp)
+; RV32IZHINX-NEXT:    call __trunctfhf2
+; RV32IZHINX-NEXT:    sh a0, 0(s0)
+; RV32IZHINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZHINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZHINX-NEXT:    addi sp, sp, 32
+; RV32IZHINX-NEXT:    ret
+;
+; RV64IZHINX-LABEL: fcvt_h_q:
+; RV64IZHINX:       # %bb.0:
+; RV64IZHINX-NEXT:    addi sp, sp, -16
+; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZHINX-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZHINX-NEXT:    mv s0, a2
+; RV64IZHINX-NEXT:    call __trunctfhf2
+; RV64IZHINX-NEXT:    sh a0, 0(s0)
+; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZHINX-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    ret
+;
+; RV32IZDINXZHINX-LABEL: fcvt_h_q:
+; RV32IZDINXZHINX:       # %bb.0:
+; RV32IZDINXZHINX-NEXT:    addi sp, sp, -32
+; RV32IZDINXZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZDINXZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZDINXZHINX-NEXT:    lw a2, 0(a0)
+; RV32IZDINXZHINX-NEXT:    lw a3, 4(a0)
+; RV32IZDINXZHINX-NEXT:    lw a4, 8(a0)
+; RV32IZDINXZHINX-NEXT:    lw a5, 12(a0)
+; RV32IZDINXZHINX-NEXT:    mv s0, a1
+; RV32IZDINXZHINX-NEXT:    addi a0, sp, 8
+; RV32IZDINXZHINX-NEXT:    sw a2, 8(sp)
+; RV32IZDINXZHINX-NEXT:    sw a3, 12(sp)
+; RV32IZDINXZHINX-NEXT:    sw a4, 16(sp)
+; RV32IZDINXZHINX-NEXT:    sw a5, 20(sp)
+; RV32IZDINXZHINX-NEXT:    call __trunctfhf2
+; RV32IZDINXZHINX-NEXT:    sh a0, 0(s0)
+; RV32IZDINXZHINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZDINXZHINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZDINXZHINX-NEXT:    addi sp, sp, 32
+; RV32IZDINXZHINX-NEXT:    ret
+;
+; RV64IZDINXZHINX-LABEL: fcvt_h_q:
+; RV64IZDINXZHINX:       # %bb.0:
+; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
+; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZDINXZHINX-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZDINXZHINX-NEXT:    mv s0, a2
+; RV64IZDINXZHINX-NEXT:    call __trunctfhf2
+; RV64IZDINXZHINX-NEXT:    sh a0, 0(s0)
+; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZDINXZHINX-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    ret
+;
+; RV32I-LABEL: fcvt_h_q:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi sp, sp, -32
+; RV32I-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    lw a2, 0(a0)
+; RV32I-NEXT:    lw a3, 4(a0)
+; RV32I-NEXT:    lw a4, 8(a0)
+; RV32I-NEXT:    lw a5, 12(a0)
+; RV32I-NEXT:    mv s0, a1
+; RV32I-NEXT:    addi a0, sp, 8
+; RV32I-NEXT:    sw a2, 8(sp)
+; RV32I-NEXT:    sw a3, 12(sp)
+; RV32I-NEXT:    sw a4, 16(sp)
+; RV32I-NEXT:    sw a5, 20(sp)
+; RV32I-NEXT:    call __trunctfhf2
+; RV32I-NEXT:    sh a0, 0(s0)
+; RV32I-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    addi sp, sp, 32
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: fcvt_h_q:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi sp, sp, -16
+; RV64I-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    mv s0, a2
+; RV64I-NEXT:    call __trunctfhf2
+; RV64I-NEXT:    sh a0, 0(s0)
+; RV64I-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    addi sp, sp, 16
+; RV64I-NEXT:    ret
+;
+; RV32ID-ILP32-LABEL: fcvt_h_q:
+; RV32ID-ILP32:       # %bb.0:
+; RV32ID-ILP32-NEXT:    addi sp, sp, -32
+; RV32ID-ILP32-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32ID-ILP32-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32ID-ILP32-NEXT:    lw a2, 0(a0)
+; RV32ID-ILP32-NEXT:    lw a3, 4(a0)
+; RV32ID-ILP32-NEXT:    lw a4, 8(a0)
+; RV32ID-ILP32-NEXT:    lw a5, 12(a0)
+; RV32ID-ILP32-NEXT:    mv s0, a1
+; RV32ID-ILP32-NEXT:    addi a0, sp, 8
+; RV32ID-ILP32-NEXT:    sw a2, 8(sp)
+; RV32ID-ILP32-NEXT:    sw a3, 12(sp)
+; RV32ID-ILP32-NEXT:    sw a4, 16(sp)
+; RV32ID-ILP32-NEXT:    sw a5, 20(sp)
+; RV32ID-ILP32-NEXT:    call __trunctfhf2
+; RV32ID-ILP32-NEXT:    sh a0, 0(s0)
+; RV32ID-ILP32-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32ID-ILP32-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32ID-ILP32-NEXT:    addi sp, sp, 32
+; RV32ID-ILP32-NEXT:    ret
+;
+; RV64ID-LP64-LABEL: fcvt_h_q:
+; RV64ID-LP64:       # %bb.0:
+; RV64ID-LP64-NEXT:    addi sp, sp, -16
+; RV64ID-LP64-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64ID-LP64-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64ID-LP64-NEXT:    mv s0, a2
+; RV64ID-LP64-NEXT:    call __trunctfhf2
+; RV64ID-LP64-NEXT:    sh a0, 0(s0)
+; RV64ID-LP64-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64ID-LP64-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64ID-LP64-NEXT:    addi sp, sp, 16
+; RV64ID-LP64-NEXT:    ret
+;
+; RV32ID-LABEL: fcvt_h_q:
+; RV32ID:       # %bb.0:
+; RV32ID-NEXT:    addi sp, sp, -32
+; RV32ID-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32ID-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32ID-NEXT:    lw a2, 0(a0)
+; RV32ID-NEXT:    lw a3, 4(a0)
+; RV32ID-NEXT:    lw a4, 8(a0)
+; RV32ID-NEXT:    lw a5, 12(a0)
+; RV32ID-NEXT:    mv s0, a1
+; RV32ID-NEXT:    addi a0, sp, 8
+; RV32ID-NEXT:    sw a2, 8(sp)
+; RV32ID-NEXT:    sw a3, 12(sp)
+; RV32ID-NEXT:    sw a4, 16(sp)
+; RV32ID-NEXT:    sw a5, 20(sp)
+; RV32ID-NEXT:    call __trunctfhf2
+; RV32ID-NEXT:    fmv.x.w a0, fa0
+; RV32ID-NEXT:    sh a0, 0(s0)
+; RV32ID-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32ID-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32ID-NEXT:    addi sp, sp, 32
+; RV32ID-NEXT:    ret
+;
+; RV64ID-LABEL: fcvt_h_q:
+; RV64ID:       # %bb.0:
+; RV64ID-NEXT:    addi sp, sp, -16
+; RV64ID-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64ID-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64ID-NEXT:    mv s0, a2
+; RV64ID-NEXT:    call __trunctfhf2
+; RV64ID-NEXT:    fmv.x.w a0, fa0
+; RV64ID-NEXT:    sh a0, 0(s0)
+; RV64ID-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64ID-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64ID-NEXT:    addi sp, sp, 16
+; RV64ID-NEXT:    ret
+;
+; CHECK32-IZFHMIN-LABEL: fcvt_h_q:
+; CHECK32-IZFHMIN:       # %bb.0:
+; CHECK32-IZFHMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZFHMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZFHMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZFHMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZFHMIN-NEXT:    mv s0, a1
+; CHECK32-IZFHMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZFHMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZFHMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZFHMIN-NEXT:    fsh fa0, 0(s0)
+; CHECK32-IZFHMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZFHMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZFHMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZFHMIN-NEXT:    ret
+;
+; CHECK64-IZFHMIN-LABEL: fcvt_h_q:
+; CHECK64-IZFHMIN:       # %bb.0:
+; CHECK64-IZFHMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZFHMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZFHMIN-NEXT:    mv s0, a2
+; CHECK64-IZFHMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZFHMIN-NEXT:    fsh fa0, 0(s0)
+; CHECK64-IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZFHMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZFHMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZFHMIN-NEXT:    ret
+;
+; CHECK32-IZHINXMIN-LABEL: fcvt_h_q:
+; CHECK32-IZHINXMIN:       # %bb.0:
+; CHECK32-IZHINXMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZHINXMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZHINXMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZHINXMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZHINXMIN-NEXT:    mv s0, a1
+; CHECK32-IZHINXMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZHINXMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK32-IZHINXMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZHINXMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZHINXMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZHINXMIN-NEXT:    ret
+;
+; CHECK64-IZHINXMIN-LABEL: fcvt_h_q:
+; CHECK64-IZHINXMIN:       # %bb.0:
+; CHECK64-IZHINXMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZHINXMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZHINXMIN-NEXT:    mv s0, a2
+; CHECK64-IZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK64-IZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZHINXMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZHINXMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZHINXMIN-NEXT:    ret
+;
+; CHECK32-IZDINXZHINXMIN-LABEL: fcvt_h_q:
+; CHECK32-IZDINXZHINXMIN:       # %bb.0:
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    mv s0, a1
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZDINXZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZDINXZHINXMIN-NEXT:    ret
+;
+; CHECK64-IZDINXZHINXMIN-LABEL: fcvt_h_q:
+; CHECK64-IZDINXZHINXMIN:       # %bb.0:
+; CHECK64-IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZDINXZHINXMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZDINXZHINXMIN-NEXT:    mv s0, a2
+; CHECK64-IZDINXZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZDINXZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK64-IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZDINXZHINXMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZDINXZHINXMIN-NEXT:    ret
+  %a = fptrunc fp128 %x to half
+  store half %a, ptr %y
+  ret void
+}
+
+; Test lib call lowering for half->fp128 to make sure it follows the ABI.
+define fp128 @fcvt_q_h(ptr %x) {
+; RV32IZFH-LABEL: fcvt_q_h:
+; RV32IZFH:       # %bb.0:
+; RV32IZFH-NEXT:    addi sp, sp, -32
+; RV32IZFH-NEXT:    .cfi_def_cfa_offset 32
+; RV32IZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    .cfi_offset ra, -4
+; RV32IZFH-NEXT:    .cfi_offset s0, -8
+; RV32IZFH-NEXT:    mv s0, a0
+; RV32IZFH-NEXT:    flh fa5, 0(a1)
+; RV32IZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV32IZFH-NEXT:    addi a0, sp, 8
+; RV32IZFH-NEXT:    call __extendsftf2
+; RV32IZFH-NEXT:    lw a0, 8(sp)
+; RV32IZFH-NEXT:    lw a1, 12(sp)
+; RV32IZFH-NEXT:    lw a2, 16(sp)
+; RV32IZFH-NEXT:    lw a3, 20(sp)
+; RV32IZFH-NEXT:    sw a0, 0(s0)
+; RV32IZFH-NEXT:    sw a1, 4(s0)
+; RV32IZFH-NEXT:    sw a2, 8(s0)
+; RV32IZFH-NEXT:    sw a3, 12(s0)
+; RV32IZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    .cfi_restore ra
+; RV32IZFH-NEXT:    .cfi_restore s0
+; RV32IZFH-NEXT:    addi sp, sp, 32
+; RV32IZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV32IZFH-NEXT:    ret
+;
+; RV64IZFH-LABEL: fcvt_q_h:
+; RV64IZFH:       # %bb.0:
+; RV64IZFH-NEXT:    addi sp, sp, -16
+; RV64IZFH-NEXT:    .cfi_def_cfa_offset 16
+; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    .cfi_offset ra, -8
+; RV64IZFH-NEXT:    flh fa5, 0(a0)
+; RV64IZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV64IZFH-NEXT:    call __extendsftf2
+; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    .cfi_restore ra
+; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV64IZFH-NEXT:    ret
+;
+; RV32IDZFH-LABEL: fcvt_q_h:
+; RV32IDZFH:       # %bb.0:
+; RV32IDZFH-NEXT:    addi sp, sp, -32
+; RV32IDZFH-NEXT:    .cfi_def_cfa_offset 32
+; RV32IDZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    .cfi_offset ra, -4
+; RV32IDZFH-NEXT:    .cfi_offset s0, -8
+; RV32IDZFH-NEXT:    mv s0, a0
+; RV32IDZFH-NEXT:    flh fa5, 0(a1)
+; RV32IDZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV32IDZFH-NEXT:    addi a0, sp, 8
+; RV32IDZFH-NEXT:    call __extendsftf2
+; RV32IDZFH-NEXT:    lw a0, 8(sp)
+; RV32IDZFH-NEXT:    lw a1, 12(sp)
+; RV32IDZFH-NEXT:    lw a2, 16(sp)
+; RV32IDZFH-NEXT:    lw a3, 20(sp)
+; RV32IDZFH-NEXT:    sw a0, 0(s0)
+; RV32IDZFH-NEXT:    sw a1, 4(s0)
+; RV32IDZFH-NEXT:    sw a2, 8(s0)
+; RV32IDZFH-NEXT:    sw a3, 12(s0)
+; RV32IDZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    .cfi_restore ra
+; RV32IDZFH-NEXT:    .cfi_restore s0
+; RV32IDZFH-NEXT:    addi sp, sp, 32
+; RV32IDZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV32IDZFH-NEXT:    ret
+;
+; RV64IDZFH-LABEL: fcvt_q_h:
+; RV64IDZFH:       # %bb.0:
+; RV64IDZFH-NEXT:    addi sp, sp, -16
+; RV64IDZFH-NEXT:    .cfi_def_cfa_offset 16
+; RV64IDZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    .cfi_offset ra, -8
+; RV64IDZFH-NEXT:    flh fa5, 0(a0)
+; RV64IDZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV64IDZFH-NEXT:    call __extendsftf2
+; RV64IDZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    .cfi_restore ra
+; RV64IDZFH-NEXT:    addi sp, sp, 16
+; RV64IDZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV64IDZFH-NEXT:    ret
+;
+; RV32IZHINX-LABEL: fcvt_q_h:
+; RV32IZHINX:       # %bb.0:
+; RV32IZHINX-NEXT:    addi sp, sp, -32
+; RV32IZHINX-NEXT:    .cfi_def_cfa_offset 32
+; RV32IZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    .cfi_offset ra, -4
+; RV32IZHINX-NEXT:    .cfi_offset s0, -8
+; RV32IZHINX-NEXT:    mv s0, a0
+; RV32IZHINX-NEXT:    lh a0, 0(a1)
+; RV32IZHINX-NEXT:    fcvt.s.h a1, a0
+; RV32IZHINX-NEXT:    addi a0, sp, 8
+; RV32IZHINX-NEXT:    call __extendsftf2
+; RV32IZHINX-NEXT:    lw a0, 8(sp)
+; RV32IZHINX-NEXT:    lw a1, 12(sp)
+; RV32IZHINX-NEXT:    lw a2, 16(sp)
+; RV32IZHINX-NEXT:    lw a3, 20(sp)
+; RV32IZHINX-NEXT:    sw a0, 0(s0)
+; RV32IZHINX-NEXT:    sw a1, 4(s0)
+; RV32IZHINX-NEXT:    sw a2, 8(s0...
[truncated]

phoebewang

LGTM, though it's a bit confusing to see a BF16 test affected by SoftPromoteHalfRes_FP_ROUND.

phoebewang · 2024-12-11T02:01:39Z

llvm/test/CodeGen/X86/bfloat.ll

-; X86-NEXT:    # kill: def $ax killed $ax def $eax
-; X86-NEXT:    vmovw %eax, %xmm0


Thanks! I should have noticed the problem during the review. Interestingly, the fp80 -> bf16 case is correct.

arsenm · 2024-12-11T02:20:08Z

llvm/test/CodeGen/RISCV/half-convert.ll

+  %a = load half, ptr %x
+  %b = fpext half %a to fp128
+  ret fp128 %b
+}


Test a strictfp case?

topperc · 2024-12-11T02:22:19Z

> LGTM, though it's a bit confusing to see a BF16 test affected by SoftPromoteHalfRes_FP_ROUND.

SoftPromoteHalf was added years ago, before bfloat was added. Maybe we should rename it now that it's used for bfloat too?

llvm-ci · 2024-12-11T07:14:07Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/10036

Here is the relevant piece of the build log for reference:

Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: sanitizer/kernel_crash_async.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# RUN: at line 3
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# RUN: at line 5
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# RUN: at line 6
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# RUN: at line 7
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c:39:11: error: CHECK: expected string not found in input
# | // CHECK: Kernel {{[0-9]}}: {{.*}} (__omp_offloading_{{.*}}_main_l29)
# |           ^
# | <stdin>:1:1: note: scanning from here
# | Display only launched kernel:
# | ^
# | <stdin>:2:16: note: possible intended match here
# | Kernel 'omp target in main @ 29 (__omp_offloading_802_d8283c2_main_l29)'
# |                ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Display only launched kernel: 
# | check:39'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: Kernel 'omp target in main @ 29 (__omp_offloading_802_d8283c2_main_l29)' 
# | check:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | check:39'1                    ?                                                          possible intended match
...
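
The FileCheck mismatch in the log above can be reproduced outside the buildbot. The following is a minimal sketch, not part of the test suite: `filecheck_to_regex` is a hypothetical helper that only approximates FileCheck's `{{...}}` handling, and the `HYPOTHETICAL` line is an invented input in the shape the CHECK pattern appears to expect. The CHECK pattern and the actual output line are copied from the log.

```python
import re


def filecheck_to_regex(pattern: str) -> str:
    """Approximate a FileCheck pattern as a Python regex.

    Assumption: text inside {{...}} is a regex, everything else is literal.
    Real FileCheck has more features (variables, whitespace canonicalization).
    """
    parts = re.split(r"\{\{(.*?)\}\}", pattern)
    # Even indices are literal text, odd indices are regex bodies.
    return "".join(p if i % 2 else re.escape(p) for i, p in enumerate(parts))


# Copied from the log: the CHECK line and the line FileCheck actually saw.
CHECK = "Kernel {{[0-9]}}: {{.*}} (__omp_offloading_{{.*}}_main_l29)"
ACTUAL = "Kernel 'omp target in main @ 29 (__omp_offloading_802_d8283c2_main_l29)'"

# Invented for illustration: a line in the shape the pattern expects.
HYPOTHETICAL = "Kernel 0: 0x1234 (__omp_offloading_802_d8283c2_main_l29)"

regex = re.compile(filecheck_to_regex(CHECK))
print("actual matches:", bool(regex.search(ACTUAL)))
print("hypothetical matches:", bool(regex.search(HYPOTHETICAL)))
```

Under these assumptions the actual line fails because `Kernel ` is followed by a quote rather than a digit and colon, which suggests the failure concerns the offload test's output format rather than the FP_ROUND legalization change.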

This patch updates the riscv entrypoints with almost all functions that are currently supported on x86. I left six functions commented, as I'll send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we need to update the buildbots with a clang version that includes the fix from commit PR llvm#119481


topperc requested review from arsenm, RKSimon and phoebewang December 11, 2024 01:33

topperc changed the title ~~[LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromo…~~ [LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromoteHalfRes_FP_ROUND if the input type is softened. Dec 11, 2024

llvmbot added the backend:X86 and llvm:SelectionDAG labels Dec 11, 2024

phoebewang approved these changes Dec 11, 2024

arsenm reviewed Dec 11, 2024

llvm/test/CodeGen/RISCV/half-convert.ll

  %a = load half, ptr %x
  %b = fpext half %a to fp128
  ret fp128 %b
}

arsenm Dec 11, 2024

Test a strictfp case?

fixup! Add strictfp test.

63c9b7e

arsenm approved these changes Dec 11, 2024

topperc merged commit 839c821 into llvm:main Dec 11, 2024
6 of 8 checks passed

topperc deleted the pr/f16trunc branch December 11, 2024 06:21

mikhailramalho mentioned this pull request May 5, 2025

[libc] Update riscv entrypoints to be on par with x86 #138597

Merged


		; X86-NEXT: # kill: def $ax killed $ax def $eax
		; X86-NEXT: vmovw %eax, %xmm0
