Skip to content

[LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromoteHalfRes_FP_ROUND if the input type is softened. #119481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 11, 2024

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Dec 11, 2024

Previously we created an FP_TO_FP16 and legalized it in SoftenFloatOp_FP_ROUND. This caused i16 to be sent to call lowering instead of f16. This results in the ABI not being followed if f16 is supposed to be passed in a different register than i16.

Looking at the libgcc code for the library function it appears the value is returned in xmm0 so the X86 test was being miscompiled before.

Fixes #107607.

…teHalfRes_FP_ROUND if the input type is softened.

Previously we created an FP_TO_FP16 and legalized it in SoftenFloatOp_FP_ROUND.
This caused i16 to be sent to call lowering instead of f16. This
results in the ABI not being followed if f16 is supposed to be passed
in a different register than i16.

Looking at the libgcc code for the library function it appears the
value is returned in xmm0 so the X86 test was being miscompiled before.

Fixes llvm#107607.
@topperc topperc changed the title [LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromo… [LegalizeTypes][RISCV][X86] Legalize FP_ROUND to libcall in SoftPromoteHalfRes_FP_ROUND if the input type is softened. Dec 11, 2024
@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Dec 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Craig Topper (topperc)

Changes

Previously we created an FP_TO_FP16 and legalized it in SoftenFloatOp_FP_ROUND. This caused i16 to be sent to call lowering instead of f16. This results in the ABI not being followed if f16 is supposed to be passed in a different register than i16.

Looking at the libgcc code for the library function it appears the value is returned in xmm0 so the X86 test was being miscompiled before.

Fixes #107607.


Patch is 34.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119481.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp (+17)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+808)
  • (modified) llvm/test/CodeGen/X86/bfloat.ll (-2)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index b52c2c07a7fba0..71f100bfa03434 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -3415,6 +3415,23 @@ SDValue DAGTypeLegalizer::SoftPromoteHalfRes_FP_ROUND(SDNode *N) {
   SDValue Op = N->getOperand(IsStrict ? 1 : 0);
   EVT SVT = Op.getValueType();
 
+  // If the input type needs to be softened, do that now so that call lowering
+  // will see the f16 type.
+  if (getTypeAction(SVT) == TargetLowering::TypeSoftenFloat) {
+    RTLIB::Libcall LC = RTLIB::getFPROUND(SVT, RVT);
+    assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported FP_ROUND libcall");
+
+    SDValue Chain = IsStrict ? N->getOperand(0) : SDValue();
+    Op = GetSoftenedFloat(Op);
+    TargetLowering::MakeLibCallOptions CallOptions;
+    CallOptions.setTypeListBeforeSoften(SVT, RVT, true);
+    std::pair<SDValue, SDValue> Tmp =
+        TLI.makeLibCall(DAG, LC, RVT, Op, CallOptions, SDLoc(N), Chain);
+    if (IsStrict)
+      ReplaceValueWith(SDValue(N, 1), Tmp.second);
+    return DAG.getNode(ISD::BITCAST, SDLoc(N), MVT::i16, Tmp.first);
+  }
+
   if (IsStrict) {
     SDValue Res = DAG.getNode(GetPromotionOpcodeStrict(SVT, RVT), SDLoc(N),
                               {MVT::i16, MVT::Other}, {N->getOperand(0), Op});
diff --git a/llvm/test/CodeGen/RISCV/half-convert.ll b/llvm/test/CodeGen/RISCV/half-convert.ll
index 01ffcab1a6556f..cf57ecd6cd1e43 100644
--- a/llvm/test/CodeGen/RISCV/half-convert.ll
+++ b/llvm/test/CodeGen/RISCV/half-convert.ll
@@ -8587,3 +8587,811 @@ start:
   %0 = tail call i32 @llvm.fptosi.sat.i32.f16(half %a)
   ret i32 %0
 }
+
+; Test lib call lowering for fp128->half to make sure it matches the ABI.
+define void @fcvt_h_q(fp128 %x, ptr %y) nounwind {
+; RV32IZFH-LABEL: fcvt_h_q:
+; RV32IZFH:       # %bb.0:
+; RV32IZFH-NEXT:    addi sp, sp, -32
+; RV32IZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    lw a2, 0(a0)
+; RV32IZFH-NEXT:    lw a3, 4(a0)
+; RV32IZFH-NEXT:    lw a4, 8(a0)
+; RV32IZFH-NEXT:    lw a5, 12(a0)
+; RV32IZFH-NEXT:    mv s0, a1
+; RV32IZFH-NEXT:    addi a0, sp, 8
+; RV32IZFH-NEXT:    sw a2, 8(sp)
+; RV32IZFH-NEXT:    sw a3, 12(sp)
+; RV32IZFH-NEXT:    sw a4, 16(sp)
+; RV32IZFH-NEXT:    sw a5, 20(sp)
+; RV32IZFH-NEXT:    call __trunctfhf2
+; RV32IZFH-NEXT:    fsh fa0, 0(s0)
+; RV32IZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    addi sp, sp, 32
+; RV32IZFH-NEXT:    ret
+;
+; RV64IZFH-LABEL: fcvt_h_q:
+; RV64IZFH:       # %bb.0:
+; RV64IZFH-NEXT:    addi sp, sp, -16
+; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    mv s0, a2
+; RV64IZFH-NEXT:    call __trunctfhf2
+; RV64IZFH-NEXT:    fsh fa0, 0(s0)
+; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    ret
+;
+; RV32IDZFH-LABEL: fcvt_h_q:
+; RV32IDZFH:       # %bb.0:
+; RV32IDZFH-NEXT:    addi sp, sp, -32
+; RV32IDZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    lw a2, 0(a0)
+; RV32IDZFH-NEXT:    lw a3, 4(a0)
+; RV32IDZFH-NEXT:    lw a4, 8(a0)
+; RV32IDZFH-NEXT:    lw a5, 12(a0)
+; RV32IDZFH-NEXT:    mv s0, a1
+; RV32IDZFH-NEXT:    addi a0, sp, 8
+; RV32IDZFH-NEXT:    sw a2, 8(sp)
+; RV32IDZFH-NEXT:    sw a3, 12(sp)
+; RV32IDZFH-NEXT:    sw a4, 16(sp)
+; RV32IDZFH-NEXT:    sw a5, 20(sp)
+; RV32IDZFH-NEXT:    call __trunctfhf2
+; RV32IDZFH-NEXT:    fsh fa0, 0(s0)
+; RV32IDZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    addi sp, sp, 32
+; RV32IDZFH-NEXT:    ret
+;
+; RV64IDZFH-LABEL: fcvt_h_q:
+; RV64IDZFH:       # %bb.0:
+; RV64IDZFH-NEXT:    addi sp, sp, -16
+; RV64IDZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    mv s0, a2
+; RV64IDZFH-NEXT:    call __trunctfhf2
+; RV64IDZFH-NEXT:    fsh fa0, 0(s0)
+; RV64IDZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    addi sp, sp, 16
+; RV64IDZFH-NEXT:    ret
+;
+; RV32IZHINX-LABEL: fcvt_h_q:
+; RV32IZHINX:       # %bb.0:
+; RV32IZHINX-NEXT:    addi sp, sp, -32
+; RV32IZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    lw a2, 0(a0)
+; RV32IZHINX-NEXT:    lw a3, 4(a0)
+; RV32IZHINX-NEXT:    lw a4, 8(a0)
+; RV32IZHINX-NEXT:    lw a5, 12(a0)
+; RV32IZHINX-NEXT:    mv s0, a1
+; RV32IZHINX-NEXT:    addi a0, sp, 8
+; RV32IZHINX-NEXT:    sw a2, 8(sp)
+; RV32IZHINX-NEXT:    sw a3, 12(sp)
+; RV32IZHINX-NEXT:    sw a4, 16(sp)
+; RV32IZHINX-NEXT:    sw a5, 20(sp)
+; RV32IZHINX-NEXT:    call __trunctfhf2
+; RV32IZHINX-NEXT:    sh a0, 0(s0)
+; RV32IZHINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZHINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZHINX-NEXT:    addi sp, sp, 32
+; RV32IZHINX-NEXT:    ret
+;
+; RV64IZHINX-LABEL: fcvt_h_q:
+; RV64IZHINX:       # %bb.0:
+; RV64IZHINX-NEXT:    addi sp, sp, -16
+; RV64IZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZHINX-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZHINX-NEXT:    mv s0, a2
+; RV64IZHINX-NEXT:    call __trunctfhf2
+; RV64IZHINX-NEXT:    sh a0, 0(s0)
+; RV64IZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZHINX-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZHINX-NEXT:    addi sp, sp, 16
+; RV64IZHINX-NEXT:    ret
+;
+; RV32IZDINXZHINX-LABEL: fcvt_h_q:
+; RV32IZDINXZHINX:       # %bb.0:
+; RV32IZDINXZHINX-NEXT:    addi sp, sp, -32
+; RV32IZDINXZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZDINXZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZDINXZHINX-NEXT:    lw a2, 0(a0)
+; RV32IZDINXZHINX-NEXT:    lw a3, 4(a0)
+; RV32IZDINXZHINX-NEXT:    lw a4, 8(a0)
+; RV32IZDINXZHINX-NEXT:    lw a5, 12(a0)
+; RV32IZDINXZHINX-NEXT:    mv s0, a1
+; RV32IZDINXZHINX-NEXT:    addi a0, sp, 8
+; RV32IZDINXZHINX-NEXT:    sw a2, 8(sp)
+; RV32IZDINXZHINX-NEXT:    sw a3, 12(sp)
+; RV32IZDINXZHINX-NEXT:    sw a4, 16(sp)
+; RV32IZDINXZHINX-NEXT:    sw a5, 20(sp)
+; RV32IZDINXZHINX-NEXT:    call __trunctfhf2
+; RV32IZDINXZHINX-NEXT:    sh a0, 0(s0)
+; RV32IZDINXZHINX-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZDINXZHINX-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZDINXZHINX-NEXT:    addi sp, sp, 32
+; RV32IZDINXZHINX-NEXT:    ret
+;
+; RV64IZDINXZHINX-LABEL: fcvt_h_q:
+; RV64IZDINXZHINX:       # %bb.0:
+; RV64IZDINXZHINX-NEXT:    addi sp, sp, -16
+; RV64IZDINXZHINX-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZDINXZHINX-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64IZDINXZHINX-NEXT:    mv s0, a2
+; RV64IZDINXZHINX-NEXT:    call __trunctfhf2
+; RV64IZDINXZHINX-NEXT:    sh a0, 0(s0)
+; RV64IZDINXZHINX-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZDINXZHINX-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64IZDINXZHINX-NEXT:    addi sp, sp, 16
+; RV64IZDINXZHINX-NEXT:    ret
+;
+; RV32I-LABEL: fcvt_h_q:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    addi sp, sp, -32
+; RV32I-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    lw a2, 0(a0)
+; RV32I-NEXT:    lw a3, 4(a0)
+; RV32I-NEXT:    lw a4, 8(a0)
+; RV32I-NEXT:    lw a5, 12(a0)
+; RV32I-NEXT:    mv s0, a1
+; RV32I-NEXT:    addi a0, sp, 8
+; RV32I-NEXT:    sw a2, 8(sp)
+; RV32I-NEXT:    sw a3, 12(sp)
+; RV32I-NEXT:    sw a4, 16(sp)
+; RV32I-NEXT:    sw a5, 20(sp)
+; RV32I-NEXT:    call __trunctfhf2
+; RV32I-NEXT:    sh a0, 0(s0)
+; RV32I-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    addi sp, sp, 32
+; RV32I-NEXT:    ret
+;
+; RV64I-LABEL: fcvt_h_q:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    addi sp, sp, -16
+; RV64I-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    mv s0, a2
+; RV64I-NEXT:    call __trunctfhf2
+; RV64I-NEXT:    sh a0, 0(s0)
+; RV64I-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    addi sp, sp, 16
+; RV64I-NEXT:    ret
+;
+; RV32ID-ILP32-LABEL: fcvt_h_q:
+; RV32ID-ILP32:       # %bb.0:
+; RV32ID-ILP32-NEXT:    addi sp, sp, -32
+; RV32ID-ILP32-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32ID-ILP32-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32ID-ILP32-NEXT:    lw a2, 0(a0)
+; RV32ID-ILP32-NEXT:    lw a3, 4(a0)
+; RV32ID-ILP32-NEXT:    lw a4, 8(a0)
+; RV32ID-ILP32-NEXT:    lw a5, 12(a0)
+; RV32ID-ILP32-NEXT:    mv s0, a1
+; RV32ID-ILP32-NEXT:    addi a0, sp, 8
+; RV32ID-ILP32-NEXT:    sw a2, 8(sp)
+; RV32ID-ILP32-NEXT:    sw a3, 12(sp)
+; RV32ID-ILP32-NEXT:    sw a4, 16(sp)
+; RV32ID-ILP32-NEXT:    sw a5, 20(sp)
+; RV32ID-ILP32-NEXT:    call __trunctfhf2
+; RV32ID-ILP32-NEXT:    sh a0, 0(s0)
+; RV32ID-ILP32-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32ID-ILP32-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32ID-ILP32-NEXT:    addi sp, sp, 32
+; RV32ID-ILP32-NEXT:    ret
+;
+; RV64ID-LP64-LABEL: fcvt_h_q:
+; RV64ID-LP64:       # %bb.0:
+; RV64ID-LP64-NEXT:    addi sp, sp, -16
+; RV64ID-LP64-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64ID-LP64-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64ID-LP64-NEXT:    mv s0, a2
+; RV64ID-LP64-NEXT:    call __trunctfhf2
+; RV64ID-LP64-NEXT:    sh a0, 0(s0)
+; RV64ID-LP64-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64ID-LP64-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64ID-LP64-NEXT:    addi sp, sp, 16
+; RV64ID-LP64-NEXT:    ret
+;
+; RV32ID-LABEL: fcvt_h_q:
+; RV32ID:       # %bb.0:
+; RV32ID-NEXT:    addi sp, sp, -32
+; RV32ID-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32ID-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32ID-NEXT:    lw a2, 0(a0)
+; RV32ID-NEXT:    lw a3, 4(a0)
+; RV32ID-NEXT:    lw a4, 8(a0)
+; RV32ID-NEXT:    lw a5, 12(a0)
+; RV32ID-NEXT:    mv s0, a1
+; RV32ID-NEXT:    addi a0, sp, 8
+; RV32ID-NEXT:    sw a2, 8(sp)
+; RV32ID-NEXT:    sw a3, 12(sp)
+; RV32ID-NEXT:    sw a4, 16(sp)
+; RV32ID-NEXT:    sw a5, 20(sp)
+; RV32ID-NEXT:    call __trunctfhf2
+; RV32ID-NEXT:    fmv.x.w a0, fa0
+; RV32ID-NEXT:    sh a0, 0(s0)
+; RV32ID-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32ID-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32ID-NEXT:    addi sp, sp, 32
+; RV32ID-NEXT:    ret
+;
+; RV64ID-LABEL: fcvt_h_q:
+; RV64ID:       # %bb.0:
+; RV64ID-NEXT:    addi sp, sp, -16
+; RV64ID-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64ID-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; RV64ID-NEXT:    mv s0, a2
+; RV64ID-NEXT:    call __trunctfhf2
+; RV64ID-NEXT:    fmv.x.w a0, fa0
+; RV64ID-NEXT:    sh a0, 0(s0)
+; RV64ID-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64ID-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; RV64ID-NEXT:    addi sp, sp, 16
+; RV64ID-NEXT:    ret
+;
+; CHECK32-IZFHMIN-LABEL: fcvt_h_q:
+; CHECK32-IZFHMIN:       # %bb.0:
+; CHECK32-IZFHMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZFHMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZFHMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZFHMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZFHMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZFHMIN-NEXT:    mv s0, a1
+; CHECK32-IZFHMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZFHMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZFHMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZFHMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZFHMIN-NEXT:    fsh fa0, 0(s0)
+; CHECK32-IZFHMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZFHMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZFHMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZFHMIN-NEXT:    ret
+;
+; CHECK64-IZFHMIN-LABEL: fcvt_h_q:
+; CHECK64-IZFHMIN:       # %bb.0:
+; CHECK64-IZFHMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZFHMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZFHMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZFHMIN-NEXT:    mv s0, a2
+; CHECK64-IZFHMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZFHMIN-NEXT:    fsh fa0, 0(s0)
+; CHECK64-IZFHMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZFHMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZFHMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZFHMIN-NEXT:    ret
+;
+; CHECK32-IZHINXMIN-LABEL: fcvt_h_q:
+; CHECK32-IZHINXMIN:       # %bb.0:
+; CHECK32-IZHINXMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZHINXMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZHINXMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZHINXMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZHINXMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZHINXMIN-NEXT:    mv s0, a1
+; CHECK32-IZHINXMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZHINXMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZHINXMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK32-IZHINXMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZHINXMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZHINXMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZHINXMIN-NEXT:    ret
+;
+; CHECK64-IZHINXMIN-LABEL: fcvt_h_q:
+; CHECK64-IZHINXMIN:       # %bb.0:
+; CHECK64-IZHINXMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZHINXMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZHINXMIN-NEXT:    mv s0, a2
+; CHECK64-IZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK64-IZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZHINXMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZHINXMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZHINXMIN-NEXT:    ret
+;
+; CHECK32-IZDINXZHINXMIN-LABEL: fcvt_h_q:
+; CHECK32-IZDINXZHINXMIN:       # %bb.0:
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi sp, sp, -32
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a2, 0(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a3, 4(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a4, 8(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw a5, 12(a0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    mv s0, a1
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi a0, sp, 8
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a2, 8(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a3, 12(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a4, 16(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    sw a5, 20(sp)
+; CHECK32-IZDINXZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK32-IZDINXZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; CHECK32-IZDINXZHINXMIN-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; CHECK32-IZDINXZHINXMIN-NEXT:    addi sp, sp, 32
+; CHECK32-IZDINXZHINXMIN-NEXT:    ret
+;
+; CHECK64-IZDINXZHINXMIN-LABEL: fcvt_h_q:
+; CHECK64-IZDINXZHINXMIN:       # %bb.0:
+; CHECK64-IZDINXZHINXMIN-NEXT:    addi sp, sp, -16
+; CHECK64-IZDINXZHINXMIN-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; CHECK64-IZDINXZHINXMIN-NEXT:    sd s0, 0(sp) # 8-byte Folded Spill
+; CHECK64-IZDINXZHINXMIN-NEXT:    mv s0, a2
+; CHECK64-IZDINXZHINXMIN-NEXT:    call __trunctfhf2
+; CHECK64-IZDINXZHINXMIN-NEXT:    sh a0, 0(s0)
+; CHECK64-IZDINXZHINXMIN-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; CHECK64-IZDINXZHINXMIN-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
+; CHECK64-IZDINXZHINXMIN-NEXT:    addi sp, sp, 16
+; CHECK64-IZDINXZHINXMIN-NEXT:    ret
+  %a = fptrunc fp128 %x to half
+  store half %a, ptr %y
+  ret void
+}
+
+; Test lib call lowering for half->fp128 to make sure it follows the ABI.
+define fp128 @fcvt_q_h(ptr %x) {
+; RV32IZFH-LABEL: fcvt_q_h:
+; RV32IZFH:       # %bb.0:
+; RV32IZFH-NEXT:    addi sp, sp, -32
+; RV32IZFH-NEXT:    .cfi_def_cfa_offset 32
+; RV32IZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZFH-NEXT:    .cfi_offset ra, -4
+; RV32IZFH-NEXT:    .cfi_offset s0, -8
+; RV32IZFH-NEXT:    mv s0, a0
+; RV32IZFH-NEXT:    flh fa5, 0(a1)
+; RV32IZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV32IZFH-NEXT:    addi a0, sp, 8
+; RV32IZFH-NEXT:    call __extendsftf2
+; RV32IZFH-NEXT:    lw a0, 8(sp)
+; RV32IZFH-NEXT:    lw a1, 12(sp)
+; RV32IZFH-NEXT:    lw a2, 16(sp)
+; RV32IZFH-NEXT:    lw a3, 20(sp)
+; RV32IZFH-NEXT:    sw a0, 0(s0)
+; RV32IZFH-NEXT:    sw a1, 4(s0)
+; RV32IZFH-NEXT:    sw a2, 8(s0)
+; RV32IZFH-NEXT:    sw a3, 12(s0)
+; RV32IZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IZFH-NEXT:    .cfi_restore ra
+; RV32IZFH-NEXT:    .cfi_restore s0
+; RV32IZFH-NEXT:    addi sp, sp, 32
+; RV32IZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV32IZFH-NEXT:    ret
+;
+; RV64IZFH-LABEL: fcvt_q_h:
+; RV64IZFH:       # %bb.0:
+; RV64IZFH-NEXT:    addi sp, sp, -16
+; RV64IZFH-NEXT:    .cfi_def_cfa_offset 16
+; RV64IZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IZFH-NEXT:    .cfi_offset ra, -8
+; RV64IZFH-NEXT:    flh fa5, 0(a0)
+; RV64IZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV64IZFH-NEXT:    call __extendsftf2
+; RV64IZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IZFH-NEXT:    .cfi_restore ra
+; RV64IZFH-NEXT:    addi sp, sp, 16
+; RV64IZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV64IZFH-NEXT:    ret
+;
+; RV32IDZFH-LABEL: fcvt_q_h:
+; RV32IDZFH:       # %bb.0:
+; RV32IDZFH-NEXT:    addi sp, sp, -32
+; RV32IDZFH-NEXT:    .cfi_def_cfa_offset 32
+; RV32IDZFH-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IDZFH-NEXT:    .cfi_offset ra, -4
+; RV32IDZFH-NEXT:    .cfi_offset s0, -8
+; RV32IDZFH-NEXT:    mv s0, a0
+; RV32IDZFH-NEXT:    flh fa5, 0(a1)
+; RV32IDZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV32IDZFH-NEXT:    addi a0, sp, 8
+; RV32IDZFH-NEXT:    call __extendsftf2
+; RV32IDZFH-NEXT:    lw a0, 8(sp)
+; RV32IDZFH-NEXT:    lw a1, 12(sp)
+; RV32IDZFH-NEXT:    lw a2, 16(sp)
+; RV32IDZFH-NEXT:    lw a3, 20(sp)
+; RV32IDZFH-NEXT:    sw a0, 0(s0)
+; RV32IDZFH-NEXT:    sw a1, 4(s0)
+; RV32IDZFH-NEXT:    sw a2, 8(s0)
+; RV32IDZFH-NEXT:    sw a3, 12(s0)
+; RV32IDZFH-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IDZFH-NEXT:    .cfi_restore ra
+; RV32IDZFH-NEXT:    .cfi_restore s0
+; RV32IDZFH-NEXT:    addi sp, sp, 32
+; RV32IDZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV32IDZFH-NEXT:    ret
+;
+; RV64IDZFH-LABEL: fcvt_q_h:
+; RV64IDZFH:       # %bb.0:
+; RV64IDZFH-NEXT:    addi sp, sp, -16
+; RV64IDZFH-NEXT:    .cfi_def_cfa_offset 16
+; RV64IDZFH-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
+; RV64IDZFH-NEXT:    .cfi_offset ra, -8
+; RV64IDZFH-NEXT:    flh fa5, 0(a0)
+; RV64IDZFH-NEXT:    fcvt.s.h fa0, fa5
+; RV64IDZFH-NEXT:    call __extendsftf2
+; RV64IDZFH-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
+; RV64IDZFH-NEXT:    .cfi_restore ra
+; RV64IDZFH-NEXT:    addi sp, sp, 16
+; RV64IDZFH-NEXT:    .cfi_def_cfa_offset 0
+; RV64IDZFH-NEXT:    ret
+;
+; RV32IZHINX-LABEL: fcvt_q_h:
+; RV32IZHINX:       # %bb.0:
+; RV32IZHINX-NEXT:    addi sp, sp, -32
+; RV32IZHINX-NEXT:    .cfi_def_cfa_offset 32
+; RV32IZHINX-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IZHINX-NEXT:    .cfi_offset ra, -4
+; RV32IZHINX-NEXT:    .cfi_offset s0, -8
+; RV32IZHINX-NEXT:    mv s0, a0
+; RV32IZHINX-NEXT:    lh a0, 0(a1)
+; RV32IZHINX-NEXT:    fcvt.s.h a1, a0
+; RV32IZHINX-NEXT:    addi a0, sp, 8
+; RV32IZHINX-NEXT:    call __extendsftf2
+; RV32IZHINX-NEXT:    lw a0, 8(sp)
+; RV32IZHINX-NEXT:    lw a1, 12(sp)
+; RV32IZHINX-NEXT:    lw a2, 16(sp)
+; RV32IZHINX-NEXT:    lw a3, 20(sp)
+; RV32IZHINX-NEXT:    sw a0, 0(s0)
+; RV32IZHINX-NEXT:    sw a1, 4(s0)
+; RV32IZHINX-NEXT:    sw a2, 8(s0...
[truncated]

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though it's a bit confusing to see a BF16 test affected by SoftPromoteHalfRes_FP_ROUND.

Comment on lines -1986 to -1987
; X86-NEXT: # kill: def $ax killed $ax def $eax
; X86-NEXT: vmovw %eax, %xmm0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I should have noticed the problem during the review. The interesting thing is the fp80 -> bf16 is correct..

%a = load half, ptr %x
%b = fpext half %a to fp128
ret fp128 %b
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test a strictfp case?

@topperc
Copy link
Collaborator Author

topperc commented Dec 11, 2024

LGTM, though it's a bit confusing to see a BF16 test affected by SoftPromoteHalfRes_FP_ROUND.

SoftPromoteHalf was added years ago before bfloat was added. Maybe we should rename it now that it's used for bfloat too?

@topperc topperc merged commit 839c821 into llvm:main Dec 11, 2024
6 of 8 checks passed
@topperc topperc deleted the pr/f16trunc branch December 11, 2024 06:21
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 11, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/10036

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: sanitizer/kernel_crash_async.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# RUN: at line 3
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# RUN: at line 4
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# RUN: at line 5
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa -O3 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp -Xoffload-linker -lc -Xoffload-linker -lm /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -g
# RUN: at line 6
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=TRACE
# RUN: at line 7
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/not --crash /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/sanitizer/Output/kernel_crash_async.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c --check-prefixes=CHECK
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c:39:11: error: CHECK: expected string not found in input
# | // CHECK: Kernel {{[0-9]}}: {{.*}} (__omp_offloading_{{.*}}_main_l29)
# |           ^
# | <stdin>:1:1: note: scanning from here
# | Display only launched kernel:
# | ^
# | <stdin>:2:16: note: possible intended match here
# | Kernel 'omp target in main @ 29 (__omp_offloading_802_d8283c2_main_l29)'
# |                ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/sanitizer/kernel_crash_async.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Display only launched kernel: 
# | check:39'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: Kernel 'omp target in main @ 29 (__omp_offloading_802_d8283c2_main_l29)' 
# | check:39'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | check:39'1                    ?                                                          possible intended match
...

mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request May 4, 2025
This patch updates the riscv entrypoints with almost all functions that
are currently supported on x86. I left six functions commented, as I'll
send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due
to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we
need to update the buildbots with a clang version that includes the fix
from commit PR llvm#119481
mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request May 4, 2025
This patch updates the riscv entrypoints with almost all functions that
are currently supported on x86. I left six functions commented, as I'll
send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due
to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we
need to update the buildbots with a clang version that includes the fix
from commit PR llvm#119481
mikhailramalho added a commit to mikhailramalho/llvm-project that referenced this pull request May 5, 2025
This patch updates the riscv entrypoints with almost all functions that
are currently supported on x86. I left six functions commented, as I'll
send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due
to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we
need to update the buildbots with a clang version that includes the fix
from commit PR llvm#119481
mikhailramalho added a commit that referenced this pull request May 6, 2025
This patch updates the riscv entrypoints with almost all functions that
are currently supported on x86. I left six functions commented, as I'll
send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due
to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we
need to update the buildbots with a clang version that includes the fix
from commit PR #119481
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
This patch updates the riscv entrypoints with almost all functions that
are currently supported on x86. I left six functions commented, as I'll
send separate PRs to enable them:

* poll: needs a new syscall implementation
* utimes: needs a new syscall implementation
* setitimer: test never finished in rv32
* getitimer: test fails in rv32
* exp10m1f: test case doesn't build but it's an easy fix
* sqrtulk: needs to be implemented

I also added the 16-bit fp functions, however, they are not enabled due
to a cmake check in float16-macros.h. To enable 16-bit fp in riscv, we
need to update the buildbots with a clang version that includes the fix
from commit PR llvm#119481
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RISCV] Miscompilation on rv64
5 participants