[NVPTX] Add float to tf32 conversion intrinsics #121507
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-nvptx

Author: Durgadoss R (durga4github)

Changes

This patch adds an intrinsic to convert float to tf32.

TODO: We already have an intrinsic 'llvm.nvvm.f2tf32.rna', which caters to only one variant of the PTX instruction. Once this change lands, I will submit a follow-up PR to auto-upgrade it to use the generic variant.

PTX Spec link:

Full diff: https://github.com/llvm/llvm-project/pull/121507.diff

12 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 313e84f3722a95..f6d5d27b8850c9 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -462,6 +462,66 @@ to left-shift the found bit into the most-significant bit position, otherwise
the result is the shift amount needed to right-shift the found bit into the
least-significant bit position. 0xffffffff is returned if no 1 bit is found.
+Conversion Intrinsics (for cvt.* PTX instructions)
+--------------------------------------------------
+
+'``llvm.nvvm.cvt.float.to.tf32``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 %flag_fp_rnd_mode, i8 %flag_sat_mode, i1 %flag_relu)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.cvt.float.to.tf32``' intrinsic lowers to
+the ``cvt.*.tf32.f32`` set of PTX instructions.
+
+* The first argument is the input float to be converted to TF32.
+ This is followed by three flag arguments encoding the rounding mode,
+ saturation mode, and the relu modifier respectively.
+
+* The second argument (denoted by ``i8 %flag_fp_rnd_mode``) denotes
+ the floating-point rounding modes supported for this instruction.
+ This must be a compile-time constant and the encoding is as below:
+
+ ========== ==============
+ Enum Value Rounding Mode
+ ========== ==============
+ ``0`` NONE
+ ``1`` ROUND_RZ
+ ``2`` ROUND_RN
+ ``3`` ROUND_RP
+ ``4`` ROUND_RM
+ ``5`` ROUND_RNA
+ ========== ==============
+
+ The valid rounding modes are ``RNA, RN and RZ``.
+
+* The third argument (denoted by ``i8 %flag_sat_mode``) denotes the
+ saturation modifier for this intrinsic. As of now, it can either
+ be None or Satfinite, according to the enumeration below:
+
+ ========== ================
+ Enum Value Saturation Mode
+ ========== ================
+ ``0`` NONE
+ ``1`` SATFINITE
+ ========== ================
+
+* The last argument (denoted by ``i1 %flag_relu``) when set, generates
+ the ``.relu`` variant of the instruction.
+
+* Invalid values for the compile-time flag arguments may lead
+ to error(s) during Codegen.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt>`_.
+
TMA family of Intrinsics
------------------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index fd07d131ce15b2..870378bda44b0a 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1466,6 +1466,15 @@ let TargetPrefix = "nvvm" in {
def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">,
Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>;
+// Convert Float to TF32
+def int_nvvm_cvt_float_to_tf32 : Intrinsic<[llvm_i32_ty],
+ [llvm_float_ty, // Input float
+ llvm_i8_ty, // Flag for Rounding Modes
+ llvm_i8_ty, // Flag for Saturation Modes
+ llvm_i1_ty], // Flag for relu
+ [IntrNoMem, IntrNoCallback,
+ ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>]>;
+
// FNS
def int_nvvm_fns : ClangBuiltin<"__nvvm_fns">,
diff --git a/llvm/include/llvm/IR/NVVMIntrinsicFlags.h b/llvm/include/llvm/IR/NVVMIntrinsicFlags.h
index dfb6e857b3a6ad..3dfa58313e3b60 100644
--- a/llvm/include/llvm/IR/NVVMIntrinsicFlags.h
+++ b/llvm/include/llvm/IR/NVVMIntrinsicFlags.h
@@ -34,6 +34,22 @@ enum class TMAReductionOp : uint8_t {
XOR = 7,
};
+// Rounding Modes for floating point types
+enum class FPRoundingMode : uint8_t {
+ NONE = 0,
+ ROUND_RZ = 1, // roundTowardZero
+ ROUND_RN = 2, // roundToNearest-TiesToEven
+ ROUND_RP = 3, // roundTowardPositiveInf
+ ROUND_RM = 4, // roundTowardNegativeInf
+ ROUND_RNA = 5, // roundToNearest-TiesAwayFromZero
+};
+
+// Saturation Modes
+enum class SaturationMode : uint8_t {
+ NONE = 0,
+ SATFINITE = 1,
+};
+
} // namespace nvvm
} // namespace llvm
#endif // LLVM_IR_NVVMINTRINSICFLAGS_H
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
index 65e1893d3f3bdf..06dc60da9e6462 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp
@@ -453,3 +453,56 @@ void NVPTXInstPrinter::printTmaReductionMode(const MCInst *MI, int OpNum,
llvm_unreachable(
"Invalid Reduction Op in printCpAsyncBulkTensorReductionMode");
}
+
+void NVPTXInstPrinter::printFPRoundingMode(const MCInst *MI, int OpNum,
+ raw_ostream &O,
+ const char *Modifier) {
+ const MCOperand &MO = MI->getOperand(OpNum);
+ using Mode = nvvm::FPRoundingMode;
+
+ switch (static_cast<Mode>(MO.getImm())) {
+ case Mode::NONE:
+ O << "";
+ return;
+ case Mode::ROUND_RN:
+ O << ".rn";
+ return;
+ case Mode::ROUND_RNA:
+ O << ".rna";
+ return;
+ case Mode::ROUND_RZ:
+ O << ".rz";
+ return;
+ case Mode::ROUND_RP:
+ O << ".rp";
+ return;
+ case Mode::ROUND_RM:
+ O << ".rm";
+ return;
+ }
+ llvm_unreachable("Invalid mode in printFPRoundingMode");
+}
+
+void NVPTXInstPrinter::printSaturationMode(const MCInst *MI, int OpNum,
+ raw_ostream &O,
+ const char *Modifier) {
+ const MCOperand &MO = MI->getOperand(OpNum);
+ using Mode = nvvm::SaturationMode;
+
+ switch (static_cast<Mode>(MO.getImm())) {
+ case Mode::NONE:
+ O << "";
+ return;
+ case Mode::SATFINITE:
+ O << ".satfinite";
+ return;
+ }
+ llvm_unreachable("Invalid mode in printSaturationMode");
+}
+
+void NVPTXInstPrinter::printReluModifier(const MCInst *MI, int OpNum,
+ raw_ostream &O, const char *Modifier) {
+ const MCOperand &MO = MI->getOperand(OpNum);
+ if (MO.getImm())
+ O << ".relu";
+}
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.h b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.h
index 2b19386ef17fe5..7c3be27751ca14 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.h
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.h
@@ -56,6 +56,12 @@ class NVPTXInstPrinter : public MCInstPrinter {
const char *Modifier = nullptr);
void printTmaReductionMode(const MCInst *MI, int OpNum, raw_ostream &O,
const char *Modifier = nullptr);
+ void printFPRoundingMode(const MCInst *MI, int OpNum, raw_ostream &O,
+ const char *Modifier = nullptr);
+ void printSaturationMode(const MCInst *MI, int OpNum, raw_ostream &O,
+ const char *Modifier = nullptr);
+ void printReluModifier(const MCInst *MI, int OpNum, raw_ostream &O,
+ const char *Modifier = nullptr);
};
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index c51729e224bf54..82ac658c4a4570 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -728,7 +728,53 @@ bool NVPTXDAGToDAGISel::tryIntrinsicNoChain(SDNode *N) {
case Intrinsic::nvvm_texsurf_handle_internal:
SelectTexSurfHandle(N);
return true;
+ case Intrinsic::nvvm_cvt_float_to_tf32:
+ SelectCvtFloatToTF32(N);
+ return true;
+ }
+}
+
+void NVPTXDAGToDAGISel::SelectCvtFloatToTF32(SDNode *N) {
+ // 0 - IID
+ // 1 - Input Float
+ // 2 - Rounding Mode
+ // 3 - Saturation Mode
+ // 4 - Relu Flag
+ uint64_t Rnd = N->getConstantOperandVal(2);
+ uint64_t Sat = N->getConstantOperandVal(3);
+ bool IsRelu = N->getConstantOperandVal(4) == 1;
+
+ if (!Subtarget->hasTF32Math())
+ report_fatal_error("TF32 destination format requires at least sm80");
+
+ using SatMode = nvvm::SaturationMode;
+ bool IsSatFinite = static_cast<SatMode>(Sat) == SatMode::SATFINITE;
+ if (IsSatFinite && Subtarget->getPTXVersion() < 81)
+ report_fatal_error("satfinite modifier requires PTX version 8.1 or higher");
+
+ using RndMode = nvvm::FPRoundingMode;
+ switch (static_cast<RndMode>(Rnd)) {
+ case RndMode::ROUND_RNA:
+ if (IsRelu)
+ report_fatal_error("relu not supported with rna rounding mode");
+ break;
+ case RndMode::ROUND_RN:
+ case RndMode::ROUND_RZ: {
+ if (Subtarget->getSmVersion() < 90)
+ report_fatal_error("rn/rz rounding modes require at least sm90");
+ if (IsSatFinite)
+ report_fatal_error("satfinite not supported with rn/rz rounding modes");
+ break;
+ }
+ default:
+ report_fatal_error("Invalid FP rounding mode in SelectCvtFloatToTF32");
}
+
+ SDLoc DL(N);
+ SDValue Ops[] = {N->getOperand(1), getI32Imm(Rnd, DL), getI32Imm(Sat, DL),
+ getI32Imm(IsRelu, DL)};
+ ReplaceNode(N, CurDAG->getMachineNode(NVPTX::cvt_float_to_tf32, DL,
+ N->getVTList(), Ops));
}
void NVPTXDAGToDAGISel::SelectTexSurfHandle(SDNode *N) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
index c307f28fcc6c0a..3e22ef5bab9931 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
@@ -73,6 +73,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXDAGToDAGISel : public SelectionDAGISel {
bool tryIntrinsicChain(SDNode *N);
bool tryIntrinsicVoid(SDNode *N);
void SelectTexSurfHandle(SDNode *N);
+ void SelectCvtFloatToTF32(SDNode *N);
bool tryLoad(SDNode *N);
bool tryLoadVector(SDNode *N);
bool tryLDGLDU(SDNode *N);
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 8ede1ec4f20dc9..3274c1ef4260db 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -1802,6 +1802,22 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a),
def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a),
(CVT_f16x2_e5m2x2 $a, CvtRN_RELU)>;
+def FPRoundingMode : Operand<i32> {
+ let PrintMethod = "printFPRoundingMode";
+}
+
+def SatMode : Operand<i32> {
+ let PrintMethod = "printSaturationMode";
+}
+
+def ReluFlag : Operand<i32> {
+ let PrintMethod = "printReluModifier";
+}
+
+def cvt_float_to_tf32 : NVPTXInst<(outs Int32Regs:$dest),
+ (ins Float32Regs:$a, FPRoundingMode:$rnd, SatMode:$sat, ReluFlag:$relu),
+ "cvt${rnd:rnd}${sat:sat}${relu:relu}.tf32.f32 \t$dest, $a;", []>;
+
//
// FNS
//
diff --git a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
index 7555a2368ec963..9f0b437bd32dc5 100644
--- a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+++ b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
@@ -83,6 +83,7 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
bool hasFP16Math() const { return SmVersion >= 53; }
bool hasBF16Math() const { return SmVersion >= 80; }
bool allowFP16Math() const;
+ bool hasTF32Math() const { return SmVersion >= 80 && PTXVersion >= 70; }
bool hasMaskOperator() const { return PTXVersion >= 71; }
bool hasNoReturn() const { return SmVersion >= 30 && PTXVersion >= 64; }
// Does SM & PTX support memory orderings (weak and atomic: relaxed, acquire,
diff --git a/llvm/test/CodeGen/NVPTX/convert-sm80.ll b/llvm/test/CodeGen/NVPTX/convert-sm80.ll
index aebc28b1cfea3e..d54fa73d306ffa 100644
--- a/llvm/test/CodeGen/NVPTX/convert-sm80.ll
+++ b/llvm/test/CodeGen/NVPTX/convert-sm80.ll
@@ -261,3 +261,20 @@ define <2 x half> @fold_ff2f16x2(float %lo, float %hi) {
%v1 = insertelement <2 x half> %v0, half %hih, i64 1
ret <2 x half> %v1
}
+
+declare i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8, i8, i1)
+
+define i32 @cvt_rna_tf32_f32_flags(float %f1) {
+; CHECK-LABEL: cvt_rna_tf32_f32_flags(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .f32 %f<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [cvt_rna_tf32_f32_flags_param_0];
+; CHECK-NEXT: cvt.rna.tf32.f32 %r1, %f1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 5, i8 0, i1 0)
+ ret i32 %val
+}
diff --git a/llvm/test/CodeGen/NVPTX/convert-sm89.ll b/llvm/test/CodeGen/NVPTX/convert-sm89.ll
index 5d0576aebbe089..3ff58b95348095 100644
--- a/llvm/test/CodeGen/NVPTX/convert-sm89.ll
+++ b/llvm/test/CodeGen/NVPTX/convert-sm89.ll
@@ -84,3 +84,12 @@ define <2 x half> @cvt_rn_relu_f16x2_e5m2x2(i16 %in) {
%val = call <2 x half> @llvm.nvvm.e5m2x2.to.f16x2.rn.relu(i16 %in);
ret <2 x half> %val
}
+
+declare i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8, i8, i1)
+
+; CHECK-LABEL: cvt_rna_satfinite_tf32_f32
+define i32 @cvt_rna_satfinite_tf32_f32(float %f1) {
+; CHECK: cvt.rna.satfinite.tf32.f32
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 5, i8 1, i1 0)
+ ret i32 %val
+}
diff --git a/llvm/test/CodeGen/NVPTX/convert-sm90.ll b/llvm/test/CodeGen/NVPTX/convert-sm90.ll
new file mode 100644
index 00000000000000..8f932005830250
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/convert-sm90.ll
@@ -0,0 +1,65 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78| FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx78| %ptxas-verify -arch=sm_90 %}
+
+declare i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8, i8, i1)
+
+define i32 @cvt_rn_tf32_f32(float %f1) {
+; CHECK-LABEL: cvt_rn_tf32_f32(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .f32 %f<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [cvt_rn_tf32_f32_param_0];
+; CHECK-NEXT: cvt.rn.tf32.f32 %r1, %f1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 2, i8 0, i1 0)
+ ret i32 %val
+}
+
+define i32 @cvt_rn_relu_tf32_f32(float %f1) {
+; CHECK-LABEL: cvt_rn_relu_tf32_f32(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .f32 %f<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [cvt_rn_relu_tf32_f32_param_0];
+; CHECK-NEXT: cvt.rn.relu.tf32.f32 %r1, %f1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 2, i8 0, i1 1)
+ ret i32 %val
+}
+
+define i32 @cvt_rz_tf32_f32(float %f1) {
+; CHECK-LABEL: cvt_rz_tf32_f32(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .f32 %f<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [cvt_rz_tf32_f32_param_0];
+; CHECK-NEXT: cvt.rz.tf32.f32 %r1, %f1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 1, i8 0, i1 0)
+ ret i32 %val
+}
+
+define i32 @cvt_rz_relu_tf32_f32(float %f1) {
+; CHECK-LABEL: cvt_rz_relu_tf32_f32(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .f32 %f<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [cvt_rz_relu_tf32_f32_param_0];
+; CHECK-NEXT: cvt.rz.relu.tf32.f32 %r1, %f1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %val = call i32 @llvm.nvvm.cvt.float.to.tf32(float %f1, i8 1, i8 0, i1 1)
+ ret i32 %val
+}
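The legality checks in `SelectCvtFloatToTF32` and the modifier order used by the three print methods can be summarized in a small standalone model. This is only a sketch for reading the diff, not code from the patch: `tf32CvtMnemonic` and `checkExamples` are hypothetical names, the enums are copied from `NVVMIntrinsicFlags.h` as added above, and `std::nullopt` stands in for every path where the patch calls `report_fatal_error`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Copies of the enums the patch adds to NVVMIntrinsicFlags.h.
enum class FPRoundingMode : uint8_t {
  NONE = 0,
  ROUND_RZ = 1,
  ROUND_RN = 2,
  ROUND_RP = 3,
  ROUND_RM = 4,
  ROUND_RNA = 5,
};
enum class SaturationMode : uint8_t { NONE = 0, SATFINITE = 1 };

// Hypothetical helper (not part of LLVM). Returns the PTX mnemonic for a
// valid flag combination, or std::nullopt where SelectCvtFloatToTF32 in the
// patch would call report_fatal_error.
std::optional<std::string> tf32CvtMnemonic(FPRoundingMode Rnd,
                                           SaturationMode Sat, bool Relu,
                                           unsigned SmVersion,
                                           unsigned PTXVersion) {
  // Same condition as hasTF32Math(): sm_80 and PTX 7.0 or newer.
  if (SmVersion < 80 || PTXVersion < 70)
    return std::nullopt;

  const bool IsSatFinite = Sat == SaturationMode::SATFINITE;
  if (IsSatFinite && PTXVersion < 81)
    return std::nullopt; // satfinite needs PTX 8.1+

  std::string Name = "cvt";
  switch (Rnd) {
  case FPRoundingMode::ROUND_RNA:
    if (Relu)
      return std::nullopt; // relu is rejected with rna
    Name += ".rna";
    break;
  case FPRoundingMode::ROUND_RN:
  case FPRoundingMode::ROUND_RZ:
    if (SmVersion < 90 || IsSatFinite)
      return std::nullopt; // rn/rz need sm_90 and reject satfinite
    Name += (Rnd == FPRoundingMode::ROUND_RN) ? ".rn" : ".rz";
    break;
  default:
    return std::nullopt; // NONE, RP and RM are rejected for this conversion
  }

  // Same order as the asm string: rounding, saturation, relu.
  if (IsSatFinite)
    Name += ".satfinite";
  if (Relu)
    Name += ".relu";
  return Name + ".tf32.f32";
}

// Usage sketch: flag combinations taken from the lit tests in this patch.
inline void checkExamples() {
  using R = FPRoundingMode;
  using S = SaturationMode;
  assert(tf32CvtMnemonic(R::ROUND_RNA, S::NONE, false, 80, 70).value_or("") ==
         "cvt.rna.tf32.f32");
  assert(tf32CvtMnemonic(R::ROUND_RZ, S::NONE, true, 90, 78).value_or("") ==
         "cvt.rz.relu.tf32.f32");
  assert(!tf32CvtMnemonic(R::ROUND_RNA, S::NONE, true, 90, 78).has_value());
}
```

Returning an optional instead of aborting keeps the sketch easy to test. It also makes the asymmetry visible: in this patch `.satfinite` only combines with `.rna`, while `.relu` only combines with `.rn` and `.rz`.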
@Artem-B, could you please help with a review?
cc: @LewisCrawford
Thanks for the updates! Barring some minor nits, this looks good to me. The approaches we adopt here for representing rounding modes etc. will probably be used in many other future intrinsics, so let's make sure we set good standards now. Curious for the thoughts of @Artem-B and @andykaylor on this.
This patch adds the missing variants of float to tf32 conversion intrinsics. Lit tests are added for all the intrinsics. PTX Spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <[email protected]>
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/4561
This patch adds the missing variants of float to tf32 conversion intrinsics, with their corresponding lit tests. PTX Spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <[email protected]>
PR llvm#121507 added 'cvt' intrinsics to convert float to tf32, with the valid set of rounding and saturation modes. This PR adds an NVVM Dialect Op for the same. * lit tests are added to verify the lowering to intrinsics. * Negative tests are also added to check the error-handling of invalid combinations. Signed-off-by: Durgadoss R <[email protected]>
This patch adds the set of f32->tf32 cvt intrinsics introduced in sm100 with ptx8.6. This builds on top of the recent PR llvm#121507. Tests are verified with a 12.8 ptxas executable. Also update the lit.cfg.py to include the latest PTXAS_EXE versions. PTX ISA link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <[email protected]>