Skip to content

[flang] Use saturated intrinsic for floating point conversions #130686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 12, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions flang/lib/Optimizer/CodeGen/CodeGen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -835,10 +835,20 @@ struct ConvertOpConversion : public fir::FIROpConversion<fir::ConvertOp> {
return mlir::success();
}
if (mlir::isa<mlir::IntegerType>(toTy)) {
if (toTy.isUnsignedInteger())
rewriter.replaceOpWithNewOp<mlir::LLVM::FPToUIOp>(convert, toTy, op0);
else
rewriter.replaceOpWithNewOp<mlir::LLVM::FPToSIOp>(convert, toTy, op0);
// NOTE: We are checking the fir type here because toTy is an LLVM type
// which is signless, and we need to use the intrinsic that matches the
// sign of the output in fir.
if (toFirTy.isUnsignedInteger()) {
auto intrinsicName =
mlir::StringAttr::get(convert.getContext(), "llvm.fptoui.sat");
rewriter.replaceOpWithNewOp<mlir::LLVM::CallIntrinsicOp>(
convert, toTy, intrinsicName, op0);
} else {
auto intrinsicName =
mlir::StringAttr::get(convert.getContext(), "llvm.fptosi.sat");
rewriter.replaceOpWithNewOp<mlir::LLVM::CallIntrinsicOp>(
convert, toTy, intrinsicName, op0);
}
Comment on lines +841 to +851
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you know what these saturation intrinsics get lowered to? And is there a big difference in performance? Would it be possible to use the saturation intrinsic only when necessary? Or can that not be determined at compile time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They produce more instructions on x86 (when they cannot be const-folded away) (x86 godbolt link, more instructions, aarch64 godbolt link, both using fcvtzs), and if someone converted reals to integers in a hot loop they might see worse performance, however I was unable to find a difference in the performance tests that I ran. I'll be watching performance numbers after this is merged in case something comes up.

Would it be possible to use the saturation intrinsic only when necessary?

As long as we want the correct semantics for values only known at runtime, I don't think so. However, especially if performance issues come up, I think it would make sense to use the fptosi/fptoui instructions under some flag, maybe enabled by default above some optimization level. Do you think using the instructions instead of the saturated intrinsics under (for example) -ffast-math would be a good compromise if performance issues show up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think using the instructions instead of the saturated intrinsics under (for example) -ffast-math would be a good compromise if performance issues show up?

Personally, I agree with that approach. I think it is better to avoid having too many code generation paths unless there is an actual use case for it, in which case -ffast-math would sounds like the right flag to deviate from the requirements.

Please wait for @kiranchandramohan's feedback on the matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ashermancinelli for the reply. Just a few points, thinking out loud.

Would it be possible to use the saturation intrinsic only when necessary? Or can that not be determined at compile time?

This question I was asking here was about inferring from the real and integer types involved in the conversion. Like if we are converting from real(kind=2)/half-precision to integer(kind=4) then probably integer(kind=4) can hold all values without saturation.

gfortran (without fast-math) seems to be calling __fixtfsi.

There is also a question of whether vectorisation will work for these saturation intrinsics. I can see one issue filed against this topic by the rust community.
#59682

Do you think using the instructions instead of the saturated intrinsics under (for example) -ffast-math would be a good compromise if performance issues show up?

Personally, I agree with that approach. I think it is better to avoid having too many code generation paths unless there is an actual use case for it, in which case -ffast-math would sounds like the right flag to deviate from the requirements.

Please wait for @kiranchandramohan's feedback on the matter.

Makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like if we are converting from real(kind=2)/half-precision to integer(kind=4) then probably integer(kind=4) can hold all values without saturation.

I see what you mean now and that seems like a great idea. I see you've approved this PR so I'll merge for now, but I would like to address this in a follow-up. Thanks!

return mlir::success();
}
} else if (mlir::isa<mlir::IntegerType>(fromTy)) {
Expand Down
18 changes: 13 additions & 5 deletions flang/test/Fir/convert-to-llvm.fir
Original file line number Diff line number Diff line change
Expand Up @@ -701,6 +701,10 @@ func.func @convert_from_float(%arg0 : f32) {
%7 = fir.convert %arg0 : (f32) -> i16
%8 = fir.convert %arg0 : (f32) -> i32
%9 = fir.convert %arg0 : (f32) -> i64
%10 = fir.convert %arg0 : (f32) -> ui8
%11 = fir.convert %arg0 : (f32) -> ui16
%12 = fir.convert %arg0 : (f32) -> ui32
%13 = fir.convert %arg0 : (f32) -> ui64
return
}

Expand All @@ -711,11 +715,15 @@ func.func @convert_from_float(%arg0 : f32) {
// CHECK: %{{.*}} = llvm.fpext %[[ARG0]] : f32 to f64
// CHECK: %{{.*}} = llvm.fpext %[[ARG0]] : f32 to f80
// CHECK: %{{.*}} = llvm.fpext %[[ARG0]] : f32 to f128
// CHECK: %{{.*}} = llvm.fptosi %[[ARG0]] : f32 to i1
// CHECK: %{{.*}} = llvm.fptosi %[[ARG0]] : f32 to i8
// CHECK: %{{.*}} = llvm.fptosi %[[ARG0]] : f32 to i16
// CHECK: %{{.*}} = llvm.fptosi %[[ARG0]] : f32 to i32
// CHECK: %{{.*}} = llvm.fptosi %[[ARG0]] : f32 to i64
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptosi.sat"(%[[ARG0]]) : (f32) -> i1
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptosi.sat"(%[[ARG0]]) : (f32) -> i8
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptosi.sat"(%[[ARG0]]) : (f32) -> i16
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptosi.sat"(%[[ARG0]]) : (f32) -> i32
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptosi.sat"(%[[ARG0]]) : (f32) -> i64
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptoui.sat"(%[[ARG0]]) : (f32) -> i8
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptoui.sat"(%[[ARG0]]) : (f32) -> i16
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptoui.sat"(%[[ARG0]]) : (f32) -> i32
// CHECK: %{{.*}} = llvm.call_intrinsic "llvm.fptoui.sat"(%[[ARG0]]) : (f32) -> i64

// -----

Expand Down