[flang] Use saturated intrinsic for floating point conversions #130686
ashermancinelli merged 3 commits into llvm:main from ashermancinelli:ajm/fp-conversions on Mar 12, 2025
Would you know what these saturation intrinsics get lowered to? And is there a big difference in performance? Would it be possible to use the saturation intrinsic only when necessary? Or can that not be determined at compile time?
They produce more instructions on x86 when they cannot be const-folded away (x86 godbolt link, more instructions; aarch64 godbolt link, both using fcvtzs). If someone converted reals to integers in a hot loop they might see worse performance; however, I was unable to find a difference in the performance tests that I ran. I'll be watching performance numbers after this is merged in case something comes up.
As long as we want the correct semantics for values only known at runtime, I don't think so. However, especially if performance issues come up, I think it would make sense to use the fptosi/fptoui instructions under some flag, maybe enabled by default above some optimization level. Do you think using the instructions instead of the saturated intrinsics under (for example) -ffast-math would be a good compromise if performance issues show up?
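To make the semantic difference discussed above concrete, here is a minimal C++ sketch of what a saturating float-to-signed-integer conversion computes, modeled on the documented behavior of LLVM's llvm.fptosi.sat intrinsic (NaN becomes 0, out-of-range values clamp, everything else truncates toward zero). The function name fptosi_sat_i32 is hypothetical and is not part of flang or LLVM; it only illustrates the extra checks that a plain conversion skips.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model of a saturating float -> int32 conversion.
// Assumption: mirrors the llvm.fptosi.sat semantics described in the
// LLVM Language Reference; this is not code from the PR.
std::int32_t fptosi_sat_i32(float x) {
    // NaN has no meaningful integer value; the saturating form returns 0.
    if (std::isnan(x)) {
        return 0;
    }
    // 2^31 is exactly representable as a float, while INT32_MAX is not,
    // so compare against 2^31 and clamp anything at or above it.
    if (x >= 2147483648.0f) {
        return std::numeric_limits<std::int32_t>::max();
    }
    // -2^31 is exactly INT32_MIN; clamp anything at or below it.
    if (x <= -2147483648.0f) {
        return std::numeric_limits<std::int32_t>::min();
    }
    // In range: the cast is well defined and truncates toward zero.
    return static_cast<std::int32_t>(x);
}
```

By contrast, a plain out-of-range conversion is undefined behavior in C++ and yields a poison value for LLVM's fptosi instruction, which is why the two forms only differ for inputs outside the target range or NaN.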
Personally, I agree with that approach. I think it is better to avoid having too many code generation paths unless there is an actual use case for it, in which case -ffast-math sounds like the right flag to deviate from the requirements.
Please wait for @kiranchandramohan's feedback on the matter.
Thanks @ashermancinelli for the reply. Just a few points, thinking out loud.
The question I was asking here was about inferring from the real and integer types involved in the conversion. For example, if we are converting from real(kind=2)/half-precision to integer(kind=4), then integer(kind=4) can probably hold all values without saturation.
gfortran (without fast-math) seems to be calling __fixtfsi.
There is also a question of whether vectorisation will work for these saturation intrinsics. I can see one issue filed against this topic by the rust community: #59682
Makes sense.
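The type-based inference suggested above can be sketched as a compile-time range check. The C++ below is only an illustration under stated assumptions: finite_values_fit_signed and kHalfMaxExponent are hypothetical names, the floating-point format is assumed to be radix 2, and real(kind=2) is taken to be IEEE binary16 as the comment describes. It is not how flang decides anything today.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical check: can every finite value of a binary floating-point
// type be converted to a signed integer type without going out of range?
//
// float_max_exponent follows the std::numeric_limits convention: every
// finite value has magnitude strictly below 2^float_max_exponent.
// int_value_bits is the number of non-sign bits, so the integer range
// is [-2^int_value_bits, 2^int_value_bits - 1].
constexpr bool finite_values_fit_signed(int float_max_exponent,
                                        int int_value_bits) {
    return float_max_exponent <= int_value_bits;
}

// Assumption: IEEE binary16 (half precision) has a largest finite value
// of 65504, which is below 2^16, so its max_exponent is 16.
constexpr int kHalfMaxExponent = 16;

// Half precision to a 32-bit signed integer: all finite values fit.
static_assert(finite_values_fit_signed(
                  kHalfMaxExponent, std::numeric_limits<std::int32_t>::digits),
              "binary16 finite range fits in int32");

// Single precision to a 32-bit signed integer: large values do not fit.
static_assert(!finite_values_fit_signed(
                  std::numeric_limits<float>::max_exponent,
                  std::numeric_limits<std::int32_t>::digits),
              "float finite range exceeds int32");
```

Note that this covers finite values only. NaN and infinities are still possible in half precision, so a range check alone would not reproduce the saturating behavior for those inputs, and unsigned targets would also need to account for negative values.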
I see what you mean now and that seems like a great idea. I see you've approved this PR so I'll merge for now, but I would like to address this in a follow-up. Thanks!