
[NVPTX] Add support for atomic add for f16 type #84295

Merged
merged 7 commits into from
Mar 12, 2024

Member

@akuegel akuegel commented Mar 7, 2024

atom.add.noftz.f16 is supported since SM 7.0

@akuegel akuegel requested a review from Artem-B March 7, 2024 10:02
Member

@Artem-B Artem-B left a comment

LGTM in general, modulo the missing constraint on the PTX version.
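As a rough illustration of the constraint being asked for, the instruction should only be selected when both the SM version and the PTX version are high enough. The sketch below is not LLVM code; `supports_atomic_add_f16` is a hypothetical helper, and the thresholds are taken from this thread (SM 7.0 from the PR description, `+ptx63` from the test's RUN line).

```python
def supports_atomic_add_f16(sm_version: int, ptx_version: int) -> bool:
    """Hypothetical predicate mirroring the two-part constraint.

    sm_version:  target architecture as an integer, e.g. 70 for sm_70.
    ptx_version: PTX ISA version as an integer, e.g. 63 for PTX 6.3.
    Both thresholds are assumptions read off this PR thread.
    """
    return sm_version >= 70 and ptx_version >= 63


# sm_70 with PTX 6.3 qualifies; an older PTX version on the same GPU does not.
print(supports_atomic_add_f16(70, 63))
print(supports_atomic_add_f16(70, 62))
```

Checking only the SM version would let the backend emit the instruction for a PTX version that may not know it, which is the gap this review comment points at.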

@@ -0,0 +1,28 @@
; RUN: llc < %s -march=nvptx -mcpu=sm_70 -mattr=+ptx63 | FileCheck %s
Member

For this test it might be convenient to autogenerate the checks with llvm/utils/update_test_checks.py

Member Author

Thanks for the suggestion. Done

Contributor

The ordering of the ld.param.*s relative to the atom.* instructions isn't relevant to this test, correct? If so, we may not want to include those CHECK-NEXT:s in the test.

@akuegel akuegel merged commit 8e0f4b9 into llvm:main Mar 12, 2024
@akuegel akuegel deleted the atomic_add_f16 branch March 12, 2024 08:18
dklimkin added a commit that referenced this pull request Mar 12, 2024
dklimkin added a commit that referenced this pull request Mar 12, 2024
Member Author

akuegel commented Mar 13, 2024

I have now created a reproducer for what caused the revert. If I adjust the test slightly and change the line with %r1 to:

%r1 = atomicrmw fadd ptr %dp0, half 1.0 seq_cst, align 2

The codegen for this becomes:

ld.param.u64 %rd1, [test_param_0];
atom.add.noftz.f16 %rs1, [%rd1], 0x3C00;

And this fails verification with this error:

Arguments mismatch for instruction 'atom'

I don't know PTX well enough to tell what is wrong with that. I will try to figure it out.

Member Author

akuegel commented Mar 13, 2024

OK, it seems that PTX distinguishes between floating-point constants and integer constants, and we are generating an integer constant, I guess because we use Int16Register. @Artem-B, if you can point me to where I can make sure that we use a floating-point constant here, that would be great :)
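For reference, the immediate `0x3C00` in the generated PTX is the IEEE 754 binary16 bit pattern of `half 1.0`, printed as a plain hex integer. A minimal Python sketch to check this (the helper name `half_bits` is made up for illustration and is not part of LLVM):

```python
import struct


def half_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of `value` as an integer."""
    # The struct format character 'e' packs a float as a 16-bit half.
    return int.from_bytes(struct.pack("<e", value), "little")


print(hex(half_bits(1.0)))  # 0x3c00
```

So the operand carries the right bits; the open question is only whether ptxas accepts that spelling as an f16 operand for this instruction.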

Member Author

akuegel commented Mar 13, 2024

OK, I found something in NVPTXISelDAGToDAG.cpp: it seems there is no hex representation for f16 constants, and the constant is replaced by a load from an f16 register. Is this the right place to look further?

Member

Artem-B commented Mar 13, 2024

Huh. It appears that the instruction does not accept immediate arguments for the f16 variant, though it does accept them for f32. This looks like a bug in ptxas.
https://godbolt.org/z/9Gv1McM4M

Normally f16 instruction variants accept plain hex immediate values.

Looks like we'll need to disable the instruction variant with an immediate argument and force passing the value via a register.

Member Author

akuegel commented Mar 14, 2024

Tried to disable the instruction variant with an immediate argument, and that seems to work:

#85197
