Skip to content

[SYCL][libclc][NATIVECPU] Implement generic atomic load for generic target #13249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 10, 2024

Conversation

PietroGhg
Copy link
Contributor

This PR implements the overload for the generic address space for __spirv_AtomicLoad in the generic target.

Libclc implements overloads for the generic address space for __spirv_AtomicLoad (and several other builtins) in the ptx and amdgcn targets, but doesn't do so for generic.

I've created this PR to gather some initial feedback on the implementation, I'd like to add implementations for other builtins with follow up PRs.

@PietroGhg PietroGhg requested a review from a team as a code owner April 2, 2024 09:07
@PietroGhg PietroGhg requested a review from hdelan April 2, 2024 09:07
@hdelan
Copy link
Contributor

hdelan commented Apr 2, 2024

Is it safe to have explicit mangling for libclc/generic with AS ptr params? As far as I am aware, the mangling of generic AS pointers can depend on the target that it is being compiled for. This patch does not add builtins for the generic AS, so maybe this is not a concern. My main question is: do we have guarantees that the mangling of global and local AS ptrs will always be the same, regardless of the target being compiled for? I think the answer is yes, but would like to confirm.

Tagging @frasercrmck for context

@PietroGhg
Copy link
Contributor Author

Is it safe to have explicit mangling for libclc/generic with AS ptr params? As far as I am aware, the mangling of generic AS pointers can depend on the target that it is being compiled for. This patch does not add builtins for the generic AS, so maybe this is not a concern. My main question is: do we have guarantees that the mangling of global and local AS ptrs will always be the same, regardless of the target being compiled for? I think the answer is yes, but would like to confirm.

Tagging @frasercrmck for context

Thanks @hdelan, you raise a good point. My understanding is that in general mangling is target dependent, and in particular when it comes to address spaces, the mangling depends in the Address Space Map used by the compiler front end, and this applies to all address spaces, not just the generic one. So yeah here I think we are assuming that the generic AS is mapped to 0 when mangling, which is probably not too different from other assumptions that are already made in libclc (global->1, local->3) ?

@frasercrmck
Copy link
Contributor

Is it safe to have explicit mangling for libclc/generic with AS ptr params? As far as I am aware, the mangling of generic AS pointers can depend on the target that it is being compiled for. This patch does not add builtins for the generic AS, so maybe this is not a concern. My main question is: do we have guarantees that the mangling of global and local AS ptrs will always be the same, regardless of the target being compiled for? I think the answer is yes, but would like to confirm.
Tagging @frasercrmck for context

Thanks @hdelan, you raise a good point. My understanding is that in general mangling is target dependent, and in particular when it comes to address spaces, the mangling depends in the Address Space Map used by the compiler front end, and this applies to all address spaces, not just the generic one. So yeah here I think we are assuming that the generic AS is mapped to 0 when mangling, which is probably not too different from other assumptions that are already made in libclc (global->1, local->3) ?

Yeah, it's a bit tricky.

All manglings are ultimately up to the target, and may depend on other options (see AMDGPU changing the address space mapping depending on whether -x cl is passed). It's quite lucky that we happen to have global and local (and constant so stable) but generic doesn't adhere to this. It's commonly 0, but may be 5 on AMDGPU. Also, the target address space isn't the whole story, because address space 0 isn't universally mangled as P - it could be PU3AS0, if the default address space isn't zero (again, see AMDGPU).

We also have the problem in libclc where manglings need to be stable across different targets (e.g., the remangling process) such as ensuring address space manglings line up between "host" and "device" code.

I don't have a solid recommendation, but the generic address space can be mangled in three different ways on AMDGPU (P, PU3AS0, PU3AS5), and we currently make use of two of those (P, PU3AS0), and have a murky (buggy?) remangling process which happens to fudge those into just the one (PU3AS0 -> P, which is what the host expects).

My gut feeling is that we shouldn't have any hard-coded address space manglings in the generic code. But that's too late as we already have those! So maybe this is okay, or we can hack it with CMake (which is just replicating clang knowledge in a hacky way, so I'm not happy about it either).


IMPL_AS(int, i, , 4)
IMPL_AS(unsigned int, j, u, 4)

IMPL(unsigned int, j, global, , u, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be global? I thought the PR was for generic AS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting it, I copy-pasted the line and didn't notice it. I changed to a blank in the call to the macro since OpenCL 3.0 doesn't allot using generic as an AS qualifier, I've also added implementations of the clc* load helpers that use addrspace(0).

Copy link
Contributor

@frasercrmck frasercrmck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits. As discussed I'm not a fan of manual mangling (especially in the generic builtins) but we are already in that situation, so this PR isn't making anything worse. We'll investigate ways of improving this, I think.

@@ -18,7 +18,7 @@ TYPE __clc__atomic_##PREFIX##load_##AS##_##BYTE_SIZE##_##MEM_ORDER(volatile AS c
FDECL(TYPE, PREFIX, AS, BYTE_SIZE, acquire) \
FDECL(TYPE, PREFIX, AS, BYTE_SIZE, seq_cst) \
_CLC_DEF TYPE \
_Z18__spirv_AtomicLoadPU3##AS_MANGLED##K##TYPE_MANGLED##N5__spv5Scope4FlagENS1_19MemorySemanticsMask4FlagE( \
_Z18__spirv_AtomicLoadP##AS_MANGLED##K##TYPE_MANGLED##N5__spv5Scope4FlagENS1_19MemorySemanticsMask4FlagE( \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks like it could be improved - that \ should be aligned with the others

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've aligned the \ at the end of the lines, thank you

@@ -31,8 +31,9 @@ TYPE __clc__atomic_##PREFIX##load_##AS##_##BYTE_SIZE##_##MEM_ORDER(volatile AS c
}

#define IMPL_AS(TYPE, TYPE_MANGLED, PREFIX, BYTE_SIZE) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to format this whole expression while you're here? It'd be nice to slowly improve the formatting of libclc as we go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@PietroGhg
Copy link
Contributor Author

@intel/llvm-gatekeepers this looks ready to be merged, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants