[SYCL][CUDA][libclc] Add asynchronous barrier #5303
Conversation
This looks like a useful extension. Most of my comments are about making it more obvious that it's CUDA-specific to help set user expectations.
Split barriers are useful, and we (Intel) are interested in exposing them for other hardware. CM already exposes some support for split barriers on Intel GPUs via its cm_sbarrier function; we should think about what a portable split barrier interface would look like.
I did not find anything about that. Can you point me to either the extension doc or the implementation?
Currently this extension exposes all the functionality from CUDA. However, I am not sure we need to require all of this functionality from all implementations. For example, I would not count the no_complete variants of the arrive operation or copy_async_arrive among the most essential parts of the barrier. Getting some input from whoever designed …dependent forward progress
I can't find a good resource, either. This isn't a SYCL extension, but you might be able to glean some details from the CM compiler: https://github.com/intel/cm-compiler/search?q=sbarrier
Personally, I think it would be cleaner to have two extensions to support this case: one CUDA-backend-specific extension that gives access to capabilities only supported by CUDA (but that may be useful to those writing SYCL and tuning some sections of their code for NVIDIA devices); and another, more general extension that exposes split barriers in a portable way. Providing a class that works everywhere but has member functions that only work on some devices has the potential to be confusing.
namespace ext {
namespace oneapi {

class barrier {
Does this really need to have that "rich" interface compared to std::barrier, which it draws similarity with?
I am not sure what you mean by "rich interface".
I mean that std::barrier is happy with just arrive/wait/drop, and no token. Can we simplify this API? It will likely increase the chances that other backends would be able to support it too.
While removing this token would make the interface more similar to std::barrier, I disagree that doing so would make it easier to implement for other backends. If other backends do not need this token, they can pass around a dummy value. Meanwhile, implementing this without the token for CUDA would lead to a more complicated implementation and most likely additional limitations, such as only one barrier being usable at a time.
> If other backends do not need this token, they can pass around a dummy value.

But the token is used as an input in the core wait API; it can't be a dummy.
> Meanwhile, implementing this without the token for CUDA would lead to a more complicated implementation and most likely additional limitations, such as only one barrier being usable at a time.

The CUDA implementation can use a "token" under the hood, of course.
I don't have any practical suggestions, just expressing a desire for a simpler and more standard interface for the feature. I will rely on @Pennycook's review/approval for the feature definition and then review the implementation.
> But the token is used as an input in the core wait API; it can't be a dummy.
I guess I was not clear enough. I meant that an implementation that does not need the token can return a dummy value from arrive and ignore that token in the wait call.
> The CUDA implementation can use a "token" under the hood, of course.
I have a feeling that hiding the token under the hood would, depending on how it is implemented, either limit the functionality or be inefficient or complicated, although I can't provide any concrete arguments off the top of my head. I need to think about this some more.
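For context on the token question debated above, here is a minimal host-side sketch (an illustration only, not the extension's implementation; plain C++20 atomics, nothing CUDA-specific) of what an arrival token typically encodes: the phase a thread arrived in, so that wait() knows which phase completion to block on.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative split barrier: each participating thread calls arrive()
// exactly once per phase and may call wait() with the returned token later.
class split_barrier {
public:
  using arrival_token = std::uint64_t;

  explicit split_barrier(std::uint32_t expected) : expected_(expected) {}

  arrival_token arrive() {
    // The token records the phase this arrival belongs to.
    arrival_token token = phase_.load(std::memory_order_acquire);
    if (arrived_.fetch_add(1, std::memory_order_acq_rel) + 1 == expected_) {
      arrived_.store(0, std::memory_order_relaxed);
      phase_.fetch_add(1, std::memory_order_release); // complete this phase
      phase_.notify_all();
    }
    return token;
  }

  void wait(arrival_token token) {
    // Block until the phase recorded in the token has completed.
    auto current = phase_.load(std::memory_order_acquire);
    while (current == token) {
      phase_.wait(current);
      current = phase_.load(std::memory_order_acquire);
    }
  }

  void arrive_and_wait() { wait(arrive()); }

private:
  const std::uint32_t expected_;
  std::atomic<std::uint32_t> arrived_{0};
  std::atomic<std::uint64_t> phase_{0};
};
```

A backend whose hardware barrier does not need the phase information can still satisfy this interface by returning any value from arrive() and ignoring it in wait(), which is the "dummy token" fallback mentioned above.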
Co-authored-by: John Pennycook <[email protected]>
The SYCL 2020 naming conventions for extensions encourage "sycl_ext_<vendorstring>_<featurename>", where the vendor string is used to avoid collisions. Our vendor strings are "oneapi" and "intel". "ext_oneapi_cuda_" is supposed to convey that it's a oneAPI extension, but specific to the CUDA backend.
I guess now this has all the approvals needed for merge?
I believe we need @pvchupin's approval as well.
@gmlueck, can you review/approve please?
> @gmlueck, can you review/approve please?
@Pennycook has been taking the lead on reviewing the spec for this, and I trust his judgement. I happened to notice a small mistake when I read through, though. I'm happy once that is fixed.
sycl/doc/extensions/experimental/sycl_ext_oneapi_cuda_async_barrier.asciidoc
9478857
@pvchupin @smaslov-intel @Pennycook The last change in this PR was minimal and this was previously approved, so can we get this approved and merged?
@t4c1, please fix the post-commit issues: https://github.com/intel/llvm/runs/6478611698
Fixes warnings introduced in #5303. Co-authored-by: Artur Gainullin <[email protected]>
@t4c1, please also check the LIT failure on Windows: https://github.com/intel/llvm/runs/6478612314?check_suite_focus=true
Fixed an async barrier definition clash with the max macro defined in some headers on Windows by temporarily undefining the macro. The issue was reported here: #5303 (comment)
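For reference, the usual shape of that workaround (a sketch of the technique described in the commit message above, not the exact code in the header) is to save, undefine, and later restore the conflicting macro:

```cpp
// On Windows, <windows.h> without NOMINMAX defines a function-style `max`
// macro that clashes with any declaration using `max` as an identifier.
// Save the macro, remove it around the affected declarations, then restore it.
#pragma push_macro("max")
#undef max

// ... declarations that use `max` as an ordinary identifier go here ...

#pragma pop_macro("max")
```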
Adds the extension proposal and implementation for the asynchronous barrier (for now the implementation is for the CUDA backend, sm_80+ only). Tests for this are here: intel/llvm-test-suite#737
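For readers of this thread, a hypothetical device-side usage sketch follows. The namespace, class name, and member names (initialize, arrive, wait) follow the draft interface quoted in this conversation and may not match the merged spec exactly; it also assumes a CUDA device with sm_80 or newer, as stated above.

```cpp
#include <sycl/sycl.hpp>

namespace async = sycl::ext::oneapi; // assumption: draft extension namespace

void group_split_barrier_example(sycl::queue &q) {
  q.submit([&](sycl::handler &cgh) {
    // One barrier object shared by the whole work-group, placed in local memory.
    sycl::local_accessor<async::barrier, 1> bar(sycl::range<1>(1), cgh);
    cgh.parallel_for(
        sycl::nd_range<1>(sycl::range<1>(128), sycl::range<1>(128)),
        [=](sycl::nd_item<1> it) {
          if (it.get_local_linear_id() == 0)
            bar[0].initialize(it.get_local_range(0)); // expected arrival count
          sycl::group_barrier(it.get_group()); // make initialization visible

          // ... write this work-item's contribution to local memory ...
          auto token = bar[0].arrive(); // signal arrival, keep executing
          // ... independent work that does not need the other contributions ...
          bar[0].wait(token); // block until every work-item has arrived

          // ... safely read the data produced by the whole group ...
        });
  });
}
```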