
[SYCL] Implement per-aspect device code split #7302


Merged

Conversation

@AlexeySachkov (Contributor) commented Nov 7, 2022

This is a piece of the implementation of optional kernel features from the SYCL 2020 spec, in accordance with our design doc.

Note: this PR doesn't fully implement the feature, but rather lays a foundation for subsequent incremental updates.
The main TODO items for future PRs:

  • refactor handling of largeGRF to use the new splitter
  • add handling for other kernel properties like reqd_sub_group_size, reqd_work_group_size
  • add handling of global variables of optional types
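
For context, here is a minimal sketch (illustrative code, not taken from the PR) of the kind of situation this split addresses: a kernel that uses fp64 and a kernel that does not should land in separate device images, so that the latter can still be JIT-compiled for devices without the fp64 aspect.

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue Q;

  // Uses double, so its device image requires aspect::fp64.
  Q.single_task<class UsesFp64>([] {
    volatile double D = 3.14;
    (void)D;
  });

  // Uses only float; with per-aspect device code split this kernel ends up in
  // a separate device image and can still be JIT-compiled on devices that do
  // not support fp64.
  Q.single_task<class NoFp64>([] {
    volatile float F = 3.14f;
    (void)F;
  });

  Q.wait();
  return 0;
}
```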

@AlexeySachkov (Contributor, Author) left a comment

This is an initial version to get early feedback; there are still some bits missing:

  • no tests for ESIMD + aspects
  • no tests for double-GRF + aspects
  • aspects {1, 2} are treated as distinct from aspects {2, 1}, which is a bug
  • global variables are not split into separate modules if they are of optional types

Tagging @asudarsa, @sarnex, @kbobrovs: I could use your feedback, folks, to see whether there are any major comments/objections/suggestions about the approach in general.

return Ret;
}

SmallSet<int, 4> Aspects;
Contributor Author

It looks like this container doesn't maintain a stable ordering of its items, which is a problem: there is a test in this PR which adds aspects 1 and 2 in a different order, and they are treated as two distinct groups of aspects, which is not correct.

Contributor

Maybe SmallSetVector works?

@kbobrovs (Contributor) Nov 8, 2022

Actually, my thumbs up is premature.
(Small)SetVector's iterator guarantees iteration in insertion order, which is not what we need. While operator== will work with any insertion and iteration order, the hash needs a sorted order rather than insertion order: encountering aspect 1 first and then 2 must give the same property set as encountering 2 and then 1, so the hash must not depend on the insertion order.

std::set will work for this.

Contributor Author

I thought about this, and since we never modify this container (we only construct it once), it should be enough to simply use a vector and sort it in the constructor. Presumably, the number of aspects will never be large enough to cause any significant slowdown.

Contributor

When processing kernel A we may encounter property 1 first and then property 2, so they will be recorded as {1, 2} in Aspects for this kernel, while for kernel B we may encounter the properties in a different order and record them as {2, 1}.
So this function

unsigned hash() const {
  llvm::hash_code AspectsHash =
          llvm::hash_combine_range(Aspects.begin(), Aspects.end());
  return static_cast<unsigned>(llvm::hash_combine(AspectsHash));
}

will return different results for A and B. Am I missing something?

@AlexeySachkov (Contributor, Author) Nov 11, 2022

> will return different results for A and B

Yeah, that is the problem with the existing SmallSet. My idea is to use a SmallVector and have it sorted in the KernelProperties constructor. This way, the hash will be the same regardless of the order in which aspects were discovered.

I've implemented that approach in bd2ae95
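
A minimal sketch of that approach (member and helper naming here is illustrative, and the exact shape of the sycl_used_aspects metadata — integer constant operands — is an assumption, so this is not necessarily what landed in bd2ae95):

```cpp
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

struct KernelProperties {
  SmallVector<int, 4> Aspects;

  KernelProperties(const Function *F) {
    if (const MDNode *MDN = F->getMetadata("sycl_used_aspects")) {
      for (const MDOperand &Op : MDN->operands())
        if (auto *C = mdconst::dyn_extract<ConstantInt>(Op))
          Aspects.push_back(static_cast<int>(C->getZExtValue()));
      // Canonicalize once at construction time: with a sorted vector the hash
      // no longer depends on the order in which aspects were discovered.
      llvm::sort(Aspects);
    }
  }

  unsigned hash() const {
    return static_cast<unsigned>(
        hash_combine_range(Aspects.begin(), Aspects.end()));
  }
};
```

Sorting also makes a plain element-wise operator== order-independent for free.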

const bool SplitByScope = ScopedSplitter->totalSplits() > 1;
Modified |= SplitByScope;
Modified |= SplitByProperties;

// TODO this nested splitting scheme will not scale well when other split
Contributor Author

Note: I do not plan to resolve this TODO in this PR. My current idea is to move the double-GRF splitter into the new properties splitter and thus reduce the number of split "dimensions" we have.

Contributor

No flags on that idea; eventually we want the large-GRF stuff to be specified using kernel properties, so having it treated the same way here makes sense.

KernelProperties() = default;

KernelProperties(const Function *F) {
if (const MDNode *MDN = F->getMetadata("sycl_used_aspects")) {
Contributor

I thought @steffenlarsen made some changes to sycl-post-link splitting for his work, but I don't see them in the history of this file. Is this intended to also implement the stuff he's doing in the future, besides sycl_used_aspects?

Contributor Author

What @steffenlarsen implemented is support for the sycl_ext_oneapi_kernel_properties extension. At the moment, it simply provides an alternative interface for 4 core SYCL attributes (reqd_sub_group_size, reqd_work_group_size, work_group_size_hint?, device_has). Under the hood, we use the same LLVM IR representation for both attributes and properties (though the alignment might happen at different stages/in different places for different attribute/property pairs).

So, the idea is that this code will handle both, under the assumption that the unification has happened before it. When introducing new properties or attributes, I suppose each will be designed case-by-case, but we should aim to have a similar handling flow for all of them.

For example, this new splitter I'm adding is also supposed to split modules by their reqd_sub_group_size, but I will add that in a separate PR so as not to overcomplicate this one and to have some baseline/infrastructure ready first.

Contributor
Sounds good, thanks

@sarnex (Contributor) commented Nov 7, 2022

No flags, I made some comments inline

@@ -761,5 +761,138 @@ getDoubleGRFSplitter(ModuleDesc &&MD, bool EmitOnlyKernelsAsEntryPoints) {
return std::make_unique<ModuleCopier>(std::move(MD), std::move(Groups));
}

namespace {
struct KernelProperties {
Contributor

nit: this struct works with aspects, so maybe the name could reflect that rather than 'properties'

Contributor Author

The idea is to expand it further to also handle things like reqd_sub_group_size, large_grf, etc., so the intent is for it to represent some abstract properties. Perhaps we can come up with a different name, like "device requirements" or "optional features".

Contributor

"Features" seem nice

Contributor Author

Renamed to UsedOptionalFeatures in 18ba12c

@AlexeySachkov AlexeySachkov marked this pull request as ready for review November 17, 2022 13:38
@AlexeySachkov AlexeySachkov requested a review from a team as a code owner November 17, 2022 13:38
@AlexeySachkov AlexeySachkov requested a review from a team November 17, 2022 13:38
@AlexeySachkov (Contributor, Author)

/summary:run

struct UsedOptionalFeatures {
SmallVector<int, 4> Aspects;
// TODO: extend this further with reqd-sub-group-size, reqd-work-group-size,
// double-grf and other properties
Contributor

I don't think we need to worry about reqd-sub-group-size and reqd-work-group-size since they are required features through their Clang attributes - and by association the corresponding properties - and they are applied to entry points.

Contributor Author

They are applied, but if we have two kernels with different required sub-group sizes, they must be split into separate modules; otherwise JIT compilation might fail, because devices are not required to support every sub-group size. The same applies to the required work-group size, because different devices might support different work-group sizes.
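
To illustrate (a hypothetical example, not taken from the PR's tests): a device that only supports sub-group size 16 can still JIT-compile the first of the two kernels below, as long as the two land in separate device images.

```cpp
#include <sycl/sycl.hpp>

void submit(sycl::queue &Q) {
  // Requires sub-group size 16.
  Q.parallel_for<class SubGroup16>(
      sycl::nd_range<1>{64, 16},
      [=](sycl::nd_item<1>) [[sycl::reqd_sub_group_size(16)]] {});

  // Requires sub-group size 32; a device that does not support this size
  // would fail to JIT-compile a module that also contains this kernel.
  Q.parallel_for<class SubGroup32>(
      sycl::nd_range<1>{64, 32},
      [=](sycl::nd_item<1>) [[sycl::reqd_sub_group_size(32)]] {});
}
```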

@kbobrovs (Contributor) left a comment

LGTM.
One final nit: it might make sense to add an XFAIL test with function pointers.

Comment on lines +879 to +880
if (!isEntryPoint(F, EmitOnlyKernelsAsEntryPoints) ||
!MD.isEntryPointCandidate(F)) {
Contributor

This "||" is confusing for an uninformed reader (like me). I would have expected to see "&&" instead, based on the methods called.

If the call is correct, can you add clarifying comment to the sources, please?

Contributor Author

Looking at the isEntryPointCandidate description, it seems to me that the call is correct, i.e. we might have an entry point in a module which actually belongs to a different device image, and from that point of view we need to use || here to avoid including that entry point in more than one module.

However, I don't quite understand how it is possible to get a module that contains entry points which belong to another module. That "entry point candidate" mechanism was introduced in 9a55da5 as a replacement for the "allowed entries" introduced by 8c7bb45. Perhaps @kbobrovs can share some background here?
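
For reference, the requested clarifying comment could read something like the sketch below (hypothetical helper signatures standing in for the real ones, not the actual sycl-post-link sources):

```cpp
#include "llvm/IR/Function.h"

// Hypothetical signatures matching the calls in the snippet above.
bool isEntryPoint(const llvm::Function &F, bool EmitOnlyKernelsAsEntryPoints);
struct ModuleDesc {
  bool isEntryPointCandidate(const llvm::Function &F) const;
};

// Returns true if F must be skipped by this splitter: F is either not an
// entry point at all, or it is an entry point that belongs to a different
// device image. The "||" (rather than "&&") means a function is grouped here
// only when it is both an entry point and a candidate for this module, so
// every entry point is emitted into exactly one module.
bool shouldSkip(const llvm::Function &F, const ModuleDesc &MD,
                bool EmitOnlyKernelsAsEntryPoints) {
  return !isEntryPoint(F, EmitOnlyKernelsAsEntryPoints) ||
         !MD.isEntryPointCandidate(F);
}
```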

@AlexeySachkov (Contributor, Author)

/summary:run

@sarnex (Contributor) commented Nov 21, 2022

Thanks for addressing the comments, LGTM.

@aelovikov-intel (Contributor) commented Nov 21, 2022

All my stylistic comments have been addressed (except #7302 (comment), which would need more discussion, I guess), thanks!

I'm not familiar with the subject area so will let others give the final approval. No objections from my side.

@pvchupin (Contributor)

@AlexeySachkov, please handle/explain the failures.

@pvchupin (Contributor)

/summary:run

That doesn't work today btw.

@AlexeySachkov (Contributor, Author)

@pvchupin, from what I see, the only failure in all configurations is a typo in the warning message about SYCL_DEVICE_FILTER; it was fixed in #7454.

@pvchupin pvchupin merged commit 9a2c4fe into intel:sycl Nov 21, 2022
AlexeySachkov added a commit to AlexeySachkov/llvm-test-suite that referenced this pull request Dec 19, 2022
intel/llvm#7302 brought per-aspect device code split into the compiler, which means there is no need to specify an explicit per-kernel split to avoid issues with fp64 and speculative compilation.
@AlexeySachkov AlexeySachkov deleted the private/asachkov/per-aspect-device-code-split branch March 29, 2023 12:26
AlexeySachkov added a commit that referenced this pull request Apr 28, 2023

#### Intro

This is a refactoring of how we perform device code split in
`sycl-post-link`, which is intended to solve several existing issues
with the current implementation:
1. increased peak RAM consumption by `sycl-post-link`
2. bad scaling with more and more split "dimensions" being added
3. increased test maintenance cost due to the non-deterministic order (between commits) of output files produced by `sycl-post-link`

#### A bit more context about the issues above:

(1) Increased peak RAM consumption is caused by the fact that we currently preserve **all** splits in memory, even though we could process them one-by-one and discard each as soon as it has been stored to disk. This was implemented as a memory consumption optimization in #5021, but it got accidentally reverted in #7302 as an attempt to work around (2).

(2) is pretty much summarized in our source code:

https://github.com/intel/llvm/blob/afebb2543ccecb89f83c84b68fba7616bbab89ac/llvm/tools/sycl-post-link/sycl-post-link.cpp#L806-L811

(3) is caused by a bad implementation decision made in #7302: because every split is now identified by a hash, every time you take a new split "dimension"/new feature into account, it results in different hashes for existing tests. Just look at how many unrelated tests had to be updated in #7512, #8056 and #8167.

#### Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel
functions: instead of using hashes, we now build a string description
for each kernel function and then group kernels with the same
description string together.

The string description is built by a new entity: it accepts a set of rules, where each rule is a simple function which returns a string for a passed `llvm::Function`. The results of all rules are concatenated together, and the rules are invoked in the stable order of their registration.

There is a simple API for building those rules. It provides some predefined rules for the most popular use cases, like turning a function attribute or a metadata node into a string descriptor for the function. There is also a possibility to pass a custom callback to implement more complicated logic.
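
A rough sketch of what such a rule-based categorizer could look like (the names below are illustrative, not the actual sycl-post-link API):

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

#include "llvm/IR/Function.h"

// Builds a string description for a function from a set of registered rules;
// functions with equal descriptions are grouped into the same split module.
class FunctionCategorizer {
  using Rule = std::function<std::string(const llvm::Function &)>;
  std::vector<Rule> Rules;

public:
  // Rules are applied in the stable order of their registration.
  void registerRule(Rule R) { Rules.emplace_back(std::move(R)); }

  std::string describe(const llvm::Function &F) const {
    std::string Desc;
    for (const Rule &R : Rules) {
      Desc += R(F);
      Desc += "-";
    }
    return Desc;
  }
};
```

A predefined rule for, say, a metadata node would then just be a small lambda passed to registerRule.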

#### How does this PR help with issues above?

(1) and (2) are fixed in conjunction: `sycl-post-link` was refactored to avoid storing more than one split module at a time, and that is possible because the PR unifies the per-scope and optional-kernel-features splitters into a single generic splitter. The new API for kernel categorization seems to be flexible enough to provide that infrastructure, so the merged splitters still look OK code-wise.

(3) is fixed by using string identifiers instead of hashes, as well as by using a data structure which sorts identifiers.

#### Any other benefits from this PR?

About 50 fewer lines of code to support :)

Extending device code split for more optional features will be even easier than it is now: instead of making several changes in various places around the `UsedOptionalFeatures` structure, it will just be a matter of adding 1-3 lines of code. Please also note that `UsedOptionalFeatures` contains a number of inconsistencies in its implementation, which will all be gone with this PR: in `operator==` we don't use the hash and instead compare certain fields directly (and we miss some of them), and the `generateModuleName` method skips and ignores some of the optional features.

Cross-module `device_global` usage checks should now work across all split dimensions (except for ESIMD).

#### Any potential downsides?

With the current `UsedOptionalFeatures` there is a possibility to embed various information (used aspects, the `large-grf` flag, etc.) directly during device code split, to avoid re-gathering that information later when we generate properties. With the suggested approach, that would be harder to do, because it doesn't seem to fit naturally into the proposed infrastructure: see the changes I made around `large-grf` in this PR.

However, we have never actually implemented this, and re-querying some metadata from a function doesn't seem like a bottleneck, so it should really be a very minor and purely theoretical downside.