[SYCL] Split device images based on accuracy level provided in option #10140

againull · 2023-06-29T17:33:34Z

This PR reuses the optional kernel features mechanism to split device images
based on the accuracy level provided using the -ffp-accuracy compilation option (introduced in PR#8280):

1. When the frontend emits an fp intrinsic call and attaches the maximum error
attribute, we also attach "sycl_used_aspects" metadata to the call
instruction, with a value which corresponds to high, medium, low, sycl
or cuda accuracy. The mapping for those values needs to be visible to the SYCL
device compiler only, and we intentionally don't put those values into the
aspects enum, because we don't want these aspects to be visible to
the user (for the reasons described in Details below).
2. Make SYCLPropagateAspectsUsage propagate sycl_used_aspects
metadata from instructions to the kernel.
3. Don't add internal aspects into the device requirements, because the SYCL RT
doesn't need to process these internal aspects (which have negative values).

After these changes, splitting functionality based on sycl_used_aspects metadata is available for free.
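Steps 2 and 3 can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the SYCLPropagateAspectsUsage pass itself: functions are plain strings, the call graph is a map, aspects are bare int32_t values, and negative values stand for the internal accuracy aspects. The names Module, collectAspects, kernelAspects and deviceRequirements are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not the real LLVM or sycl-post-link data structures.
using AspectSet = std::set<int32_t>;

struct Module {
  // Aspects attached directly to instructions in each function
  // (stands in for "sycl_used_aspects" metadata on fp intrinsic calls).
  std::map<std::string, AspectSet> DirectAspects;
  // Call graph: caller -> list of callees.
  std::map<std::string, std::vector<std::string>> Callees;
};

// Step 2: union the aspects of F and of everything it calls, transitively.
// Visited prevents infinite recursion on recursive call chains.
void collectAspects(const Module &M, const std::string &F,
                    std::set<std::string> &Visited, AspectSet &Out) {
  if (!Visited.insert(F).second)
    return; // already handled
  auto Direct = M.DirectAspects.find(F);
  if (Direct != M.DirectAspects.end())
    Out.insert(Direct->second.begin(), Direct->second.end());
  auto Calls = M.Callees.find(F);
  if (Calls != M.Callees.end())
    for (const std::string &Callee : Calls->second)
      collectAspects(M, Callee, Visited, Out);
}

AspectSet kernelAspects(const Module &M, const std::string &Kernel) {
  std::set<std::string> Visited;
  AspectSet Out;
  collectAspects(M, Kernel, Visited, Out);
  return Out;
}

// Step 3: negative (internal) aspects are kept out of the device
// requirements; only public, non-negative aspects remain.
std::vector<int32_t> deviceRequirements(const AspectSet &Used) {
  std::vector<int32_t> Reqs;
  for (int32_t A : Used)
    if (A >= 0)
      Reqs.push_back(A);
  return Reqs;
}
```

The propagated set still carries the internal values, so splitting can use them, while the requirements list seen by the runtime does not.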

Details:
Currently the accuracy level can be controlled using the following options.
For the entire translation unit:
-ffp-accuracy=high
-ffp-accuracy=medium
-ffp-accuracy=low
-ffp-accuracy=sycl
-ffp-accuracy=cuda

For particular functions in the translation unit:
-ffp-accuracy=low:sin,cos
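The two option forms can be sketched as a small parser that fills either a translation-unit-wide level or a per-function map. This is a hypothetical illustration, not clang's driver code: FPAccuracyOptions, parseFPAccuracy and accuracyFor are invented names, and the rule that a per-function setting overrides the TU-wide one is an assumption.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container, loosely mirroring a TU-wide value plus a
// function -> level map.
struct FPAccuracyOptions {
  std::string TUValue;                         // from -ffp-accuracy=<level>
  std::map<std::string, std::string> FuncMap;  // from -ffp-accuracy=<level>:<f1>,<f2>
};

bool isValidLevel(const std::string &L) {
  return L == "high" || L == "medium" || L == "low" || L == "sycl" ||
         L == "cuda";
}

// Parses the text after "-ffp-accuracy=". Returns false for an unknown level.
bool parseFPAccuracy(const std::string &Value, FPAccuracyOptions &Opts) {
  std::string::size_type Colon = Value.find(':');
  std::string Level = Value.substr(0, Colon); // whole string if no ':'
  if (!isValidLevel(Level))
    return false;
  if (Colon == std::string::npos) {
    Opts.TUValue = Level; // applies to the entire translation unit
    return true;
  }
  std::stringstream Funcs(Value.substr(Colon + 1));
  std::string Name;
  while (std::getline(Funcs, Name, ','))
    if (!Name.empty())
      Opts.FuncMap[Name] = Level;
  return true;
}

// Assumption: a per-function setting wins over the TU-wide one.
std::string accuracyFor(const FPAccuracyOptions &Opts, const std::string &Func) {
  auto It = Opts.FuncMap.find(Func);
  return It != Opts.FuncMap.end() ? It->second : Opts.TUValue;
}
```

With -ffp-accuracy=high and -ffp-accuracy=low:sin,cos both given, sin and cos would resolve to low and every other function to high under this assumed precedence.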

Whenever the frontend sees a math function in a kernel or a device function,
it emits an fp intrinsic call with an attached callsite attribute indicating
the value of the maximum error. llvm-spirv is going to translate these
builtins to regular __ocl intrinsics and translate the callsite attribute to a
decoration (which is a new SPIR-V extension). If that extension is not supported
by the backend, the backend is going to emit an error. An error is also emitted
if the backend supports the extension but can't compile the kernel because
it doesn't have a corresponding implementation of the math function complying with
the required maximum error.
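The two failure modes described above can be modelled as a per-call check on the backend side. This is only a sketch: BackendModel, BuildResult and checkCall are hypothetical, and expressing the maximum error in ULP as a double is an assumption made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical outcome of compiling one fp intrinsic call.
enum class BuildResult { Ok, ExtensionUnsupported, NoCompliantImplementation };

struct BackendModel {
  // Whether the backend understands the max-error decoration at all.
  bool SupportsMaxErrorDecoration = false;
  // Smallest max error (assumed unit: ULP) the backend can guarantee per function.
  std::map<std::string, double> BestErrorULP;
};

BuildResult checkCall(const BackendModel &B, const std::string &Func,
                      double RequiredMaxErrorULP) {
  // Failure mode 1: the extension is not supported.
  if (!B.SupportsMaxErrorDecoration)
    return BuildResult::ExtensionUnsupported;
  // Failure mode 2: no implementation meets the required bound.
  auto It = B.BestErrorULP.find(Func);
  if (It == B.BestErrorULP.end() || It->second > RequiredMaxErrorULP)
    return BuildResult::NoCompliantImplementation;
  return BuildResult::Ok;
}
```

Because one failing call makes the whole image fail to build, it matters which kernels end up sharing an image.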

Public aspects corresponding to different levels of accuracy are not suitable in
this case, because the aforementioned options are SYCL program compilation options, i.e.
it doesn't make sense to give the user the opportunity to write
something like this:
if (dev.has(aspect::ext_oneapi_fp_intrinsic_accuracy_high)) {
  /* submit kernel using high accuracy intrinsics */
}

But on our side we would still like to put kernels and device functions
into different images based on the required accuracy level. This is necessary because
some backends may support, for example, low and medium accuracy but not
high accuracy. In this case we want kernels using low and medium accuracy levels
to remain buildable, so we can't put kernels requiring high accuracy together with
kernels requiring low/medium accuracy.
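The splitting requirement can be sketched as grouping kernels by their set of used aspects, then checking each image against what a backend supports. This is a simplified model, not the sycl-post-link implementation: splitByAspects and imageBuildable are hypothetical, and the values used in the usage example (-1 for high, -2 for medium, -3 for low) are an assumed assignment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using AspectSet = std::set<int32_t>;

// Kernels with identical used-aspect sets share one device image, so a
// high-accuracy kernel never lands in the same image as a low-accuracy one.
std::map<AspectSet, std::vector<std::string>>
splitByAspects(const std::map<std::string, AspectSet> &KernelAspects) {
  std::map<AspectSet, std::vector<std::string>> Images;
  for (const auto &KV : KernelAspects)
    Images[KV.second].push_back(KV.first);
  return Images;
}

// An image is buildable only if the backend handles every level it uses.
bool imageBuildable(const AspectSet &Used, const AspectSet &Supported) {
  for (int32_t A : Used)
    if (Supported.count(A) == 0)
      return false;
  return true;
}
```

For a backend supporting only medium and low, the low-accuracy image stays buildable even though the separate high-accuracy image does not.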

elizabethandrews

Please add a frontend test

asudarsa · 2023-07-09T17:32:53Z

Hi @againull

Thanks for the PR.
I had some high level questions about this implementation:

1. This implementation seems to assume that the backend will either have support (or no support) for all functions with a specific level of accuracy. Is it possible that the backend supports 'high accuracy' implementations of some functions and 'low accuracy' implementations of others? I am not sure whether that scenario is supported here.
2. You have used '-1' to '-5' as metadata values here. How can we guarantee that no other pass will insert metadata that clashes with these values, i.e. how do we intend to document this?
3. Is it possible to use the existing attribute instead of adding a new aspect here? We might need to propagate attributes from the call site to the caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

asudarsa · 2023-07-09T17:37:54Z

clang/lib/CodeGen/CGCall.cpp

-        !getLangOpts().FPAccuracyVal.empty()) {
+    if ((!getLangOpts().FPAccuracyFuncMap.empty() ||
+         !getLangOpts().FPAccuracyVal.empty()) &&
+        isa_and_nonnull<FunctionDecl>(TargetDecl)) {


What is the relevance of this change w.r.t the overall scope of this PR? Sorry if I am missing something here.

This is a fix for a small frontend bug introduced by PR #8280.
I included the fix to unblock this PR, because otherwise I am not able to test my changes: execution would fail earlier, in the frontend.

Can you please explain the bug introduced and how this fixes it? Adding @zahiraam to review

As we discussed offline for the 2 tests that you were looking at, I didn't see the need for this fix. Not quite sure what this is doing?

Oh, I've rechecked the tests and indeed this change is redundant. Removed this change from PR. Thanks.

asudarsa · 2023-07-09T17:43:07Z

clang/lib/CodeGen/CGBuiltin.cpp

@@ -513,12 +514,17 @@ static CallInst *CreateBuiltinCallWithAttr(CodeGenFunction &CGF, StringRef Name,
  // TODO: Replace AttrList with a single attribute. The call can only have a
  // single FPAccuracy attribute.
  llvm::AttributeList AttrList;
+  // "sycl_used_aspects" metadata associated with the call.
+  SmallVector<llvm::Metadata *, 4> AspectsMD;


Does this need to be a vector? I am not sure we can have multiple such MD associated with a single call. Please note the TODO associated with the AttrList above.

Thanks

Fixed this, thank you!

asudarsa · 2023-07-09T17:54:58Z

clang/lib/CodeGen/CGCall.cpp

@@ -1846,8 +1847,18 @@ static llvm::fp::FPAccuracy convertFPAccuracy(StringRef FPAccuracyStr) {
      .Case("cuda", llvm::fp::FPAccuracy::CUDA);
 }

+static int32_t convertFPAccuracyToAspect(StringRef FPAccuracyStr) {


Do we need to add an assert here to ensure this function is called with appropriate FPAccuracyStr?

Thanks

Added assert, thanks.
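A standalone sketch of what such a conversion with an assert might look like is below. It uses std::string_view instead of llvm::StringRef so it compiles without LLVM. The names convertFPAccuracyToAspectSketch and isFPAccuracyLevel are hypothetical; the range -1..-5 comes from this PR, but the exact level-to-value assignment is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Assumed assignment of internal (negative) aspect values to levels.
enum InternalAccuracyAspect : int32_t {
  FPAccuracyHigh = -1,
  FPAccuracyMedium = -2,
  FPAccuracyLow = -3,
  FPAccuracySYCL = -4,
  FPAccuracyCUDA = -5,
};

bool isFPAccuracyLevel(std::string_view S) {
  return S == "high" || S == "medium" || S == "low" || S == "sycl" ||
         S == "cuda";
}

int32_t convertFPAccuracyToAspectSketch(std::string_view S) {
  // Callers must pass one of the five accepted levels.
  assert(isFPAccuracyLevel(S) && "unexpected fp-accuracy value");
  if (S == "high")
    return FPAccuracyHigh;
  if (S == "medium")
    return FPAccuracyMedium;
  if (S == "low")
    return FPAccuracyLow;
  if (S == "sycl")
    return FPAccuracySYCL;
  return FPAccuracyCUDA; // only "cuda" remains once the assert holds
}
```

Naming the values in an enum also addresses the later nit about matching CHECK variable names to aspect names and values.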

asudarsa · 2023-07-09T17:56:06Z

clang/lib/CodeGen/CGCall.cpp

@@ -1864,6 +1875,9 @@ void CodeGenModule::getDefaultFunctionFPAccuracyAttributes(
          ID, FuncType, convertFPAccuracy(FuncMapIt->second));
      assert(!FPAccuracyVal.empty() && "A valid accuracy value is expected");
      FuncAttrs.addAttribute("fpbuiltin-max-error=", FPAccuracyVal);
+      if (getLangOpts().SYCLIsDevice)


Do we need this check here?

Thanks

Yes, we need this check because sycl_used_aspects metadata is redundant if it's not SYCL device code.

asudarsa · 2023-07-09T18:15:29Z

sycl/test/optional_kernel_features/fp-accuracy.cpp

@@ -0,0 +1,138 @@
+// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin  -fno-math-errno  -fsycl -fsycl-device-only


Is it possible to test if the aspects got correctly propagated to the calling functions?

Thanks

Added test to check propagation, thanks!

asudarsa · 2023-07-09T18:17:13Z

clang/test/CodeGenSYCL/fp-accuracy.cpp

+// CHECK: [[ASPECT1]] = !{i32 -1}
+// CHECK: [[ASPECT2]] = !{i32 -2}
+// CHECK: [[ASPECT3]] = !{i32 -3}
+// CHECK: [[ASPECT4]] = !{i32 -5}


nit: can we rename these to match the aspect name and value?

asudarsa

I have added a few high-level questions and some code comments. Please address them.

Thanks

zahiraam · 2023-07-12T17:36:31Z

clang/lib/CodeGen/CGCall.cpp

-    if (!getLangOpts().FPAccuracyFuncMap.empty() ||
-        !getLangOpts().FPAccuracyVal.empty()) {
+    if ((!getLangOpts().FPAccuracyFuncMap.empty() ||
+         !getLangOpts().FPAccuracyVal.empty())) {


Why double parentheses?

Sorry, fixed.

zahiraam · 2023-07-12T17:40:44Z

clang/test/CodeGenSYCL/fp-accuracy.cpp

@@ -0,0 +1,102 @@
+// RUN: %clang_cc1  -fsycl-is-device -ffp-builtin-accuracy=high:sin,sqrt -ffp-builtin-accuracy=medium:cos -ffp-builtin-accuracy=low:tan -ffp-builtin-accuracy=cuda:exp,acos -ffp-builtin-accuracy=sycl:log,asin -emit-llvm -triple spir64-unknown-unknown -disable-llvm-passes %s -o - | FileCheck %s


Can you add a run line where a TU value of accuracy is used and a another one with a mix of TU accuracy and function specific ones?

Added additional run lines according to your suggestion.

zahiraam · 2023-07-12T17:43:06Z

sycl/test/optional_kernel_features/fp-accuracy.cpp

@@ -0,0 +1,138 @@
+// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin  -fno-math-errno  -fsycl -fsycl-device-only


Same request here for additional RUN lines.

Added additional run lines according to your suggestion.

zahiraam

LGTM. Thanks.

AlexeySachkov · 2023-07-13T08:03:23Z

We mostly discussed this offline with @asudarsa , will duplicate this info here.

Is it possible to use the existing attribute, instead of adding new aspect here? We might need to propagate attributes from call site to caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

Unfortunately, I am not sure that we can do that. Attribute contains only the value of error which may be different for each math function (for example for sycl or cuda accuracy), so basically attribute doesn't contain information about level of accuracy anymore, it only contains error value.

I'm confused here: why is it possible to propagate -1 in metadata from an intrinsic to a kernel, but not possible to propagate "uses-accuracy=low" (or something like that)?

The suggested approach does look a bit like a hack, and I think that we should have some generic infrastructure for propagating attributes/metadata which are not aspects. There are already a few examples where it is needed: in sycl-post-link we are trying to understand which kernels are using assert; accuracy levels from this PR; and there are plans to do device code split based on data types from the joint matrix extension, which won't fit into aspects either, due to complexity.

I don't want to block this PR, but I would like to ensure that there is a path to refactor accuracy levels handling from aspects to generic attributes/metadata propagation.

againull · 2023-07-13T13:07:40Z

We mostly discussed this offline with @asudarsa , will duplicate this info here.

Is it possible to use the existing attribute, instead of adding new aspect here? We might need to propagate attributes from call site to caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

Unfortunately, I am not sure that we can do that. Attribute contains only the value of error which may be different for each math function (for example for sycl or cuda accuracy), so basically attribute doesn't contain information about level of accuracy anymore, it only contains error value.

I'm confused here: why is it possible to propagate -1 in metadata from an intrinsic to a kernel, but not possible to propagate "uses-accuracy=low" (or something like that)?

In this case attribute attached to the intrinsic call looks like this: attributes #3 = { "fpbuiltin-max-error="="1.0f" }
where the error value might be different (for different functions) for the same accuracy level. I.e. the attribute doesn't contain information about whether it is high/medium/low/sycl/cuda accuracy: we lose this information after mapping the accuracy level to an error value according to a mapping table. In this implementation, by contrast, the aspect with value "-1" corresponds to high accuracy for any function, just as other aspects map to particular positive numbers.
I am sorry, unfortunately I wasn't aware of the scenarios you described where aspects are not usable. The idea of mapping accuracy levels to "internal" aspects came up during our discussion with Gregory and Andy. So, if you don't mind, I will go ahead with this implementation and will discuss possible future refactoring and a more general solution with you.

againull · 2023-07-13T13:17:19Z

Hello @Fznamznon, could you please help review this PR? Originally @elizabethandrews was looking at this from the intel/dpcpp-cfe-reviewers group, but unfortunately she is on vacation, and other members are on vacation as well. Formally @zahiraam is from the frontend team but is not in the dpcpp-cfe-reviewers group, so I am not sure if I can treat her approval as a green light from the CFE group.

againull · 2023-07-13T13:18:19Z

The HIP AMDGPU failure is an infrastructure issue unrelated to this PR.

AlexeySachkov · 2023-07-13T14:22:44Z

In this case attribute attached to the intrinsic call looks like this: attributes #3 = { "fpbuiltin-max-error="="1.0f" }
where error value might be different (for different functions) for the same accuracy level. I.e. attribute doesn't contain information whether it is high/medium/low/sycl/cuda accuracy, we lose this information after mapping accuracy level to error value according to some mapping table. And in this implementation aspect with value "-1" corresponds to high accuracy for any function, for example. Just like other aspects match to particular positive numbers.

Thanks, I see. I need to take a more detailed look at the PR, I've missed that mapping part.

I am sorry, unfortunately, I wasn't aware of the scenarios that you described where aspects are not usable. This implementation idea of mapping accuracy levels to "internal" aspects came up as part of our discussion with Gregory and Andy. Then if you don't mind I will go ahead with this implementation. And will discuss with you future possible refactoring and more general solution.

It is also my bad for not reviewing this earlier. As I said, I won't block the PR and I'm perfectly fine with doing some unification later. For example, that joint matrix effort has not yet started: we could think of some infrastructure in scope of that effort and once it is done, refactor what we can to use that infrastructure.

Fznamznon · 2023-07-13T14:52:23Z

clang/lib/CodeGen/CGBuiltin.cpp

-              .Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt);
+              .Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt)
+              .Default(0);
+      if (!FPAccuracyIntrinsicID) {


This kind of creates else after return so I agree about move.

Fznamznon · 2023-07-13T14:54:08Z

clang/lib/CodeGen/CGBuiltin.cpp

@@ -22144,7 +22150,8 @@ llvm::CallInst *CodeGenFunction::EmitFPBuiltinIndirectCall(
    // Even if the current function doesn't have a clang builtin, create
    // an 'fpbuiltin-max-error' attribute for it; unless it's marked with
    // an NoBuiltin attribute.
-    if (!FD->hasAttr<NoBuiltinAttr>()) {
+    if (!FD->hasAttr<NoBuiltinAttr>() &&
+        FD->getNameInfo().getName().isIdentifier()) {


When is a function name not an identifier?

If it's a CXXConstructorDecl, then its name is not an identifier.

Understood but

Can you explain why this change is required?

I think this question is still not quite answered.

Fznamznon · 2023-07-13T14:55:17Z

clang/lib/CodeGen/CGBuiltin.cpp

-              .Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt);
+              .Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt)
+              .Default(0);
+      if (!FPAccuracyIntrinsicID) {


If the next else-after-return sequence comes from code added in intel/llvm repo, I would also appreciate a slight refactoring since you're here.

againull force-pushed the fp_accuracy_image_splitting branch from b4e9d70 to 00a326e July 6, 2023 21:22

Fix frontend issues after PR#8280

94ac8d5

againull force-pushed the fp_accuracy_image_splitting branch from 00a326e to 6577700 July 6, 2023 22:00

againull force-pushed the fp_accuracy_image_splitting branch from 6577700 to f235c44 July 6, 2023 22:22

againull marked this pull request as ready for review July 6, 2023 22:30

againull requested review from a team as code owners July 6, 2023 22:30

againull requested a review from sergey-semenov July 6, 2023 22:30

elizabethandrews reviewed Jul 7, 2023

Add frontend test

649fd15

asudarsa reviewed Jul 9, 2023

asudarsa requested changes Jul 9, 2023

againull added 3 commits July 10, 2023 12:09

Metadata propagation test

404f82e

Add info to design documentation

d107740

Address review comments

a537ca9

againull requested a review from zahiraam July 12, 2023 17:12

zahiraam reviewed Jul 12, 2023

againull added 2 commits July 12, 2023 10:39

Remove parentheses

9408518

Merge remote-tracking branch 'origin/sycl' into orig_patch

2fb28d3

zahiraam reviewed Jul 12, 2023

againull added 3 commits July 12, 2023 12:59

Add additional RUN lines for TU and mixed cases

9807f64

Format

a13c1ff

Fix EOL

c3afa41

againull requested a review from zahiraam July 12, 2023 20:07

zahiraam approved these changes Jul 12, 2023

againull requested a review from Fznamznon July 13, 2023 13:14

Fznamznon reviewed Jul 13, 2023

Address review comments

8e025fd

againull requested a review from Fznamznon July 13, 2023 16:18

Fznamznon approved these changes Jul 14, 2023

againull merged commit 8d77da7 into intel:sycl Jul 14, 2023

againull mentioned this pull request Aug 2, 2023

[SYCL] Fix integer type overflow in SYCLDeviceRequirements #10614

Merged

againull deleted the fp_accuracy_image_splitting branch December 22, 2023 04:22
