Skip to content

[SYCL] Split device images based on accuracy level provided in option #10140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jul 14, 2023

Conversation

againull
Copy link
Contributor

@againull againull commented Jun 29, 2023

This PR reuses optional kernel features mechanism to split device image
based on accuracy level provided using -ffp-accuracy compilation option (introduced in PR#8280):

  1. When frontend emits fp intrinsic call and attaches the maximum error
    attribute we also attach "sycl_used_aspects" metadata to the call
    instruction with a value which corresponds to high, medium, low, sycl
    or cuda. Mapping for those values is needed to be visible for SYCL
    device compiler only and we intentionally don't put those values to
    aspects enum because we don't need these aspects to be visible to
    the user in this case (because of the reasons described in Details below).
  2. Make SYCLPropagateAspectsUsage to propagate sycl_used_aspects
    metadata from instructions to kernel.
  3. Don't add internal aspects into the device requirements, because we
    don't need processing of these internal aspects (with negative values) in the SYCL RT.

Splitting functionality based on sycl_used_aspects metadata is available for free.

Details:
Currently accruracy level can be controlled using the following options.
For entire translation unit:
-ffp-accuracy=high
-ffp-accuracy=medium
-ffp-accuracy=low
-ffp-accuracy=sycl
-ffp-accuracy=cuda

For particular funcions in the translation unit:
-ffp-accuracy=low:sin,cos

Whenever frontend sees a math function in a kernel or a device function
it emits fp intrinsic call with attached callsite attribute indicating
value of the maximum error. llvm-spirv is going to translate this
builtins to regular __ocl intrinsics and translate callsite attribute to
decorator (which is a new spirv extension). If that extension is not supported
by the backend, it is going to emit an error. Error is emitted also in
the case if backend supports the extension but can't compile the kernel because
it doesn't have corresponding implemenation of math function complying with
required maximum error.

Aspects corrsponding to different levels of accuracy are not suitable in
this case because aforementioned options are sycl program compilation options, i.e.
it doesn't make sense to provide an opportunity to the user to write
something like this:
if (dev.has(aspect::ext_oneapi_fp_intrinsic_accuracy_high)) {
/* submit kernel using high accuracy intrinsics */
}

But on our side we still would like to put kernels and device functions
to different images based on required accuracy level. It is necessary because
some backends may support, for example, low and medium accuracy but don't
support high accuracy.

@againull againull temporarily deployed to aws July 6, 2023 17:46 — with GitHub Actions Inactive
@againull againull temporarily deployed to aws July 6, 2023 18:19 — with GitHub Actions Inactive
@againull againull force-pushed the fp_accuracy_image_splitting branch from b4e9d70 to 00a326e Compare July 6, 2023 21:22
@againull againull force-pushed the fp_accuracy_image_splitting branch from 00a326e to 6577700 Compare July 6, 2023 22:00
This PR reuses optional kernel features mechanism to provide this
splitting logic based on accuracy level:
1. When frontend emits fp intrinsic call and attaches the maximum error
   attribute we also attach "sycl_used_aspects" metadata to the call
   instruction with a value which corresponds to high, medium, low, sycl
   or cuda. Mapping for those values is needed to be visible for SYCL
   device compiler only and we intentionally don't put those values to
   aspects enum because we don't need aspects because of the reasons I
   described above.
2. Make SYCLPropagateAspectsUsage to propagate sycl_used_aspects
   metadata from instructions to kernel.
3. Don't add internal aspects into the requirements, because we don't need
   processing of these fake aspects (with negative values) in the SYCL RT.
After these changes splitting functionality based on sycl_used_aspects
metadata is available for free.

More details:
Currently accruracy level can be controlled using the following options.
For entire translation unit:
-ffp-accuracy=high
-ffp-accuracy=medium
-ffp-accuracy=low
-ffp-accuracy=sycl
-ffp-accuracy=cuda

For particular funcions in the translation unit:
-ffp-accuracy=low:sin,cos

Whenever frontend sees a math function in a kernel or a device function
it emits fp intrinsic call with attached callsite attribute indicating
value of the maximum error. llvm-spirv is going to translate this
builtins to regular __ocl intrinsics and translate callsite attribute to
decorator (which is a new spirv extension). If that extension is not supported
by the backend, it is going to emit an error. Error is emitted also in
the case if backend supports the extension but can't compile the kernel because
it doesn't have corresponding implemenation of math function complying with
required maximum error.

Aspects corrsponding to different levels of accuracy are not suitable in
this case because aforementioned options are sycl program compilation options, i.e.
it doesn't make sense to provide an opportunity to the user to write
something like this:
if (dev.has(aspect::ext_oneapi_fp_intrinsic_accuracy_high)) {
  /* submit kernel using high accuracy intrinsics */
}

But on our side we still would like to put kernels and device functions
to different images based on required accuracy level. It is necessary because
some backends may support, for example, low and medium accuracy but don't
support high accuracy. In this case we want to make kernels using low
and medium accuracy levels buildable, so we can't put kernels requiring
high accuracy and low/medidum accuracy together.
@againull againull temporarily deployed to aws July 6, 2023 22:21 — with GitHub Actions Inactive
@againull againull force-pushed the fp_accuracy_image_splitting branch from 6577700 to f235c44 Compare July 6, 2023 22:22
@againull againull marked this pull request as ready for review July 6, 2023 22:30
@againull againull requested review from a team as code owners July 6, 2023 22:30
@againull againull requested a review from sergey-semenov July 6, 2023 22:30
@againull againull temporarily deployed to aws July 6, 2023 22:42 — with GitHub Actions Inactive
@againull againull temporarily deployed to aws July 7, 2023 00:37 — with GitHub Actions Inactive
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a frontend test

@againull againull temporarily deployed to aws July 7, 2023 21:28 — with GitHub Actions Inactive
@againull againull temporarily deployed to aws July 7, 2023 22:09 — with GitHub Actions Inactive
@asudarsa
Copy link
Contributor

asudarsa commented Jul 9, 2023

Hi @againull

Thanks for the PR.
I had some high level questions about this implementation:

  1. This implementation seems to assume that the backend will either have support (or no support) for all functions with a specific level of accuracy. Is it possible that the backend may support 'high accuracy' implementations of some functions and 'low accuracy' implementations of some other functions? I am not sure if that scenario is supported here.
  2. You have used '-1' to '-5' as metadata here. How can we guarantee that no other pass will insert metadata that clashes with this? i.e. How do we intend to document this?
  3. Is it possible to use the existing attribute, instead of adding new aspect here? We might need to propagate attributes from call site to caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

!getLangOpts().FPAccuracyVal.empty()) {
if ((!getLangOpts().FPAccuracyFuncMap.empty() ||
!getLangOpts().FPAccuracyVal.empty()) &&
isa_and_nonnull<FunctionDecl>(TargetDecl)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relevance of this change w.r.t the overall scope of this PR? Sorry if I am missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fix for small bug in frontend after PR: 8280
I included the fix to unblock this PR because otherwise I am not able to test my changes, execution would fail earlier in the frontend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain the bug introduced and how this fixes it? Adding @zahiraam to review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline for the 2 tests that you were looking at, I didn't see the need for this fix. Not quite sure what this is doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've rechecked the tests and indeed this change is redundant. Removed this change from PR. Thanks.

@@ -513,12 +514,17 @@ static CallInst *CreateBuiltinCallWithAttr(CodeGenFunction &CGF, StringRef Name,
// TODO: Replace AttrList with a single attribute. The call can only have a
// single FPAccuracy attribute.
llvm::AttributeList AttrList;
// "sycl_used_aspects" metadata associated with the call.
SmallVector<llvm::Metadata *, 4> AspectsMD;
Copy link
Contributor

@asudarsa asudarsa Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a vector? I am not sure if we can multiple such MD associated with a single call. Please note the TODO associated with the AttrList above.

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, thank you!

@@ -1846,8 +1847,18 @@ static llvm::fp::FPAccuracy convertFPAccuracy(StringRef FPAccuracyStr) {
.Case("cuda", llvm::fp::FPAccuracy::CUDA);
}

static int32_t convertFPAccuracyToAspect(StringRef FPAccuracyStr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add an assert here to ensure this function is called with appropriate FPAccuracyStr?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added assert, thanks.

@@ -1864,6 +1875,9 @@ void CodeGenModule::getDefaultFunctionFPAccuracyAttributes(
ID, FuncType, convertFPAccuracy(FuncMapIt->second));
assert(!FPAccuracyVal.empty() && "A valid accuracy value is expected");
FuncAttrs.addAttribute("fpbuiltin-max-error=", FPAccuracyVal);
if (getLangOpts().SYCLIsDevice)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this check here?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need this check because sycl_used_aspects metadata is redundant if it's not SYCL device code.

@@ -0,0 +1,138 @@
// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin -fno-math-errno -fsycl -fsycl-device-only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test if the aspects got correctly propagated to the calling functions?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test to check propagation, thanks!

// CHECK: [[ASPECT1]] = !{i32 -1}
// CHECK: [[ASPECT2]] = !{i32 -2}
// CHECK: [[ASPECT3]] = !{i32 -3}
// CHECK: [[ASPECT4]] = !{i32 -5}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be rename these to match aspect name and the value?

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added few high-level questions and some code comments. please address.

Thanks

@againull againull requested a review from zahiraam July 12, 2023 17:12
@againull againull temporarily deployed to aws July 12, 2023 17:27 — with GitHub Actions Inactive
if (!getLangOpts().FPAccuracyFuncMap.empty() ||
!getLangOpts().FPAccuracyVal.empty()) {
if ((!getLangOpts().FPAccuracyFuncMap.empty() ||
!getLangOpts().FPAccuracyVal.empty())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why double parentheses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, fixed.

@@ -0,0 +1,102 @@
// RUN: %clang_cc1 -fsycl-is-device -ffp-builtin-accuracy=high:sin,sqrt -ffp-builtin-accuracy=medium:cos -ffp-builtin-accuracy=low:tan -ffp-builtin-accuracy=cuda:exp,acos -ffp-builtin-accuracy=sycl:log,asin -emit-llvm -triple spir64-unknown-unknown -disable-llvm-passes %s -o - | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a run line where a TU value of accuracy is used and a another one with a mix of TU accuracy and function specific ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional run lines according to your suggestion.

@@ -0,0 +1,138 @@
// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin -fno-math-errno -fsycl -fsycl-device-only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same request here for additional RUN lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional run lines according to your suggestion.

@againull againull temporarily deployed to aws July 12, 2023 18:38 — with GitHub Actions Inactive
@againull againull requested a review from zahiraam July 12, 2023 20:07
@againull againull temporarily deployed to aws July 12, 2023 20:25 — with GitHub Actions Inactive
Copy link
Contributor

@zahiraam zahiraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@againull againull temporarily deployed to aws July 12, 2023 21:13 — with GitHub Actions Inactive
@AlexeySachkov
Copy link
Contributor

We mostly discussed this offline with @asudarsa , will duplicate this info here.

  1. Is it possible to use the existing attribute, instead of adding new aspect here? We might need to propagate attributes from call site to caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

Unfortunately, I am not sure that we can do that. Attribute contains only the value of error which may be different for each math function (for example for sycl or cuda accuracy), so basically attribute doesn't contain information about level of accuracy anymore, it only contains error value.

I'm confused here: why it is possible to propagate -1 in a metadata from intrinsic to a kernel, but isn't possible to propagate "uses-accuracy=low" (or something like that) ?

Suggested approach does looks like a hack, a bit and I think that we should have some generic infrastructure for propagating attributes/metadata, which are not aspects. There are a few examples already where it is needed: in sycl-post-link we are trying to understand which kernels are using assert; accuracy levels from this PR; there are plans to do device code split based on data types from joint matrix extension and they won't fit into aspects as well, due to complexity.

I don't want to block this PR, but I would like to ensure that there is a path to refactor accuracy levels handling from aspects to generic attributes/metadata propagation.

@againull
Copy link
Contributor Author

againull commented Jul 13, 2023

We mostly discussed this offline with @asudarsa , will duplicate this info here.

  1. Is it possible to use the existing attribute, instead of adding new aspect here? We might need to propagate attributes from call site to caller function. We do have a mechanism to easily split device code based on attributes.

Thanks again

Unfortunately, I am not sure that we can do that. Attribute contains only the value of error which may be different for each math function (for example for sycl or cuda accuracy), so basically attribute doesn't contain information about level of accuracy anymore, it only contains error value.

I'm confused here: why it is possible to propagate -1 in a metadata from intrinsic to a kernel, but isn't possible to propagate "uses-accuracy=low" (or something like that) ?

In this case attribute attached to the intrinsic call looks like this: attributes #3 = { "fpbuiltin-max-error="="1.0f" }
where error value might be different (for different functions) for the same accuracy level. I.e. attribute doesn't contain information whether it is high/medium/low/sycl/cuda accuracy, we lose this information after mapping accuracy level to error value according to some mapping table. And in this implementation aspect with value "-1" corresponds to high accuracy for any function, for example. Just like other aspects match to particular positive numbers.
I am sorry, unfortunately, I wasn't aware of the scenarios that you described where aspects are not usable. This implementation idea of mapping accuracy levels to "internal" aspects came up as part of our discussion with Gregory and Andy. Then if you don't mind I will go ahead with this implementation. And will discuss with you future possible refactoring and more general solution.

@againull againull requested a review from Fznamznon July 13, 2023 13:14
@againull
Copy link
Contributor Author

againull commented Jul 13, 2023

Hello @Fznamznon, could you please help to review this PR. Originally @elizabethandrews was looking at this from intel/dpcpp-cfe-reviewers group. But unfortunately she is on vacation and other members are on vacation as well. Formally @zahiraam is from the frontend team but not in the dpcpp-cfe-reviewers group, so I am not sure if I can treat her approval as green light from CFE group.

@againull
Copy link
Contributor Author

HIP AMDGPU failure is an infrastructure issue which is unrelated to this PR.

@AlexeySachkov
Copy link
Contributor

In this case attribute attached to the intrinsic call looks like this: attributes #3 = { "fpbuiltin-max-error="="1.0f" }
where error value might be different (for different functions) for the same accuracy level. I.e. attribute doesn't contain information whether it is high/medium/low/sycl/cuda accuracy, we lose this information after mapping accuracy level to error value according to some mapping table. And in this implementation aspect with value "-1" corresponds to high accuracy for any function, for example. Just like other aspects match to particular positive numbers.

Thanks, I see. I need to take a more detailed look at the PR, I've missed that mapping part.

I am sorry, unfortunately, I wasn't aware of the scenarios that you described where aspects are not usable. This implementation idea of mapping accuracy levels to "internal" aspects came up as part of our discussion with Gregory and Andy. Then if you don't mind I will go ahead with this implementation. And will discuss with you future possible refactoring and more general solution.

It is also my bad for not reviewing this earlier. As I said, I won't block the PR and I'm perfectly fine with doing some unification later. For example, that joint matrix effort has not yet started: we could think of some infrastructure in scope of that effort and once it is done, refactor what we can to use that infrastructure.

.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt);
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt)
.Default(0);
if (!FPAccuracyIntrinsicID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of creates else after return so I agree about move.

@@ -22144,7 +22150,8 @@ llvm::CallInst *CodeGenFunction::EmitFPBuiltinIndirectCall(
// Even if the current function doesn't have a clang builtin, create
// an 'fpbuiltin-max-error' attribute for it; unless it's marked with
// an NoBuiltin attribute.
if (!FD->hasAttr<NoBuiltinAttr>()) {
if (!FD->hasAttr<NoBuiltinAttr>() &&
FD->getNameInfo().getName().isIdentifier()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is a function name not an identifier?

If it's a CXXContructorDecl then it's not an identifier.

Understood but

Can you explain why this change is required?

I think this question is still not quite answered.

.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt);
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt)
.Default(0);
if (!FPAccuracyIntrinsicID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the next else-after-return sequence comes from code added in intel/llvm repo, I would also appreciate a slight refactoring since you're here.

@againull againull requested a review from Fznamznon July 13, 2023 16:18
@againull againull temporarily deployed to aws July 13, 2023 17:20 — with GitHub Actions Inactive
@againull againull temporarily deployed to aws July 13, 2023 19:04 — with GitHub Actions Inactive
@againull againull merged commit 8d77da7 into intel:sycl Jul 14, 2023
@againull againull deleted the fp_accuracy_image_splitting branch December 22, 2023 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants