-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Split device images based on accuracy level provided in option #10140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b4e9d70
to
00a326e
Compare
00a326e
to
6577700
Compare
This PR reuses optional kernel features mechanism to provide this splitting logic based on accuracy level: 1. When frontend emits fp intrinsic call and attaches the maximum error attribute we also attach "sycl_used_aspects" metadata to the call instruction with a value which corresponds to high, medium, low, sycl or cuda. Mapping for those values is needed to be visible for SYCL device compiler only and we intentionally don't put those values to aspects enum because we don't need aspects because of the reasons I described above. 2. Make SYCLPropagateAspectsUsage to propagate sycl_used_aspects metadata from instructions to kernel. 3. Don't add internal aspects into the requirements, because we don't need processing of these fake aspects (with negative values) in the SYCL RT. After these changes splitting functionality based on sycl_used_aspects metadata is available for free. More details: Currently accruracy level can be controlled using the following options. For entire translation unit: -ffp-accuracy=high -ffp-accuracy=medium -ffp-accuracy=low -ffp-accuracy=sycl -ffp-accuracy=cuda For particular funcions in the translation unit: -ffp-accuracy=low:sin,cos Whenever frontend sees a math function in a kernel or a device function it emits fp intrinsic call with attached callsite attribute indicating value of the maximum error. llvm-spirv is going to translate this builtins to regular __ocl intrinsics and translate callsite attribute to decorator (which is a new spirv extension). If that extension is not supported by the backend, it is going to emit an error. Error is emitted also in the case if backend supports the extension but can't compile the kernel because it doesn't have corresponding implemenation of math function complying with required maximum error. Aspects corrsponding to different levels of accuracy are not suitable in this case because aforementioned options are sycl program compilation options, i.e. it doesn't make sense to provide an opportunity to the user to write something like this: if (dev.has(aspect::ext_oneapi_fp_intrinsic_accuracy_high)) { /* submit kernel using high accuracy intrinsics */ } But on our side we still would like to put kernels and device functions to different images based on required accuracy level. It is necessary because some backends may support, for example, low and medium accuracy but don't support high accuracy. In this case we want to make kernels using low and medium accuracy levels buildable, so we can't put kernels requiring high accuracy and low/medidum accuracy together.
6577700
to
f235c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a frontend test
Hi @againull Thanks for the PR.
Thanks again |
clang/lib/CodeGen/CGCall.cpp
Outdated
!getLangOpts().FPAccuracyVal.empty()) { | ||
if ((!getLangOpts().FPAccuracyFuncMap.empty() || | ||
!getLangOpts().FPAccuracyVal.empty()) && | ||
isa_and_nonnull<FunctionDecl>(TargetDecl)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the relevance of this change w.r.t the overall scope of this PR? Sorry if I am missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fix for small bug in frontend after PR: 8280
I included the fix to unblock this PR because otherwise I am not able to test my changes, execution would fail earlier in the frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain the bug introduced and how this fixes it? Adding @zahiraam to review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline for the 2 tests that you were looking at, I didn't see the need for this fix. Not quite sure what this is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I've rechecked the tests and indeed this change is redundant. Removed this change from PR. Thanks.
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
@@ -513,12 +514,17 @@ static CallInst *CreateBuiltinCallWithAttr(CodeGenFunction &CGF, StringRef Name, | |||
// TODO: Replace AttrList with a single attribute. The call can only have a | |||
// single FPAccuracy attribute. | |||
llvm::AttributeList AttrList; | |||
// "sycl_used_aspects" metadata associated with the call. | |||
SmallVector<llvm::Metadata *, 4> AspectsMD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a vector? I am not sure if we can multiple such MD associated with a single call. Please note the TODO associated with the AttrList above.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this, thank you!
@@ -1846,8 +1847,18 @@ static llvm::fp::FPAccuracy convertFPAccuracy(StringRef FPAccuracyStr) { | |||
.Case("cuda", llvm::fp::FPAccuracy::CUDA); | |||
} | |||
|
|||
static int32_t convertFPAccuracyToAspect(StringRef FPAccuracyStr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add an assert here to ensure this function is called with appropriate FPAccuracyStr?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added assert, thanks.
clang/lib/CodeGen/CGCall.cpp
Outdated
@@ -1864,6 +1875,9 @@ void CodeGenModule::getDefaultFunctionFPAccuracyAttributes( | |||
ID, FuncType, convertFPAccuracy(FuncMapIt->second)); | |||
assert(!FPAccuracyVal.empty() && "A valid accuracy value is expected"); | |||
FuncAttrs.addAttribute("fpbuiltin-max-error=", FPAccuracyVal); | |||
if (getLangOpts().SYCLIsDevice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this check here?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need this check because sycl_used_aspects metadata is redundant if it's not SYCL device code.
@@ -0,0 +1,138 @@ | |||
// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin -fno-math-errno -fsycl -fsycl-device-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to test if the aspects got correctly propagated to the calling functions?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test to check propagation, thanks!
// CHECK: [[ASPECT1]] = !{i32 -1} | ||
// CHECK: [[ASPECT2]] = !{i32 -2} | ||
// CHECK: [[ASPECT3]] = !{i32 -3} | ||
// CHECK: [[ASPECT4]] = !{i32 -5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be rename these to match aspect name and the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added few high-level questions and some code comments. please address.
Thanks
clang/lib/CodeGen/CGCall.cpp
Outdated
if (!getLangOpts().FPAccuracyFuncMap.empty() || | ||
!getLangOpts().FPAccuracyVal.empty()) { | ||
if ((!getLangOpts().FPAccuracyFuncMap.empty() || | ||
!getLangOpts().FPAccuracyVal.empty())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why double parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, fixed.
@@ -0,0 +1,102 @@ | |||
// RUN: %clang_cc1 -fsycl-is-device -ffp-builtin-accuracy=high:sin,sqrt -ffp-builtin-accuracy=medium:cos -ffp-builtin-accuracy=low:tan -ffp-builtin-accuracy=cuda:exp,acos -ffp-builtin-accuracy=sycl:log,asin -emit-llvm -triple spir64-unknown-unknown -disable-llvm-passes %s -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a run line where a TU value of accuracy is used and a another one with a mix of TU accuracy and function specific ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added additional run lines according to your suggestion.
@@ -0,0 +1,138 @@ | |||
// RUN: %clangxx %s -o %test.bc -ffp-accuracy=high:sin,sqrt -ffp-accuracy=medium:cos -ffp-accuracy=low:tan -ffp-accuracy=cuda:exp,acos -ffp-accuracy=sycl:log,asin -fno-math-errno -fsycl -fsycl-device-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same request here for additional RUN lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added additional run lines according to your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
I'm confused here: why it is possible to propagate Suggested approach does looks like a hack, a bit and I think that we should have some generic infrastructure for propagating attributes/metadata, which are not aspects. There are a few examples already where it is needed: in I don't want to block this PR, but I would like to ensure that there is a path to refactor accuracy levels handling from aspects to generic attributes/metadata propagation. |
In this case attribute attached to the intrinsic call looks like this: attributes #3 = { "fpbuiltin-max-error="="1.0f" } |
Hello @Fznamznon, could you please help to review this PR. Originally @elizabethandrews was looking at this from intel/dpcpp-cfe-reviewers group. But unfortunately she is on vacation and other members are on vacation as well. Formally @zahiraam is from the frontend team but not in the dpcpp-cfe-reviewers group, so I am not sure if I can treat her approval as green light from CFE group. |
HIP AMDGPU failure is an infrastructure issue which is unrelated to this PR. |
Thanks, I see. I need to take a more detailed look at the PR, I've missed that mapping part.
It is also my bad for not reviewing this earlier. As I said, I won't block the PR and I'm perfectly fine with doing some unification later. For example, that joint matrix effort has not yet started: we could think of some infrastructure in scope of that effort and once it is done, refactor what we can to use that infrastructure. |
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt); | ||
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt) | ||
.Default(0); | ||
if (!FPAccuracyIntrinsicID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of creates else after return so I agree about move.
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
@@ -22144,7 +22150,8 @@ llvm::CallInst *CodeGenFunction::EmitFPBuiltinIndirectCall( | |||
// Even if the current function doesn't have a clang builtin, create | |||
// an 'fpbuiltin-max-error' attribute for it; unless it's marked with | |||
// an NoBuiltin attribute. | |||
if (!FD->hasAttr<NoBuiltinAttr>()) { | |||
if (!FD->hasAttr<NoBuiltinAttr>() && | |||
FD->getNameInfo().getName().isIdentifier()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is a function name not an identifier?
If it's a CXXContructorDecl then it's not an identifier.
Understood but
Can you explain why this change is required?
I think this question is still not quite answered.
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt); | ||
.Case("rsqrt", llvm::Intrinsic::fpbuiltin_rsqrt) | ||
.Default(0); | ||
if (!FPAccuracyIntrinsicID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the next else-after-return sequence comes from code added in intel/llvm repo, I would also appreciate a slight refactoring since you're here.
This PR reuses optional kernel features mechanism to split device image
based on accuracy level provided using -ffp-accuracy compilation option (introduced in PR#8280):
attribute we also attach "sycl_used_aspects" metadata to the call
instruction with a value which corresponds to high, medium, low, sycl
or cuda. Mapping for those values is needed to be visible for SYCL
device compiler only and we intentionally don't put those values to
aspects enum because we don't need these aspects to be visible to
the user in this case (because of the reasons described in Details below).
metadata from instructions to kernel.
don't need processing of these internal aspects (with negative values) in the SYCL RT.
Splitting functionality based on sycl_used_aspects metadata is available for free.
Details:
Currently accruracy level can be controlled using the following options.
For entire translation unit:
-ffp-accuracy=high
-ffp-accuracy=medium
-ffp-accuracy=low
-ffp-accuracy=sycl
-ffp-accuracy=cuda
For particular funcions in the translation unit:
-ffp-accuracy=low:sin,cos
Whenever frontend sees a math function in a kernel or a device function
it emits fp intrinsic call with attached callsite attribute indicating
value of the maximum error. llvm-spirv is going to translate this
builtins to regular __ocl intrinsics and translate callsite attribute to
decorator (which is a new spirv extension). If that extension is not supported
by the backend, it is going to emit an error. Error is emitted also in
the case if backend supports the extension but can't compile the kernel because
it doesn't have corresponding implemenation of math function complying with
required maximum error.
Aspects corrsponding to different levels of accuracy are not suitable in
this case because aforementioned options are sycl program compilation options, i.e.
it doesn't make sense to provide an opportunity to the user to write
something like this:
if (dev.has(aspect::ext_oneapi_fp_intrinsic_accuracy_high)) {
/* submit kernel using high accuracy intrinsics */
}
But on our side we still would like to put kernels and device functions
to different images based on required accuracy level. It is necessary because
some backends may support, for example, low and medium accuracy but don't
support high accuracy.