Skip to content

[SYCL] Reimplemented -f[no]sycl-early-optimizations flag #7701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jan 6, 2023

Conversation

andylshort
Copy link

Reimplemented the -f[no]sycl-early-optimizations compiler flag to separate it from the meaning of -disable-llvm-passes for more fidelity. This required a change to its definition, setting of a new codegen option behind-the-scenes, and small logic changes to the optimization pipeline to factor in the new flag. Existing tests all still pass.

Lamzed-Short, Andrew added 6 commits November 28, 2022 07:54
sycl-early-opts is default behaviour, so not required. The no- variant has
been reimplemented and separated from disable-llvm-passes also.

Rearranged some of the SYCL passes in codegen pipeline but further work
needed to resolve 7 failing test cases.
The boolean logic I had initially replaced with for was wrong, but is correct now.
Positive version was removed when it shouldn't have been, so reinstated it.
sycl-device-optimizations test now passes again.
NOTE: One of them is spelt wrong? 'perserve' instead of 'preserve'?
When adding a CodegenOpt, MarshallingInfoFlag is the secret sauce to relate
the CLI flag to the boolean option.
@andylshort andylshort requested review from a team as code owners December 8, 2022 13:38
@andylshort andylshort requested a review from a team December 8, 2022 13:38
@premanandrao
Copy link
Contributor

@andylshort, what is the motivation for this split? What problem is it trying to solve?

@andylshort
Copy link
Author

what is the motivation for this split? What problem is it trying to solve?

@premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918

@AlexeySachkov
Copy link
Contributor

what is the motivation for this split? What problem is it trying to solve?

@premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918

Yeah, this change is an internal refactoring to resolve some technical debt we have: we re-used -disable-llvm-passes in -fno-sycl-early-optimizations implementation, but we discarded the original point of -disable-llvm-passes, i.e. we started scheduling passes even if this option is set, contrary to its intent.

@bader
Copy link
Contributor

bader commented Dec 9, 2022

what is the motivation for this split? What problem is it trying to solve?

@premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918

Yeah, this change is an internal refactoring to resolve some technical debt we have: we re-used -disable-llvm-passes in -fno-sycl-early-optimizations implementation, but we discarded the original point of -disable-llvm-passes, i.e. we started scheduling passes even if this option is set, contrary to its intent.

  1. Doesn't O0 do what we want? Can't we just remove this flag?
  2. What is the difference in passes before and after the patch?

@AlexeySachkov
Copy link
Contributor

@bader

  1. Doesn't O0 do what we want?

-O0 inserts optnone attribute, which prevents further optimizations down the stack (i.e. in JIT/AOT device compilers). Early optimizations is a special mode, which was intended to reduce the size of device code produced by our compiler and it only consists of some device-agnostic optimizations. For example, most of loop transformations and vectorizations are removed from that optimization pipeline, because they require some information about target device to be profitable.

Can't we just remove this flag?

I think that we can remove it from user-visible options, but we still need it internally. We disable early optimizations when we AOT-compile for FPGA target: this is done, because most of optimizations for FPGA which are done without knowledge of a target device will only hurt performance and code size (which is important for FPGAs)

  1. What is the difference in passes before and after the patch?

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Adding the MarshallingInfoFlag to the previous definition also works,
so I've reverted the def back and added the flag, as opposed to splitting.
@bader
Copy link
Contributor

bader commented Dec 10, 2022

@bader

  1. Doesn't O0 do what we want?

-O0 inserts optnone attribute, which prevents further optimizations down the stack (i.e. in JIT/AOT device compilers). Early optimizations is a special mode, which was intended to reduce the size of device code produced by our compiler and it only consists of some device-agnostic optimizations. For example, most of loop transformations and vectorizations are removed from that optimization pipeline, because they require some information about target device to be profitable.

I was considering O0 as a replacement for -fno-sycl-early-optimizations, rather than -fsycl-early-optimizations. O, where N >= 1 should be enough enable the optimizations, you refer to. I think there is an option to drop optnone with O0. Right?

Can't we just remove this flag?

I think that we can remove it from user-visible options, but we still need it internally. We disable early optimizations when we AOT-compile for FPGA target: this is done, because most of optimizations for FPGA which are done without knowledge of a target device will only hurt performance and code size (which is important for FPGAs)

Let's remove the driver option. For FPGA target, can't we use "O0 w/o optnone" existing combination of flags?

@AlexeySachkov
Copy link
Contributor

I think there is an option to drop optnone with O0. Right?

Right, -disable-O0-optnone if I'm not mistaken.

I was considering O0 as a replacement for -fno-sycl-early-optimizations, rather than -fsycl-early-optimizations. O, where N >= 1 should be enough enable the optimizations, you refer to.

Right now -O0 is being recorded into device image so the runtime later passes -cl-opt-disable to JIT compiler. Similar thing happens with AOT flow, I suppose. Do I read it correctly that we don't want users to be able to select different optimization options for FE & backend compilations? Or rather we don't have a specialized flag for that, but want them to use -Xsycl-target-frontend and -Xsycl-target-backend, right?

Note: I would be against removing public version of -f[no-]sycl-early-optimizations immediately, because it may break workflow of our users. I suggest that if we intent to remove it, then we deprecate it first and remove later.

For FPGA target, can't we use "O0 w/o optnone" existing combination of flags?

Do we want to preserve an ability to enable early optimizations for FPGA target? If not, then we can indeed just always pass -O0 -disable-O0-optnone there and remove the flag even internally.

Lamzed-Short, Andrew added 4 commits December 13, 2022 03:48
The code gen option gets declared and set in the flag definition so its
definition here was superfluous.
All of the SYCL passes now are no longer run when `-disable-llvm-passes`
is set, so we now properly honour the flag in this case.
Several passes also run in under correct circumstances now.

This has caused the following tests to fail:
- CodeGenSYCL/group-local-memory.cpp
- SemaSYCL/sycl-force-inline-kernel-lambda.cpp
- CodeGenSYCL/device_has.cpp
- CodeGenSYCL/sub-group-size.cpp
- CodeGenSYCL/uses_aspects.cpp

These all be addressed in the next few commits, or in some cases in new
PRs due to scope and needing discussion, etc.
This test now completely honours the `-disable-llvm-passes` flag now
that it has been separated from the `-fno-sycl-early-optimizations` flag and logic.
The change in the disable-llvm-passes required this test to be rolled back
due to the change in logic between the `-disable-llvm-passes` flag and the
reimplemented `-fno-sycl-early-optimizations` flag.
@bader
Copy link
Contributor

bader commented Dec 13, 2022

I think we already have means to disable "early" optimizations and we don't need a user facing specific option for that.
This option was added to analyze performance regressions caused by "early" optimizations, but I think we don't need it anymore.
Let's go with "deprecate" -> "remove" path.

Lamzed-Short, Andrew added 2 commits December 14, 2022 08:37
SyclPropagateAspectsUsagePass enable dnow even if `-fno-sycl-early-optimizations`
flag is provided.
The -fsycl-force-inline-kernel-lambda flag adds the inlining attribute to the
AST node of the kernel, so it was necessary to make this test more specific.

The attributes aren't put on the FunctionDecl node of each class template argument
to parallel_for, but rather on the actual LambdaExpr subtree it calls.

Maybe some additional CHECKs to link the LambdaExpr and FunctionDecl of
the kernel would be good future work, as the kernel FunctionalDecl aren't in
the same location as the Lambda in which they're called. There is a reference
linking the two together, but as far as I can tell, no functionality to link these
together, as the curent CHECK functions seem to regex match individual lines?
@elizabethandrews
Copy link
Contributor

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

@AlexeySachkov
Copy link
Contributor

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

-disable-llvm-passes is an internal flag, which is intended to disable all passes. It does not guarantee any functional correctness of the program and should only be used for debugging/experiments/etc.

Lamzed-Short, Andrew added 4 commits January 3, 2023 04:06
This flag can get set automatically by the definition in Options.td to
the same effect and no test failures, so changing this to be more concise.
@andylshort andylshort temporarily deployed to aws January 3, 2023 14:54 — with GitHub Actions Inactive
@andylshort andylshort temporarily deployed to aws January 3, 2023 15:24 — with GitHub Actions Inactive
@premanandrao
Copy link
Contributor

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

I think the critical difference here is that this is an internal option only and thus should be used only in specific circumstances. I agree that it is confusing, but now that I understand the changes a bit more, I think the confusion is mostly because of the false expectations that was set before about what -disable-llvm-passes meant.

@andylshort andylshort temporarily deployed to aws January 4, 2023 15:48 — with GitHub Actions Inactive
@andylshort andylshort temporarily deployed to aws January 4, 2023 16:18 — with GitHub Actions Inactive
@elizabethandrews
Copy link
Contributor

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

-disable-llvm-passes is an internal flag, which is intended to disable all passes. It does not guarantee any functional correctness of the program and should only be used for debugging/experiments/etc.

Ok. Thank you for clarifying

@bader bader requested a review from elizabethandrews January 4, 2023 18:49
The previous test didn't check with and without the
`-fno-sycl-force-inline-kernel-lambda` properly and was quite loose.

The test checks, run lines, and form have been fixed with assistance. Thanks!
@andylshort andylshort temporarily deployed to aws January 5, 2023 14:26 — with GitHub Actions Inactive
@andylshort andylshort temporarily deployed to aws January 5, 2023 14:56 — with GitHub Actions Inactive
@andylshort
Copy link
Author

andylshort commented Jan 5, 2023

Current test failure on AMD GPU is a separate issue and will be fixed once this PR intel/llvm-test-suite#1487 is merged.

@bader bader merged commit d164fd9 into intel:sycl Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants