-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Reimplemented -f[no]sycl-early-optimizations flag #7701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Reimplemented -f[no]sycl-early-optimizations flag #7701
Conversation
sycl-early-opts is default behaviour, so not required. The no- variant has been reimplemented and separated from disable-llvm-passes also. Rearranged some of the SYCL passes in codegen pipeline but further work needed to resolve 7 failing test cases.
The boolean logic I had initially replaced with for was wrong, but is correct now.
Positive version was removed when it shouldn't have been, so reinstated it. sycl-device-optimizations test now passes again.
NOTE: One of them is spelt wrong? 'perserve' instead of 'preserve'?
When adding a CodegenOpt, MarshallingInfoFlag is the secret sauce to relate the CLI flag to the boolean option.
@andylshort, what is the motivation for this split? What problem is it trying to solve? |
@premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918 |
Yeah, this change is an internal refactoring to resolve some technical debt we have: we re-used |
|
I think that we can remove it from user-visible options, but we still need it internally. We disable early optimizations when we AOT-compile for FPGA target: this is done, because most of optimizations for FPGA which are done without knowledge of a target device will only hurt performance and code size (which is important for FPGAs)
We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if |
Adding the MarshallingInfoFlag to the previous definition also works, so I've reverted the def back and added the flag, as opposed to splitting.
I was considering O0 as a replacement for
Let's remove the driver option. For FPGA target, can't we use "O0 w/o optnone" existing combination of flags? |
Right,
Right now Note: I would be against removing public version of
Do we want to preserve an ability to enable early optimizations for FPGA target? If not, then we can indeed just always pass |
The code gen option gets declared and set in the flag definition so its definition here was superfluous.
All of the SYCL passes now are no longer run when `-disable-llvm-passes` is set, so we now properly honour the flag in this case. Several passes also run in under correct circumstances now. This has caused the following tests to fail: - CodeGenSYCL/group-local-memory.cpp - SemaSYCL/sycl-force-inline-kernel-lambda.cpp - CodeGenSYCL/device_has.cpp - CodeGenSYCL/sub-group-size.cpp - CodeGenSYCL/uses_aspects.cpp These all be addressed in the next few commits, or in some cases in new PRs due to scope and needing discussion, etc.
This test now completely honours the `-disable-llvm-passes` flag now that it has been separated from the `-fno-sycl-early-optimizations` flag and logic.
The change in the disable-llvm-passes required this test to be rolled back due to the change in logic between the `-disable-llvm-passes` flag and the reimplemented `-fno-sycl-early-optimizations` flag.
I think we already have means to disable "early" optimizations and we don't need a user facing specific option for that. |
SyclPropagateAspectsUsagePass enable dnow even if `-fno-sycl-early-optimizations` flag is provided.
The -fsycl-force-inline-kernel-lambda flag adds the inlining attribute to the AST node of the kernel, so it was necessary to make this test more specific. The attributes aren't put on the FunctionDecl node of each class template argument to parallel_for, but rather on the actual LambdaExpr subtree it calls. Maybe some additional CHECKs to link the LambdaExpr and FunctionDecl of the kernel would be good future work, as the kernel FunctionalDecl aren't in the same location as the Lambda in which they're called. There is a reference linking the two together, but as far as I can tell, no functionality to link these together, as the curent CHECK functions seem to regex match individual lines?
Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in |
|
This flag can get set automatically by the definition in Options.td to the same effect and no test failures, so changing this to be more concise.
I think the critical difference here is that this is an internal option only and thus should be used only in specific circumstances. I agree that it is confusing, but now that I understand the changes a bit more, I think the confusion is mostly because of the false expectations that was set before about what -disable-llvm-passes meant. |
Ok. Thank you for clarifying |
The previous test didn't check with and without the `-fno-sycl-force-inline-kernel-lambda` properly and was quite loose. The test checks, run lines, and form have been fixed with assistance. Thanks!
Current test failure on AMD GPU is a separate issue and will be fixed once this PR intel/llvm-test-suite#1487 is merged. |
Reimplemented the
-f[no]sycl-early-optimizations
compiler flag to separate it from the meaning of-disable-llvm-passes
for more fidelity. This required a change to its definition, setting of a new codegen option behind-the-scenes, and small logic changes to the optimization pipeline to factor in the new flag. Existing tests all still pass.