[SYCL] Reimplemented -f[no]sycl-early-optimizations flag #7701

andylshort · 2022-12-08T13:38:29Z

Reimplemented the -f[no]sycl-early-optimizations compiler flag so that it is separate from -disable-llvm-passes and each flag keeps its own meaning. This required changing the flag's definition, setting a new codegen option behind the scenes, and making small logic changes to the optimization pipeline to factor in the new flag. Existing tests all still pass.

sycl-early-opts is default behaviour, so not required. The no- variant has been reimplemented and separated from disable-llvm-passes also. Rearranged some of the SYCL passes in codegen pipeline but further work needed to resolve 7 failing test cases.

The boolean logic I had initially replaced with for was wrong, but is correct now.

Positive version was removed when it shouldn't have been, so reinstated it. sycl-device-optimizations test now passes again.

NOTE: One of them is spelt wrong? 'perserve' instead of 'preserve'?

When adding a CodegenOpt, MarshallingInfoFlag is the secret sauce to relate the CLI flag to the boolean option.
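To illustrate the idea only (this is not the actual TableGen definition in Options.td), a toy C++ model of how a marshalled flag ends up in a boolean codegen option might look like the sketch below. The `CodeGenOpts` struct, the `FlagBinding` table and the `parseArgs` helper are hypothetical; only the flag spellings and the `DisableSYCLEarlyOpts` / `DisableLLVMPasses` option names are taken from this PR.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for clang's codegen options; only the two field
// names are taken from this PR.
struct CodeGenOpts {
  bool DisableSYCLEarlyOpts = false;
  bool DisableLLVMPasses = false;
};

// One row of a toy marshalling table: a flag spelling, the option field it
// controls, and the value stored when the flag is seen.
struct FlagBinding {
  const char *Spelling;
  bool CodeGenOpts::*Field;
  bool Value;
};

// Walks the arguments in order; a later flag overrides an earlier one,
// which is how a positive/negative flag pair is modelled here.
inline CodeGenOpts parseArgs(const std::vector<std::string> &Args) {
  static const FlagBinding Table[] = {
      {"-fsycl-early-optimizations", &CodeGenOpts::DisableSYCLEarlyOpts, false},
      {"-fno-sycl-early-optimizations", &CodeGenOpts::DisableSYCLEarlyOpts, true},
      {"-disable-llvm-passes", &CodeGenOpts::DisableLLVMPasses, true},
  };
  CodeGenOpts Opts;
  for (const std::string &Arg : Args)
    for (const FlagBinding &B : Table)
      if (Arg == B.Spelling)
        Opts.*(B.Field) = B.Value;
  return Opts;
}
```

The point of the table is that the option is declared and set in one place, so no hand-written parsing code is needed elsewhere.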

clang/include/clang/Driver/Options.td

clang/lib/CodeGen/BackendUtil.cpp

premanandrao · 2022-12-08T16:06:14Z

@andylshort, what is the motivation for this split? What problem is it trying to solve?

andylshort · 2022-12-08T16:18:43Z

> what is the motivation for this split? What problem is it trying to solve?

@premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918

AlexeySachkov · 2022-12-08T16:40:41Z

> > what is the motivation for this split? What problem is it trying to solve?
>
> @premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918

Yeah, this change is an internal refactoring to resolve some technical debt we have: we re-used -disable-llvm-passes in -fno-sycl-early-optimizations implementation, but we discarded the original point of -disable-llvm-passes, i.e. we started scheduling passes even if this option is set, contrary to its intent.

bader · 2022-12-09T01:04:59Z

> > > what is the motivation for this split? What problem is it trying to solve?
> >
> > @premanandrao The immediate motivation for this change was for the implementation of #7527 to be unblocked. It requires the two flags to be separate for its logic. It was also referenced in a FIXME: https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/BackendUtil.cpp#L918
>
> Yeah, this change is an internal refactoring to resolve some technical debt we have: we re-used -disable-llvm-passes in -fno-sycl-early-optimizations implementation, but we discarded the original point of -disable-llvm-passes, i.e. we started scheduling passes even if this option is set, contrary to its intent.

Doesn't O0 do what we want? Can't we just remove this flag?
What is the difference in passes before and after the patch?

AlexeySachkov · 2022-12-09T09:27:11Z

@bader

> Doesn't O0 do what we want?

-O0 inserts optnone attribute, which prevents further optimizations down the stack (i.e. in JIT/AOT device compilers). Early optimizations is a special mode, which was intended to reduce the size of device code produced by our compiler and it only consists of some device-agnostic optimizations. For example, most of loop transformations and vectorizations are removed from that optimization pipeline, because they require some information about target device to be profitable.

> Can't we just remove this flag?

I think that we can remove it from user-visible options, but we still need it internally. We disable early optimizations when we AOT-compile for FPGA target: this is done, because most of optimizations for FPGA which are done without knowledge of a target device will only hurt performance and code size (which is important for FPGAs)
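As a rough sketch of that internal use, the decision could be modelled as below. The `deviceFrontendFlags` helper and the substring test on the target name are assumptions made for illustration (for example a target spelled `spir64_fpga`); this is not the driver's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the extra frontend flag (if any) that a
// driver might add for a given device target.
inline std::vector<std::string>
deviceFrontendFlags(const std::string &Target, bool UserDisabledEarlyOpts) {
  std::vector<std::string> Flags;
  // Assumption: FPGA targets can be recognised by "fpga" in their name.
  const bool IsFPGA = Target.find("fpga") != std::string::npos;
  // Early optimizations are switched off for FPGA AOT compilation, or when
  // the user explicitly asked for that.
  if (IsFPGA || UserDisabledEarlyOpts)
    Flags.push_back("-fno-sycl-early-optimizations");
  return Flags;
}
```

Under this model the flag stays available internally even if the user-visible spelling is deprecated later.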

> What is the difference in passes before and after the patch?

We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.
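A minimal sketch of that gating, assuming the two options are plain booleans, is shown below. The `PipelineOpts` struct, the `plannedPasses` function and the two pass-group labels are hypothetical placeholders, not the real pipeline code in BackendUtil.cpp.

```cpp
#include <string>
#include <vector>

struct PipelineOpts {
  bool DisableLLVMPasses = false;    // models -disable-llvm-passes
  bool DisableSYCLEarlyOpts = false; // models -fno-sycl-early-optimizations
};

// Returns labels for the pass groups that would be scheduled.
inline std::vector<std::string> plannedPasses(const PipelineOpts &Opts) {
  std::vector<std::string> Passes;
  // -disable-llvm-passes wins over everything: no passes at all, not even
  // the functional ones.
  if (Opts.DisableLLVMPasses)
    return Passes;
  // Functional passes do not depend on the early-optimizations switch.
  Passes.push_back("functional");
  // The device-agnostic optimization pipeline is the only optional part.
  if (!Opts.DisableSYCLEarlyOpts)
    Passes.push_back("early-optimizations");
  return Passes;
}
```

With both options left at their defaults the sketch schedules both groups; setting only `DisableSYCLEarlyOpts` leaves the functional group in place.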

Adding the MarshallingInfoFlag to the previous definition also works, so I've reverted the def back and added the flag, as opposed to splitting.

clang/include/clang/Driver/Options.td

clang/lib/CodeGen/BackendUtil.cpp

clang/lib/Frontend/CompilerInvocation.cpp

bader · 2022-12-10T01:02:03Z

> @bader
>
> > Doesn't O0 do what we want?
>
> -O0 inserts optnone attribute, which prevents further optimizations down the stack (i.e. in JIT/AOT device compilers). Early optimizations is a special mode, which was intended to reduce the size of device code produced by our compiler and it only consists of some device-agnostic optimizations. For example, most of loop transformations and vectorizations are removed from that optimization pipeline, because they require some information about target device to be profitable.

I was considering O0 as a replacement for -fno-sycl-early-optimizations, rather than -fsycl-early-optimizations. `O<N>`, where N >= 1, should be enough to enable the optimizations you refer to. I think there is an option to drop optnone with O0. Right?

> > Can't we just remove this flag?
>
> I think that we can remove it from user-visible options, but we still need it internally. We disable early optimizations when we AOT-compile for FPGA target: this is done, because most of optimizations for FPGA which are done without knowledge of a target device will only hurt performance and code size (which is important for FPGAs)

Let's remove the driver option. For FPGA target, can't we use "O0 w/o optnone" existing combination of flags?

AlexeySachkov · 2022-12-12T10:51:37Z

> I think there is an option to drop optnone with O0. Right?

Right, -disable-O0-optnone if I'm not mistaken.

> I was considering O0 as a replacement for -fno-sycl-early-optimizations, rather than -fsycl-early-optimizations. `O<N>`, where N >= 1, should be enough to enable the optimizations you refer to.

Right now -O0 is being recorded into device image so the runtime later passes -cl-opt-disable to JIT compiler. Similar thing happens with AOT flow, I suppose. Do I read it correctly that we don't want users to be able to select different optimization options for FE & backend compilations? Or rather we don't have a specialized flag for that, but want them to use -Xsycl-target-frontend and -Xsycl-target-backend, right?

Note: I would be against removing the public version of -f[no-]sycl-early-optimizations immediately, because it may break the workflow of our users. I suggest that if we intend to remove it, then we deprecate it first and remove it later.

> For FPGA target, can't we use "O0 w/o optnone" existing combination of flags?

Do we want to preserve an ability to enable early optimizations for FPGA target? If not, then we can indeed just always pass -O0 -disable-O0-optnone there and remove the flag even internally.
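For reference, the "O0 w/o optnone" combination can be modelled with the simplified sketch below. The `OptSettings` struct and the `shouldAddOptNone` function are illustrative stand-ins; the real frontend takes further conditions into account that are omitted here.

```cpp
struct OptSettings {
  unsigned OptimizationLevel = 0;     // models the -O level
  bool DisableO0ImplyOptNone = false; // models -disable-O0-optnone
};

// Simplified rule: optnone is attached only at -O0, and only when it has
// not been explicitly suppressed.
inline bool shouldAddOptNone(const OptSettings &S) {
  return S.OptimizationLevel == 0 && !S.DisableO0ImplyOptNone;
}
```

In this model `-O0 -disable-O0-optnone` leaves functions free of optnone, so later device compilers are still allowed to optimize them.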

The code gen option gets declared and set in the flag definition so its definition here was superfluous.

All of the SYCL passes are now skipped when `-disable-llvm-passes` is set, so we now properly honour the flag in this case. Several passes also now run under the correct circumstances. This has caused the following tests to fail:
- CodeGenSYCL/group-local-memory.cpp
- SemaSYCL/sycl-force-inline-kernel-lambda.cpp
- CodeGenSYCL/device_has.cpp
- CodeGenSYCL/sub-group-size.cpp
- CodeGenSYCL/uses_aspects.cpp

These will all be addressed in the next few commits, or in some cases in new PRs due to scope and the need for discussion.

This test now completely honours the `-disable-llvm-passes` flag now that it has been separated from the `-fno-sycl-early-optimizations` flag and logic.

The change in the disable-llvm-passes required this test to be rolled back due to the change in logic between the `-disable-llvm-passes` flag and the reimplemented `-fno-sycl-early-optimizations` flag.

bader · 2022-12-13T20:16:54Z

I think we already have means to disable "early" optimizations and we don't need a user facing specific option for that.
This option was added to analyze performance regressions caused by "early" optimizations, but I think we don't need it anymore.
Let's go with "deprecate" -> "remove" path.

clang/test/CodeGenSYCL/sub-group-size.cpp

SyclPropagateAspectsUsagePass is now enabled even if the `-fno-sycl-early-optimizations` flag is provided.

The -fsycl-force-inline-kernel-lambda flag adds the inlining attribute to the AST node of the kernel, so it was necessary to make this test more specific. The attributes aren't put on the FunctionDecl node of each class template argument to parallel_for, but rather on the actual LambdaExpr subtree it calls. Maybe some additional CHECKs to link the LambdaExpr and FunctionDecl of the kernel would be good future work, as the kernel FunctionDecls aren't in the same location as the lambda in which they're called. There is a reference linking the two together, but as far as I can tell, no functionality to check that link, as the current CHECK functions seem to regex-match individual lines?

clang/lib/CodeGen/BackendUtil.cpp

elizabethandrews · 2023-01-03T06:30:07Z

> We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.

Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

clang/lib/CodeGen/BackendUtil.cpp

clang/lib/Frontend/CompilerInvocation.cpp

clang/lib/CodeGen/BackendUtil.cpp

clang/test/CodeGenSYCL/device_has.cpp

clang/test/CodeGenSYCL/uses_aspects.cpp

clang/test/SemaSYCL/sycl-force-inline-kernel-lambda-ast.cpp

AlexeySachkov · 2023-01-03T09:24:58Z

> > We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.
>
> Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

-disable-llvm-passes is an internal flag, which is intended to disable all passes. It does not guarantee any functional correctness of the program and should only be used for debugging/experiments/etc.

This flag can get set automatically by the definition in Options.td to the same effect and no test failures, so changing this to be more concise.

premanandrao · 2023-01-04T15:07:26Z

> > We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.
>
> Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in

I think the critical difference here is that this is an internal option only and thus should be used only in specific circumstances. I agree that it is confusing, but now that I understand the changes a bit more, I think the confusion is mostly because of the false expectations that were set before about what -disable-llvm-passes meant.

clang/test/SemaSYCL/sycl-force-inline-kernel-lambda-ast.cpp

elizabethandrews · 2023-01-04T16:34:08Z

> > > We expect to see no difference in the early optimizations pipeline with this patch. The only observable difference would be that if -disable-llvm-passes flag is present, no passes will be launched regardless of sycl early optimizations command line switch value; even functional passes will be skipped.
> >
> > Won't the absence of functional passes in -disable-llvm-passes result in incorrect compiler behavior? I am not entirely familiar with how LLVM passes are usually handled but this feels incorrect to me. Disabling optimizations shouldn't change functionality like aspect propagation right? @premanandrao please weigh in
>
> -disable-llvm-passes is an internal flag, which is intended to disable all passes. It does not guarantee any functional correctness of the program and should only be used for debugging/experiments/etc.

Ok. Thank you for clarifying

The previous test didn't check with and without the `-fno-sycl-force-inline-kernel-lambda` properly and was quite loose. The test checks, run lines, and form have been fixed with assistance. Thanks!

andylshort · 2023-01-05T15:03:19Z

Current test failure on AMD GPU is a separate issue and will be fixed once this PR intel/llvm-test-suite#1487 is merged.

Lamzed-Short, Andrew added 6 commits November 28, 2022 07:54

Fixed faulty flag assignment

8546bec

Reverted change to sycl-early-opts flag

78c9589

Reverted two tests with unnecessary changes

d4ae056

Fixed the flag definition and tidied up the logic

dd3445e

clang-format'd BackendUtil changes

2416d7d

andylshort requested review from a team as code owners December 8, 2022 13:38

andylshort requested a review from a team December 8, 2022 13:38

Lamzed-Short, Andrew added 2 commits December 8, 2022 05:45

Merge branch 'sycl' into alamzeds/fsycl-early-optimizations-flag

63c4430

Resolved failing clang-format issues

9c8bbb5

AlexeySachkov requested changes Dec 8, 2022

Changed sycl-early-opts flag back to prev definition w/ change

bc085fd

elizabethandrews reviewed Dec 9, 2022

Lamzed-Short, Andrew added 4 commits December 13, 2022 03:48

DisableLLVMPasses flag now handled by marshalling infrastructure

95b0d3c

Update to group-local-memory test to honour disable-llvm-passes

b13fa37

Rolled back uses_aspect test now it adheres to -disable-llvm-passes

af1324b

sub-group-size test change with updated flags

5f45075

andylshort commented Dec 14, 2022

Lamzed-Short, Andrew added 2 commits December 14, 2022 08:37

Fixed functional pass invocation logic

63270a6

AlexeySachkov reviewed Dec 20, 2022

AlexeySachkov requested a review from mdtoguchi December 20, 2022 15:55

Tidied up and formatted pipeline if logic

5d39281

AlexeySachkov approved these changes Dec 20, 2022

AlexeySachkov requested a review from elizabethandrews December 20, 2022 16:22

mdtoguchi approved these changes Dec 20, 2022

elizabethandrews reviewed Jan 3, 2023

Lamzed-Short, Andrew added 4 commits January 3, 2023 04:06

Let DisableSYCLEarlyOpts codegen opt be set by marshalling

2c90ec7

Refactor to consolidate logic and clean up code paths

786f8c0

Merge branch 'sycl' into alamzeds/fsycl-early-optimizations-flag

34c9d24

Formatted changes

c4c5274

Merge branch 'sycl' into alamzeds/fsycl-early-optimizations-flag

c2adfd0

elizabethandrews reviewed Jan 4, 2023

bader requested a review from elizabethandrews January 4, 2023 18:49

Updated force inline kernel lambda test

9871f46

elizabethandrews approved these changes Jan 5, 2023

smanna12 approved these changes Jan 5, 2023

bader merged commit d164fd9 into intel:sycl Jan 6, 2023

cperkinsintel mentioned this pull request Jan 10, 2023

test ci 01 intel/llvm-test-suite#1501

Closed
