-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LTO] Run Argument Promotion before IPSCCP #111163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.
Due to the fact that Fortran passes most arguments by reference [1], Flang lowers constant arguments passed to function/subroutines as allocas + stores. We currently fail to propagate such constants during LTO in some cases, significantly impacting performance of some key benchmarks (e.g. SPEC CPU2017 bwaves). An FIR pass was added in #73829 to fix this, but is disabled by default due to portability concerns. The core of the issue is that these constant values passed as pointers to local objects (allocas), which are promoted to scalar constants by Argument Promotion, are not able to be propagated by IPSCCP due to this running prior to Argument Promotion in the LTO pipeline. This patch aims to address this by running Argument Promotion before IPSCCP in the LTO pipeline. We are then able to propagate such constants successfully. This gives a performance improvement of 20.1% in SPEC CPU2017 bwaves. This particular benchmark example is illustrative of the broader scope for improving Fortran code generation; for example Fortran array descriptors frequently generate structs filled with constants passed as pointers to functions, which are currently not propagated inter-procedurally in many cases at present. The compile time cost of this change, measured by change in instruction counts for compiling CTMark:
An impact of this change is that some function arguments that would previously have been specialised by Function Specialization, which is currently only enabled for pointer arguments by default, are no longer specialised due to being promoted to literal constants. To mitigate this, PR #111162 enables specialising constant literal arguments by default for recursive functions. This has negligible impact on CTMark compile times (Geomean 0.00%) for both LTO and non-LTO pipelines. |
would be good to run this through https://llvm-compile-time-tracker.com/ have you considered moving IPSCCP later in the LTO pipeline? |
Agreed - my compile-time results are based on the methodology of llvm-compile-time-tracker, but it would be useful to get the full results. @nikic would you be able to do a run of this commit / add my fork https://github.com/hazzlim/llvm-project/ so that I can do so?
It was considered - based on the discussion on #96620 it seemed as though this would not be an acceptable approach, but perhaps considerations might be different in the full LTO pipeline? |
I tested this a while ago but didn't post here: https://llvm-compile-time-tracker.com/compare.php?from=bf488ed6e1fbe4c494a1dc0dd199a3d03405784e&to=a4fe1ec912f4a2479218fc2d30ccbc8cf82cf782&stat=instructions:u It's indeed surprisingly cheap to run these extra passes. A question I have here if this problem is somehow specific to LTO, or whether you just happen to be optimizing for LTO? |
Thanks for sharing these results, they seem mostly in line with what I have been seeing so that's good! We just happened to be concentrating on improving the LTO pipeline here - I think that this would also be beneficial to non-LTO, provided the compile-time impact wasn't prohibitive there either. Would it be preferred to make pipeline changes like this to LTO and non-LTO in separate PRs, or all in one go? |
It looks like it would be more expensive to do this non-LTO. The results I got for CTMark at O3 (non-LTO) on AArch64:
(For reference, this is the change I tested #111672) |
@ronlieb Did you intend to merge this PR? |
Reverts #111163, as this was merged prematurely.
no i did not intend to merge that, |
thanks for reverting it |
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.
Reverts llvm#111163, as this was merged prematurely.
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.