Skip to content

Workaround for MSVC ARM64 build performance regression #65215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 20, 2023
Merged

Workaround for MSVC ARM64 build performance regression #65215

merged 1 commit into from
Oct 20, 2023

Conversation

lxbndr
Copy link
Contributor

@lxbndr lxbndr commented Sep 2, 2023

MSVC has a major performance regression observed when targeting ARM64 since v19.32 (VS 17.2.0). cl.exe spends a lot of time on compiling StandardLibrary.cpp and CGBuiltin.cpp, and total build duration rises extremely. This makes builds stagnate even on a real hardware, but VM-based builds (like building on cloud agents from GitHub Actions and Azure Pipelines) are experiencing most damage as they also performance- and time-limited.

The issue appears to be related to some optimizations applied in /O2 mode. It is reported on Developer Community. While the investigation is in progress, we could apply a workaround to improve build time. /O2 actually enables a set of optimizations, and only one of them does all slowdown. The idea is to disable optimizations, and then apply all but one back, effectively excluding the problematic option from the set.

This patch alters the CMake configuration for aforementioned files. Changes are limited to:

  • non-debug builds
  • MSVC of the specific version
  • target arch (ARM64).

@lxbndr lxbndr requested a review from a team as a code owner September 2, 2023 22:28
Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I think we at least need to make sure that we don't erase flags that have been set by the user manually.

Another option would just to say that this configuration ARM64 + MSVC of this version is not a valid configuration for MSVC and add it to ProblematicConfigurations here: https://github.com/llvm/llvm-project/blob/main/llvm/cmake/modules/CheckProblematicConfigurations.cmake

AND MSVC
AND MSVC_VERSION VERSION_GREATER_EQUAL 1932
AND CMAKE_SYSTEM_PROCESSOR MATCHES "ARM64")
set_source_files_properties(CGBuiltin.cpp PROPERTIES COMPILE_FLAGS "/Od /Gw /Oi /Oy /Gy /Ob2 /Ot /GF")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are setting these settings for all configurations and erasing the other flags set for these files. I don't think that's the right solution. I would rather you grab the current compile_flags and then substitute the optimization flags only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I think I have to elaborate a little (and I apologize for not making this clear enough in the PR description). We're not going to just throw a bunch of optimization flags, of course.

The flag I actually intend to alter is /O2. According to the reference, /O2 is the equivalent of /Og /Oi /Ot /Oy /Ob2 /GF /Gy (I even see this in the compiler frontend invocation). I want to suppress the /Og and leave others in effect. This is done by adding /Od first (which cancels all /O2 flags), and then adding "O2-minus-Og" flags back. Probably, the /Gw flag is the only unnecessary flag here, as it is not part of the /O... set (and it is not affected by /Od either).

It is also worth to mention that set_source_files_properties does not override flag set. It appends specified flags to other configured flags. That's why I had to use /Od to cancel already existing /O2.

I fully understand the intention to not break existing configuration. And I think I see how this can be improved. Instead of assuming all non-debug builds have optimizations enabled, we should seek for /O2//O1 flags directly, and adjust resulting flag set according to each case. @tru WDYT? Does this sound better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this is what we want. We don't want to assume what the current flags are. Does it only happen in Release and RelWithDebInfo configurations? In that case you can maybe use generator expressions to only happen for those configurations?

Btw I noticed that COMPILE_FLAGS is deprecated and COMPILE_OPTIONS should be used instead, so maybe look into that as well?

@@ -30,6 +30,15 @@ set(LLVM_LINK_COMPONENTS
TransformUtils
)

# Workaround for MSVC ARM64 performance regression: disable all optimizations (/Od)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment also should link to the issue on the Microsoft side so that we can know to remove this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure if the link is acceptable to mention there. Will do, thanks!

@tru tru requested a review from a team September 3, 2023 09:18
@tru tru added cmake Build system in general and CMake in particular platform:windows labels Sep 3, 2023
@omjavaid
Copy link
Contributor

omjavaid commented Sep 4, 2023

Hi Have you compared build performance between cl and clang-cl? If only cl.exe is taking a lot of time while clang-cl doesnt then may be we should make this change specific to cl.exe.

@tru
Copy link
Collaborator

tru commented Sep 4, 2023

I suspect maybe the MSVC_VERSION check might exclude clang-cl, but we should make sure, because @omjavaid is correct, we shouldn't apply this to clang-cl unless it shows the same problems.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Sep 18, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2023

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Changes

MSVC has a major performance regression observed when targeting ARM64 since v19.32 (VS 17.2.0). cl.exe spends a lot of time on compiling StandardLibrary.cpp and CGBuiltin.cpp, and total build duration rises extremely. This makes builds stagnate even on a real hardware, but VM-based builds (like building on cloud agents from GitHub Actions and Azure Pipelines) are experiencing most damage as they also performance- and time-limited.

The issue appears to be related to some optimizations applied in /O2 mode. It is reported on Developer Community. While the investigation is in progress, we could apply a workaround to improve build time. /O2 actually enables a set of optimizations, and only one of them does all slowdown. The idea is to disable optimizations, and then apply all but one back, effectively excluding the problematic option from the set.

This patch alters the CMake configuration for aforementioned files. Changes are limited to:

  • non-debug builds
  • MSVC of the specific version
  • target arch (ARM64).

Full diff: https://github.com/llvm/llvm-project/pull/65215.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CMakeLists.txt (+20)
  • (modified) clang/lib/Tooling/Inclusions/Stdlib/CMakeLists.txt (+20)
diff --git a/clang/lib/CodeGen/CMakeLists.txt b/clang/lib/CodeGen/CMakeLists.txt
index 1debeb6d9cce9e0..2dd93a57f4e22d2 100644
--- a/clang/lib/CodeGen/CMakeLists.txt
+++ b/clang/lib/CodeGen/CMakeLists.txt
@@ -30,6 +30,26 @@ set(LLVM_LINK_COMPONENTS
   TransformUtils
   )
 
+# Workaround for MSVC ARM64 performance regression:
+# https://developercommunity.visualstudio.com/t/Compiling-a-specific-code-for-ARM64-with/10444970
+# Since /O1 and /O2 represent a set of optimizations,
+# our goal is to disable the /Og flag while retaining the other optimizations from the /O1|/O2 set
+if(MSVC AND NOT CMAKE_CXX_COMPILER_ID MATCHES Clang
+    AND MSVC_VERSION VERSION_GREATER_EQUAL 1932
+    AND CMAKE_SYSTEM_PROCESSOR MATCHES "ARM64")
+
+  string(TOUPPER "${CMAKE_BUILD_TYPE}" uppercase_CMAKE_BUILD_TYPE)
+  string(REGEX MATCHALL "/[Oo][12]" opt_flags "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${uppercase_CMAKE_BUILD_TYPE}}")
+  if (opt_flags)
+    if(opt_flags MATCHES "1$")
+      set(opt_flags "/Od;/Os;/Oy;/Ob2;/GF;/Gy")
+    elseif (opt_flags MATCHES "2$")
+      set(opt_flags "/Od;/Oi;/Ot;/Oy;/Ob2;/GF;/Gy")
+    endif()
+    set_source_files_properties(CGBuiltin.cpp PROPERTIES COMPILE_OPTIONS "${opt_flags}")
+  endif()
+endif()
+
 add_clang_library(clangCodeGen
   ABIInfo.cpp
   ABIInfoImpl.cpp
diff --git a/clang/lib/Tooling/Inclusions/Stdlib/CMakeLists.txt b/clang/lib/Tooling/Inclusions/Stdlib/CMakeLists.txt
index 0f52c3590ac9af1..ed323ab3528b104 100644
--- a/clang/lib/Tooling/Inclusions/Stdlib/CMakeLists.txt
+++ b/clang/lib/Tooling/Inclusions/Stdlib/CMakeLists.txt
@@ -1,3 +1,23 @@
+# Workaround for MSVC ARM64 performance regression:
+# https://developercommunity.visualstudio.com/t/Compiling-a-specific-code-for-ARM64-with/10444970
+# Since /O1 and /O2 represent a set of optimizations,
+# our goal is to disable the /Og flag while retaining the other optimizations from the /O1|/O2 set
+if(MSVC AND NOT CMAKE_CXX_COMPILER_ID MATCHES Clang
+    AND MSVC_VERSION VERSION_GREATER_EQUAL 1932
+    AND CMAKE_SYSTEM_PROCESSOR MATCHES "ARM64")
+
+  string(TOUPPER "${CMAKE_BUILD_TYPE}" uppercase_CMAKE_BUILD_TYPE)
+  string(REGEX MATCHALL "/[Oo][12]" opt_flags "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${uppercase_CMAKE_BUILD_TYPE}}")
+  if (opt_flags)
+    if(opt_flags MATCHES "1$")
+      set(opt_flags "/Od;/Os;/Oy;/Ob2;/GF;/Gy")
+    elseif (opt_flags MATCHES "2$")
+      set(opt_flags "/Od;/Oi;/Ot;/Oy;/Ob2;/GF;/Gy")
+    endif()
+    set_source_files_properties(StandardLibrary.cpp PROPERTIES COMPILE_OPTIONS "${opt_flags}")
+  endif()
+endif()
+
 add_clang_library(clangToolingInclusionsStdlib
   StandardLibrary.cpp
 

@lxbndr
Copy link
Contributor Author

lxbndr commented Sep 18, 2023

A short summary of changes:

  • added link to the issue
  • narrowed checks to aim cl.exe only (clang-cl.exe doesn't need this)
  • changed approach a bit. Now we're looking for /O2 and /O1 flags in the general flag set for the current configuration. Last effective found flag is suppressed and replaced with "same-minus-/Og" equivalent for corresponding O variant (either size or speed).

I believe this is the most careful way to adjust the build options and avoid conflicts with user flags. Basically, if you have /O_n_, you will get same optimizations except problematic one.

cc @tru @omjavaid

@lxbndr lxbndr requested a review from tru September 18, 2023 21:53
Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lxbndr
Copy link
Contributor Author

lxbndr commented Sep 29, 2023

@tru I apologize for any inconvenience, but should I rebase this to the latest master, or something else to unblock landing? I am not sure if I see any suitable rule in the contribution guide. Just in case it occasionally turns out that the blocker is me.

@xgupta
Copy link
Contributor

xgupta commented Oct 20, 2023

@tru Is it fine to commit this?

@tru
Copy link
Collaborator

tru commented Oct 20, 2023

Yep. Want me to do it or do you have access now?

@xgupta
Copy link
Contributor

xgupta commented Oct 20, 2023

Yes, I have the access.

@xgupta xgupta merged commit c6f0f88 into llvm:main Oct 20, 2023
@lxbndr lxbndr deleted the msvc-slow-arm64 branch October 20, 2023 20:17
MaxEW707 pushed a commit that referenced this pull request Jun 25, 2024
)

This is follow up for #65215

Mentioned regression was fixed in MSVC 19.39 (VS 17.9.0), so it makes
sense to not apply fix for that (and newer) compiler versions.

Same as original change, this patch is narrowly scoped to not affect any
other compiler.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…m#90731)

This is follow up for llvm#65215

Mentioned regression was fixed in MSVC 19.39 (VS 17.9.0), so it makes
sense to not apply fix for that (and newer) compiler versions.

Same as original change, this patch is narrowly scoped to not affect any
other compiler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category cmake Build system in general and CMake in particular platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants