
[CI] Clean up runtimes builds #131913


Merged
3 commits merged into llvm:main from one-runtimes-build-premerge on Mar 21, 2025

Conversation

boomanaiden154
Contributor

@boomanaiden154 boomanaiden154 commented Mar 18, 2025

This patch cleans up the runtimes build in premerge due to queuing delays, dropping the C++03 testing but keeping the C++26 and Modules configurations, as they are deemed important by clang contributors.

This patch also makes it easier to rework the runtimes build in the future, in anticipation of the deprecation of building most of the runtimes with LLVM_ENABLE_PROJECTS.

This patch makes the monolithic-linux script use a single runtimes build rather than building in three different configurations (C++03, C++26, and modules). Queueing delays are quite high currently. This is probably due to a couple of reasons, but from what I've observed, building all these runtimes in all the configurations takes up a decent chunk of the overall build time (~10 minutes). These are configurations that I really think should be tested post-commit. Some things might slip through the cracks, but I believe relatively few patches would do so. Given the tradeoffs with build times and the (or at least my) view that premerge should test only the most common configuration for these projects, I think it makes sense to use just one.

This patch also makes it easier to rework the runtimes build in the future, in anticipation of the deprecation of building most of the runtimes with LLVM_ENABLE_PROJECTS.
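
For context, a minimal sketch of the pattern being consolidated. The test-parameter values, build directory, and check targets below are illustrative assumptions, not the script's actual contents, and compiler selection (pointing the runtimes build at the just-built clang) is omitted:

# Before: one clean runtimes build and test run per configuration.
for params in "std=c++03" "std=c++26" "enable_modules=clang"; do
  rm -rf build-runtimes
  cmake -S runtimes -B build-runtimes -G Ninja \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
    -DLIBCXX_TEST_PARAMS="${params}" \
    -DLIBCXXABI_TEST_PARAMS="${params}"
  ninja -C build-runtimes check-cxx check-cxxabi check-unwind
done

# After: a single default-configuration build and test run.
cmake -S runtimes -B build-runtimes -G Ninja \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
ninja -C build-runtimes check-cxx check-cxxabi check-unwind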
@Endilll
Contributor

Endilll commented Mar 18, 2025

I do think that this patch is useful for reducing the load on pre-merge CI, so I'm not going to object.

This is probably due to a couple of reasons

Is there an understanding of what caused such slowdowns? One reason I can see is diverting a chunk of resources towards the Buildkite-to-GitHub transition, but could there be more?

building all these runtimes in all the configurations takes up a decent chunk of the overall build time (~10 minutes)

10 minutes out of how much?

These are configurations that I really think should be tested post-commit

I can agree with that regarding the C++03 build, but as C++20 modules are (slowly) adopted, the Clang modules build becomes increasingly relevant, because Clang modules share a lot of code paths with C++20 modules.

This patch also makes it easier in the future when we need to rework the runtimes build to anticipate the deprecation of building most of the runtimes with LLVM_ENABLE_PROJECTS.

I don't follow. Even if runtimes can be built today via LLVM_ENABLE_PROJECTS, which I'm not sure about, the code you remove relies on LLVM_ENABLE_RUNTIMES.

@boomanaiden154
Contributor Author

Is there an understanding of what caused such slowdowns? One reason I can see is diverting a chunk of resources towards the Buildkite-to-GitHub transition, but could there be more?

The resources aren't getting diverted. We're running both systems in parallel. Buildkite has the same number of resources as before.

10 minutes out of how much?

Typically 30, so roughly a third of the build/test time.

I can agree with that regarding the C++03 build, but as C++20 modules are (slowly) adopted, the Clang modules build becomes increasingly relevant, because Clang modules share a lot of code paths with C++20 modules.

They share code paths, sure. But they're a fairly atypical combination. I can understand wanting to test it in premerge CI, but we need to strike a balance: we cannot test everything, and we want to provide results quickly. To me, anything that requires building a project in more than one configuration shifts the balance strongly towards exhaustive testing, which I do not think we should be doing in premerge CI.

I don't follow. Even if runtimes can be built today via LLVM_ENABLE_PROJECTS, which I'm not sure about, the code you remove relies on LLVM_ENABLE_RUNTIMES.

The ability to build runtimes with LLVM_ENABLE_PROJECTS is going to be deprecated (see #124009). This means we have to move all the runtimes we currently test to the runtimes build. Trying to support multiple runtime build configurations with a bunch of projects that probably don't care about those features (libc, compiler-rt, etc.) adds complexity, although it is doable.
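
For readers less familiar with the two mechanisms, a hedged illustration; the project and runtime lists are examples, not the CI script's actual invocations:

# Deprecated pattern: a runtime listed under LLVM_ENABLE_PROJECTS is
# built with the host toolchain alongside LLVM and Clang.
cmake -S llvm -B build -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang;libc"

# Runtimes build: runtimes listed under LLVM_ENABLE_RUNTIMES are
# built in a nested build tree using the clang that was just built.
cmake -S llvm -B build -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"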

It's hard to strike a balance between exhaustive testing and keeping premerge fast. If clang developers have evidence that this catches a significant number of issues, I think we can look at reenabling it just for builds of clang. With how things are set up right now though, touching MLIR, LLVM, Flang, or Clang will cause all of these runtimes to be rebuilt, which isn't the most tractable in terms of build scalability.

@AaronBallman
Collaborator

CC @mizvekov for additional opinions

@mizvekov
Contributor

Yeah, I think keeping at least the modules build would be important, as modules are quite under-tested in the clang test suite, and the libc++ modules build often catches issues in that area, including serialization.

@boomanaiden154
Contributor Author

Yeah, I think keeping at least the modules build would be important, as modules are quite under-tested in the clang test suite, and the libc++ modules build often catches issues in that area, including serialization.

What about only running these configurations after changes to clang?

@Meinersbur
Member

Meinersbur commented Mar 20, 2025

libc and libcxx already have their own pre-merge tests. They use stock compilers (gcc and clang), so the additional value here is only whether Clang is regressing. If worker load is at capacity, I would even argue there is no need to bootstrap-build libcxx/libc at all, for the same reason a 2-stage build of Clang is probably not worth it: a little more coverage, but whether it can build in principle has already been verified. Specific configurations can be checked by the buildbots. So this PR looks good to me, but I am not the person to decide how Google spends its resources.

Note that this is different for Flang-RT: It is designed to only work with a build of Flang from the same revision, so one cannot use a pre-compiled/stock flang.

@Endilll
Contributor

Endilll commented Mar 20, 2025

The resources aren't getting diverted. We're running both systems in parallel. Buildkite has the same number of resources as before.

Good to know, but it seems that we run things twice: once via Buildkite, once via GitHub. Is that accurate?

10 minutes out of how much?

Typically 30, so roughly a third of the build/test time.

I can agree with that regarding the C++03 build, but as C++20 modules are (slowly) adopted, the Clang modules build becomes increasingly relevant, because Clang modules share a lot of code paths with C++20 modules.

They share code paths, sure. But they're a fairly atypical combination. I can understand wanting to test it in premerge CI, but we need to strike a balance: we cannot test everything, and we want to provide results quickly. To me, anything that requires building a project in more than one configuration shifts the balance strongly towards exhaustive testing, which I do not think we should be doing in premerge CI.

This was discussed during the Clang WG meeting yesterday. There were no objections to cutting the C++03 configuration, but the C++26 and Clang Modules configurations were deemed important. Can we start with removing C++03?

Also, in its current state this PR removes the C++03, C++26, and Modules configurations, leaving C++17 instead. It's not clear to me why it has to be this way. It's as if we don't care about anything from C++20 on, because those parts of libc++ will be disabled in C++17 mode.

I don't follow. Even if runtimes can be built today via LLVM_ENABLE_PROJECTS, which I'm not sure about, the code you remove relies on LLVM_ENABLE_RUNTIMES.

The ability to build runtimes with LLVM_ENABLE_PROJECTS is going to be deprecated (see #124009). This means we have to move all the runtimes we currently test to the runtimes build. Trying to support multiple runtime build configurations with a bunch of projects that probably don't care about those features (libc, compiler-rt, etc.) adds complexity, although it is doable.

Yes, I'm aware that specifying libcxx and other runtimes in LLVM_ENABLE_PROJECTS is deprecated, and that makes sense. However, the part I'm not following is how this patch helps with any of that. The CMake invocations that you remove don't use LLVM_ENABLE_PROJECTS at all, so they are not going to be affected even after this deprecated feature is removed.

It's hard to strike a balance between exhaustive testing and keeping premerge fast. If clang developers have evidence that this catches a significant number of issues, I think we can look at reenabling it just for builds of clang.

I've seen the libc++ C++26 build catching errors last year, when I was working on this part of pre-commit CI. This is of course anecdotal evidence, but I'd be surprised if anyone has more than that for any project in the monorepo. Above, Matheus mentioned that the Clang Modules build, too, catches issues for him, and given how few contributors we have in this area, testing this part of Clang becomes even more important to prevent bitrot. Also, it seems that lately modules-related bug reports are filed almost on a daily basis: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Aclang%3Amodules

With how things are set up right now though, touching MLIR, LLVM, Flang, or Clang will cause all of these runtimes to be rebuilt, which isn't the most tractable in terms of build scalability.

function compute-runtimes-to-test() {
  projects=${@}
  for project in ${projects}; do
    case ${project} in
      clang)
        for p in libcxx libcxxabi libunwind; do
          echo $p
        done
        ;;
      *)
        # Nothing to do
        ;;
    esac
  done
}
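
Tracing the quoted function: only clang maps to any runtimes, so a call like the following (an illustrative invocation, not taken from the CI logs) prints the three libc++ components and nothing for the other projects:

$ compute-runtimes-to-test clang llvm mlir
libcxx
libcxxabi
libunwind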

Looking at the code above, I don't understand why libc++ and friends are built for MLIR, LLVM, and Flang. If that's the case, then I agree that it doesn't bring a significant amount of additional coverage, and can be disabled.

libc and libcxx already have their own pre-merge tests. They use stock compilers (gcc and clang), so the additional value here is only whether Clang is regressing. If worker load is at capacity, I would even argue there is no need to bootstrap-build libcxx/libc at all, for the same reason a 2-stage build of Clang is probably not worth it: a little more coverage, but whether it can build in principle has already been verified.

I think that "whether Clang is regressing" is a significant value, given how many regressions we're seeing: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Aclang%2Cclang%3Afrontend%2Cclang%3APCH%2Cclang%3Amodules%2Cclang%3Adiagnostics%2Cclang%3Adriver%2Cclang%3Acodegen%2Cclang%3Aheaders%2Cclang%3Aanalysis%20label%3Aregression%2Cregression%3A20%2Cregression%3A19%2Cregression%3A18%2Cregression%3A17%2Cregression%3A16.

@tstellar
Collaborator

This was discussed during the Clang WG meeting yesterday. There were no objections to cutting the C++03 configuration, but the C++26 and Clang Modules configurations were deemed important. Can we start with removing C++03?

This seems like a reasonable first step to me. If we are still seeing queuing delays, maybe we need to bring the discussion to Discourse and have a wider conversation about testing priorities. We have similar issues with long queuing delays for the free macOS runners, but we don't have a good process to determine which workloads to prioritize.

@mizvekov
Contributor

I'd love it if we could keep a way to pre-commit test these other configurations optionally, off by default, for example by requiring the user to add certain tags to the PR.

@boomanaiden154
Contributor Author

Good to know, but it seems that we run things twice: once via Buildkite, once via GitHub. Is that accurate?

Yes. But we've just temporarily doubled the total capacity that we have in GCP, so there is no regression in performance. The plan is eventually to roll the quota from the old system into the new system, though.

This was discussed during the Clang WG meeting yesterday. There were no objections to cutting the C++03 configuration, but the C++26 and Clang Modules configurations were deemed important. Can we start with removing C++03?

That sounds reasonable enough to me. I can modify this patch to do that.

Yes, I'm aware that specifying libcxx and other runtimes in LLVM_ENABLE_PROJECTS is deprecated, and that makes sense. However, the part I'm not following is how this patch helps with any of that. The CMake invocations that you remove don't use LLVM_ENABLE_PROJECTS at all, so they are not going to be affected even after this deprecated feature is removed.

The current system builds most other projects using the projects build. For example, libc is currently built using LLVM_ENABLE_PROJECTS, but that configuration will eventually get deprecated. Switching to a single runtimes build, or at least understanding what is important to test, will make that migration simpler.

I've seen the libc++ C++26 build catching errors last year, when I was working on this part of pre-commit CI. This is of course anecdotal evidence, but I'd be surprised if anyone has more than that for any project in the monorepo. Above, Matheus mentioned that the Clang Modules build, too, catches issues for him, and given how few contributors we have in this area, testing this part of Clang becomes even more important to prevent bitrot. Also, it seems that lately modules-related bug reports are filed almost on a daily basis: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Aclang%3Amodules

Yeah, I don't think there is hard data on these sorts of things, and they're hard to track unless people go through and manually review all the failures, which is a huge time sink. Anecdotes are perfectly fine. This is good to know.

Looking at the code above, I don't understand why libc++ and friends are built for MLIR, LLVM, and Flang. If that's the case, then I agree that it doesn't bring a significant amount of additional coverage, and can be disabled.

I reverted the patch that was doing that, as it also seemed to be helping push things over the edge, at least for MLIR and Flang. Changing LLVM will still cause these to be built. When I get to reworking the dependency system, I will make sure to adjust the runtimes builds to only test these configurations when testing clang, which I think should give us the best balance.

@boomanaiden154
Contributor Author

I'd love it if we could keep a way to pre-commit test these other configurations optionally, off by default, for example by requiring the user to add certain tags to the PR.

It should be technically possible. The main thing is exchanging build artifacts between the runs so that we don't have to recompile too many things. Given that the runtimes builds can be separate and we only need to package clang between jobs, I think it should be doable; it's just a matter of actually doing it and prioritizing it.
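
One possible shape for the tag-based opt-in, sketched with an invented label name and a hypothetical helper; this is not an implemented workflow, just an illustration using the real gh CLI:

# Hypothetical gating step; "ci-extra-runtimes" is an invented label.
if gh pr view "${PR_NUMBER}" --json labels --jq '.labels[].name' \
    | grep -qx 'ci-extra-runtimes'; then
  # Run the extra C++26/modules runtimes configurations, reusing the
  # clang artifacts packaged by the main premerge job.
  run-extra-runtimes-builds  # hypothetical helper
fi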

@boomanaiden154 boomanaiden154 changed the title [CI] Switch to a single runtimes build [CI] Clean up runtimes builds Mar 21, 2025
@Endilll
Contributor

Endilll commented Mar 21, 2025

As a follow-up to this patch, I'll make the Clang Modules build reuse the build directory after the C++26 build, because they only differ in LIBCXX_TEST_PARAMS (and LIBCXXABI_TEST_PARAMS). I just checked locally that we don't need a clean build to make this change propagate. This should save us a couple more minutes per build (out of a typical 30 minutes).

@boomanaiden154
Contributor Author

As a follow-up to this patch, I'll make the Clang Modules build reuse the build directory after the C++26 build, because they only differ in LIBCXX_TEST_PARAMS (and LIBCXXABI_TEST_PARAMS). I just checked locally that we don't need a clean build to make this change propagate. This should save us a couple more minutes per build (out of a typical 30 minutes).

That would be great. I didn't realize we could do that.

@boomanaiden154 boomanaiden154 merged commit 052a4b5 into llvm:main Mar 21, 2025
9 checks passed
@boomanaiden154 boomanaiden154 deleted the one-runtimes-build-premerge branch March 21, 2025 19:39
@Meinersbur
Member

libc and libcxx already have their own pre-merge tests. They use stock compilers (gcc and clang), so the additional value here is only whether Clang is regressing. If worker load is at capacity, I would even argue there is no need to bootstrap-build libcxx/libc at all, for the same reason a 2-stage build of Clang is probably not worth it: a little more coverage, but whether it can build in principle has already been verified.

I think that "whether Clang is regressing" is a significant value, given how many regressions we're seeing: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Aclang%2Cclang%3Afrontend%2Cclang%3APCH%2Cclang%3Amodules%2Cclang%3Adiagnostics%2Cclang%3Adriver%2Cclang%3Acodegen%2Cclang%3Aheaders%2Cclang%3Aanalysis%20label%3Aregression%2Cregression%3A20%2Cregression%3A19%2Cregression%3A18%2Cregression%3A17%2Cregression%3A16

That's not the point. There is still the buildbot CI infrastructure that would hopefully catch these regressions. If we had infinite resources, we would pre-merge check everything. But since the premise of this PR is that we do not, a tradeoff must be made. I am arguing that having each source file compiled in at least one configuration would be reasonable; every additional configuration has diminishing returns.

If I could wish for another configuration to test, it would be BUILD_SHARED_LIBS=ON. Builds fail regularly because a dependency has been forgotten, causing a subsequent fix/revert. The point of pre-merge over post-commit CI seems to be avoiding exactly this churn, not that these failures would otherwise go uncaught. There are literally premerge-monolithic-linux/premerge-monolithic-windows post-commit build configurations that should catch these.
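
For reference, that configuration is a single CMake option; an illustrative invocation (project list and directory name are examples):

# BUILD_SHARED_LIBS=ON builds each LLVM component as its own shared
# library, so a forgotten target dependency fails at link time
# instead of being masked by the default static archives.
cmake -S llvm -B build-shared -G Ninja \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DBUILD_SHARED_LIBS=ON
ninja -C build-shared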

Endilll added a commit that referenced this pull request Mar 25, 2025
…es (#132480)

Between the C++26 and Clang modules builds of the runtimes, which are used as additional testing for Clang, the only differences are `LIBCXX_TEST_PARAMS` and `LIBCXXABI_TEST_PARAMS`. Both of them are transformed into actual lit configuration lines and put into `SERIALIZED_LIT_PARAMS`, which ends up in `libcxx/test/cmake-bridge.cfg` via the `configure_file` command. Notably, it seems that they are not used in any other way.

I checked that if those variables are changed, subsequent runs of the CMake configuration step regenerate `cmake-bridge.cfg` with the new values. This means that we don't need to do clean builds for every runtimes configuration we want to test.

I hope that, together with #131913, this will be enough to alleviate the Linux CI pains we're having, and we won't have to make a tough choice between the C++26 and Clang modules builds for pre-merge CI.
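
A minimal sketch of the mechanism the commit describes; the parameter values and check targets are illustrative, and compiler selection is omitted:

# First configuration: C++26 test params.
cmake -S runtimes -B build-runtimes -G Ninja \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
  -DLIBCXX_TEST_PARAMS="std=c++26" \
  -DLIBCXXABI_TEST_PARAMS="std=c++26"
ninja -C build-runtimes check-cxx check-cxxabi check-unwind

# Second configuration: only the test params change. Re-running the
# configure step regenerates cmake-bridge.cfg in place, so no clean
# rebuild is needed before testing the Clang modules configuration.
cmake -S runtimes -B build-runtimes \
  -DLIBCXX_TEST_PARAMS="enable_modules=clang" \
  -DLIBCXXABI_TEST_PARAMS="enable_modules=clang"
ninja -C build-runtimes check-cxx check-cxxabi check-unwind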