Skip to content

Disable compiling and testing Flang on Clang changes #92740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 21, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .ci/generate-buildkite-pipeline-premerge
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ function compute-projects-to-test() {
done
;;
clang)
for p in clang-tools-extra compiler-rt flang lldb cross-project-tests; do
for p in clang-tools-extra compiler-rt lldb cross-project-tests; do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you also need to remove this from the llvm and mlir switch cases? Also, we are we doing this here and not in the exclude-windows function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is not to disable Flang on Windows, but stop testing it on Clang changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we differentiate windows and linux here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need to? Flang depends on clangDriver (which depends on clangBasic), so the only time Flang tests need to run in response to Clang changes should be limited to changes to those two parts of the project. Otherwise, we're spending time (and money) testing Flang on changes that have no impact to their project (like changes to clangSema, etc).

Copy link
Collaborator

@joker-eph joker-eph May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the same for a lot of dependencies that this file encodes.

The principle here is conservative: if a change to clang might affect another project, then it should be tested.
Basically take the finest grain dependency in your head, then collapse the dependency graph per top-level projects.
"clangDriver" and "clangBasic" aren't projects, right now we can only track dependencies at this granularity (patches welcome of course, but that seems hard to keep maintained).

Note the situation is exactly the same for MLIR: Flang uses a small part of MLIR, many dialects are never used by Flang yet we encode the dependency at the granularity of the project.

(the granularity goes both ways also: many tests in Flang don't depend on the clang driver, yet we're still running them because it's really hard to disentangle when the project isn't setup specifically with this in mind)

Why would we need to?

Because you're trying to "not test" something that should be tested, only for the sake of a problem on windows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point out where this is encoded in this file? I only see Flang specified as a dependency of MLIR (not the other way round).

This is what I mean: Flang is always test for all changes in MLIR, even when such change affect a subset of MLIR that Flang does not use. This is the analog situation with Clang where a change to Clang always trigger testing flang even when it touches the subset of Clang that can't impact Flang.

This approach is very expensive in practice.

"Pre-merge testing is very expensive", yes...
What can we say about changes to LLVM which triggers testing the entire monorepo (almost), even though a change to the PowerPC backend cannot possible fail a test in MLIR for example (unless you run the CI on a power-PC machine)? (and likely similarly can't fail a LLDB test, I'm not sure).

But what are you arguing about here exactly? What is the core principle to use for testing the monorepo in a premerge context?
Are you saying that there should be no dependency between the top-level projects at all for example?

I think this trivializes the current situation -- this is not "for the sake of a problem on windows", this is for the sake of the entire community that has been raising concerns with how unusable precommit CI has been for the past ~six months.

Sorry but you're again equating what I believe is a "windows problem" with "precommit CI" as a whole:

  1. claim of "unstable" are dubious to me: this is vague and unspecified
  2. The ci is very backed-up, and that's because Windows testing is slow. Someone proposed to remove windows from the pre-commit testing but I think you were the one opposing this.

What I see on PRs on my side is that it always ends up with the Linux build completing while Windows waits for an agent for hours. Hence "it's a windows problem".

So we're gonna be on the opposite side of who is trivializing he situation here!

Copy link
Collaborator

@joker-eph joker-eph May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not forget how we got to this PR: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591

Which is 1) Flang only and 2) Windows specific. Pasting what I wrote there:

MLIR tests ran in 13.13s on Linux and 34.81s on Windows, so 2.65x slower.
Flang tests ran in 5.98s on Linux and 1220.35s on Windows, so >200x slower.

There is a dramatic issue with the Flang tests on Windows. This PR won't solve the problem by the way: I see it entirely as arbitrary to remove flang from the clang dependency and not from anything else (Flang tests will continue to run and occupy a lot of the CI).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what are you arguing about here exactly? What is the core principle to use for testing the monorepo in a premerge context?

I am arguing two things. 1) to solve immediate needs, we need precommit CI to go back to a working state, which it has not been in since the transition to GitHub. This involves disabling tests. 2) to solve long-term needs, we need to rethink how we handle precommit CI because the current strategy of testing everything on every change is expensive and nonfunctional in practice. This patch is about #1 and I would like to see it land ASAP so we can debate the finer points of #2 while the community is not experiencing significant pain.

What I see on PRs on my side is that it always ends up with the Linux build completing while Windows waits for an agent for hours. Hence "it's a windows problem".

I see the same outcomes you're seeing, but I'm coming from the perspective of seeing how that impacts PRs in general. What I'm seeing, at least in Clang, is that folks are not landing PRs until precommit CI goes green, and so patches have to undergo significantly more churn because of how slow precommit CI is to complete. The problem is that precommit CI is not finishing in a reasonable amount of time. That's not a Windows problem, that's a community problem caused by precommit CI and exacerbated by one part of the project having slow tests on Windows. Claiming it is a Windows problem makes it sound like the issue is with the OS when that's not actually (known to be) the case.

@joker-eph -- I'd like to make sure we're on the same page; this PR has been accepted and is set to be landed sometime soon to ease pain for the community. If you wish to block the PR from landing, please hit Request Changes and explain what you want to see done in order to move the PR forward in a timely manner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a Windows problem

I'm confused, nothing you wrote is a rebuttal of what I wrote...
Are you just arguing about terminology here? What if we replace every instance of "Windows problem" with "Testing the monorepo on Windows"?

Claiming it is a Windows problem makes it sound like the issue is with the OS when that's not actually (known to be) the case.

It's not productive to try to finger point the "fault" to either the OS or the tests themselves or the infra or....
The problem that led us here is "Flang testing is hanging on the Windows CI agent for 20min unnecessarily".
The CI problem is specific to our testing on Windows?

If you wish to block the PR from landing, please hit Request Changes and explain what you want to see done in order to move the PR forward in a timely manner.

I shouldn't have to hit "request change": there is an open discussion here that should be resolved.
I asked here why aren't we just disabling this on Windows right now and I didn't get a convincing answer so far, I don't understand.

So I you want to move forward fast here is a proposal: split this PR in two, right now disable Flang dependency on windows only, and open an other PR to disable the Flang dependency on Linux as well, we can discuss there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, this is a 'perfect is the enemy of better' situation. @joker-eph is absolutely right: in a perfect world, we'd run all of the tests that could be affected at a fine grain control.

At the moment however, we have a COARSE grain control, that has lead to pre-commit CI being low-value at high cost. We have 3 main failures as I see it:

1- Pre-commit CI is taking so long that it is delaying patches, causing us to have a slower velocity and lower quality product as a result.

2- Pre-commit CI is taking so long that it is causing folks to IGNORE CI and commit without it. This is absolutely a necessity to combat 1, but makes the valuable parts of CI significantly less valuable.

3- Pre-commit CI is costing as much as 2x dollar-wise than it optimially should (as these flang tests have elsewhere shown to take up >50% of our execution time), thanks to running Flang tests unnecessarily.

ALL OF THIS, is in exchange for running some additional testing that has, in my experiences, found zero issues. So at the moment, the cost/value proposition is a divide-by-zero (or at least, is approaching infinity if I managed to miss the ONE case these tests saved). While the main motivation for us is 1 and 2, 3 very much needs attention as well.

I believe that accepting this patch as-is improves the quality, cost, and usefulness of CI and is a 'net win'. The 'cost' that makes this NOT a 'complete-win' is minor, and of low value. Add in the fact that this is a dependency that will eventually go away thanks to the Flang effort, I don't see the advantage.

In a PERFECT world, we could stick with the PERFECT situation: testing every time. But, testing takes time and money, which is a much higher cost than we're getting out of this, for what is a temporary measure.

As a maintainer on the clang project, we need to accept this patch to regain the value of CI.

echo $p
done
;;
Expand Down
Loading