-
Notifications
You must be signed in to change notification settings - Fork 10.4k
Don't produce binlogs in CI #21478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't produce binlogs in CI #21478
Conversation
@wtgodbe One ugly option would be to leave |
Ah, darn. Sounds like we can't opt out of |
Looks like you could set the |
@JohnTortugo we don't use the eng/common/build.* scripts anywhere we're hitting the Short term workaround is to remove the error at https://github.com/dotnet/aspnetcore/blob/master/eng/common/tools.ps1#L608-L611, remove the |
Do you invoke |
This may help. But, I've found doing that breaks some functional tests because global properties (anything specified on the command line) can't be overridden. I've also been looking at using Other big opportunities include work @mmitche is doing on improving build performance in general and work I'm doing on using Roslyn to generate ref/ assemblies. That second bit allows us to remove all of the ref/ projects, simplifying the project dependency graph significantly. |
So we are planning to take #21487 instead? |
No. However, the second commit there may be something we want to merge into Arcade and our CI builds. |
https://dev.azure.com/dnceng/internal/_build/results?buildId=630359&view=results also failed w/ an OOM |
It's likely other folks rely on the behavior that passing |
@wtgodbe that sounds good to me. |
I'll send a PR to arcade |
I don't understand why this would be. The current implementation of I'd prefer this approach over adding a new switch. This has also the benefit of making the binlog easier to analyze when you're not interested in restore phase (and vice versa). Some folks have asked for this. |
As I said and at least for dotnet/aspnetcore, this works unless we also pass |
Should also mention we've had guidance for ages about Arcade providing reasonable defaults and conventions but allowing repos to opt out. Why are you pushing back on enabling this particular opt-out @tmat❔ |
Because I believe there is an alternative solution that addresses the issue, does not require another switch and also addresses another feedback that we have received.
It would be good to check if this is actually true and why. Since the output of restore is stored in artifacts\obj files that are input to build it seems odd that that should be a problem. But I might be missing something. |
A much more complicated solution that would maintain the hacks we have in our build.ps1 and build.sh scripts to enable us to use |
I agree with Doug that it's easier/less disruptive to add an opt-out switch to Arcade than to continue to compound hacks in this repo to work around the lack of one - @markwilkie @chcosta what do you think? |
Adding a new switch to build.ps1 is adding new public API to workaround a bug in msbuild. I don't think that's a good approach. |
@tmat even if we didn't want to disable producing binary logs in this repo, the new switch enables us removing hacks like this: # Workaround Arcade check which asserts BinaryLog is true on CI.
# We always use binlogs on CI, but we customize the name of the log file
$tmpBinaryLog = $BinaryLog
if ($CI) {
$BinaryLog = $true
} |
What is the reason why you want to customize the name of the binlog? I am aware that people wanted separate binlogs for restore and build phases. If build.ps1 does that by default (separate binlogs) then wouldn't that remove the need to custom name the binlogs and also remove the OOM issue? |
BTW, I think I heard someone mentioning that we might want to do analysis/telemetry on binlogs produced by our builds. If repos use custom names for their binlogs such analysis tool won't be able to find the right logs. |
Many of our builds require multiple steps e.g. to handle projects that must be built after the shared framework like our installers. We're not going to hack together such things in our We also use distinct (per-platform) names for our binary logs across our CI jobs to make it easy to download multiple logs without creating meaningless |
OK. Well, then I guess different repos will just have different binlogs and we won't be able to do analysis on them. Oh well. |
Why can't the analysis just run on anything that ends with In any case, it's clear you're very opposed to this - we need to be able to disable the creation of binlogs in CI while we wait for dotnet/msbuild#3579 to be fixed, but I don't see any way to do that without modifying our local |
I'm not gonna block this change as I understand the need to workaround the bug. I just said I don't like it as it seems very ad-hoc to me. As I suggested on the other thread we could call msbuild twice to separate restore and build logs.
We can but it might result in more complexity. Some repos might build more things in addition to the main restore/build and have more logs there. Depending on what the analysis is doing it might need to somehow figure out which log is supposed to have what. Maybe not, IDK. Also, from a perspective of someone looking frequently at builds of many different repos, I prefer the log names to follow some common convention - sure, there can be other logs specific to the repo but it's useful when I can immediately see which is the main build log. The same reasoning as having common anything. |
The value of a shared infra diminishes everytime one of us doesn't actually share. If we start renaming and/or putting binlogs in different places, it means that we've diminished our ability to "grok" our builds. My hope is that we're normalizing - not the opposite. Also....OOM's are a thing, and doing unnatural acts to keep binlogs normal is also wonky. Question: could we turn this into a point-in-time situation where we acknowledge what the "right thing" is? For example, is there further build refactoring that is goodness, and also takes care of the binlog issue? |
Some of this could presumably be solved by further improvements to our project files - the binlog size was recently reduced by ~30% by a change @dougbu made, and he's working with @mmitche on further improvements. Long-term, that's probably the road we want to take, we're just trying to come up with a fast interim solution to improve build flakiness until then. This is certainly a good motivator for further refactoring, but the complexity of our build means that it's not by any means a small undertaking |
In addition, dotnet/aspnetcore hacks around eng/comon/tools.* because the binlog conventions don't work for us at all. Hacks include the bit mentioned in #21478 (comment) and reasons are covered in #21478 (comment) |
BTW I strongly agree with this. However, we've also talked a great deal in the past about exceptions from the defaults: https://github.com/dotnet/arcade/blob/master/Documentation/Policy/DefaultsGuidance.md#category-2---differences-which-are-truly-exceptions-to-what-is-otherwise-correct-for-net-core-overall Doesn't that apply here? In the case @wtgodbe has proposed to fix in dotnet/arcade#5421, Arcade is not only providing conventions for the majority but erroring out if they're not followed to the letter. It would be enough however if the error were simply removed from |
I believe the requirement of binlogs on CI is the right thing to do. It is a problem now only because of a bug in MSBuild. I'm 100% fine with a workaround, ideally such that doesn't change public API of our shared build scripts since it's hopefully temporary. Removing the requirement would work as long as we file an issue to follow up and add it back. Re setting the default for |
Re customizing naming of binlog files and why having a common name for the main build binlog doesn't work for aspnet. That's certainly worth of discussion. I believe it is valuable to have a common name for the main binlog, so perhaps we can find a way how to make it work. A separate issue would probably be the best place to follow up on this. |
Looking at https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/ci.yml#L126: You have a sequence of calls to In fact, you have |
This is what I mean by "what's the right thing". As tmat said, let's work-around where needed. ALSO, let's be agreed on what we really want. By "don't work", is there more than the msbuild bug / size issue?
Yes it does - precisely in fact. Let me know if there's an aspect of the conversation which you feel is not following the guidance. My view is that we're having a discussion about which category of thing this is. I want to be sure we're in agreement on what the longer term right thing is. That should inform the next step, and give context as to whether it's a work-around for an msbuild issue, or a placeholder for something else. |
It sounds like there are follow up questions/discussions to be had. Let's start those on separate threads and not "clog" this urgent one up. @wtgodbe - would you mind firing up another issue or two to track that? I presume one would be "if msbuild could....", and the other might be "big log name customization" in general. Is there agreement on what the most expedient and best work-around is for right now? I'm fine with that, assuming we're following up on what we really want to do afterwards. |
I believe dotnet/msbuild#3577 tracks the msbuild OOM, though it's pretty stale - @rainersigwald should I open a new one? As for binlogs - dotnet/arcade#5436 |
Closing in favor of #21630 |
No, that's the best bet to track oom due to runaway log queueing. |
There is an msbuild bug causing some of our builds to OOM when the binlogs get too large (some detain in dotnet/msbuild#3579). For now let's stop building them until the bug is fixed (devs can check the
/bl
switch back in to their PR builds if they want to produce binlogs on a case-by-case basis until then)@riarenas @JohnTortugo am I right that replacing
-ci
w/-NodeReuse false
will give us the same behavior as before, minus the binlogs?