Skip to content

Don't produce binlogs in CI #21478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Don't produce binlogs in CI #21478

wants to merge 1 commit into from

Conversation

wtgodbe
Copy link
Member

@wtgodbe wtgodbe commented May 4, 2020

There is an msbuild bug causing some of our builds to OOM when the binlogs get too large (some detain in dotnet/msbuild#3579). For now let's stop building them until the bug is fixed (devs can check the /bl switch back in to their PR builds if they want to produce binlogs on a case-by-case basis until then)

@riarenas @JohnTortugo am I right that replacing -ci w/ -NodeReuse false will give us the same behavior as before, minus the binlogs?

@wtgodbe wtgodbe requested review from Pilchie, rainersigwald and a team May 4, 2020 21:26
@ghost ghost added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label May 4, 2020
@dougbu
Copy link
Contributor

dougbu commented May 4, 2020

@wtgodbe -ci changes a number of installation locations and other options in eng/common/tools.ps1. I suspect this change will update directories outside the workspace without more work.

One ugly option would be to leave -ci alone and change Arcade bits to remove the error from https://github.com/dotnet/aspnetcore/blob/master/eng/common/tools.ps1#L608-L611 But, we'll need to change dotnet/arcade itself ASAP.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 4, 2020

Ah, darn. Sounds like we can't opt out of -ci then. I'd be in favor of adding a switch to eng\common\build.xyz to opt out of producing a binlog (and changing that error condition like you mentioned). @riarenas @JohnTortugo @markwilkie thoughts?

@JohnTortugo
Copy link

Looks like you could set the binlog parameter to empty to disable the log creation: https://github.com/dotnet/arcade/blob/master/eng/common/build.ps1#L88

@chcosta @tmat - might have further suggestions.

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

@JohnTortugo we don't use the eng/common/build.* scripts anywhere we're hitting the msbuild OOM issue. But, we do use eng/common/tools.* and those error out if -binaryLog is $false when -ci is $true.

Short term workaround is to remove the error at https://github.com/dotnet/aspnetcore/blob/master/eng/common/tools.ps1#L608-L611, remove the $binaryLog default at https://github.com/dotnet/aspnetcore/blob/master/build.ps1#L350-L352, and remove all -bl and /bl options on Windows. But, we'll lose the first change if the error isn't also removed from Arcade soon.

@tmat
Copy link
Member

tmat commented May 5, 2020

we don't use the eng/common/build.* scripts anywhere we're hitting the msbuild OOM issue. But, we do use eng/common/tools.* and those error out if -binaryLog is $false when -ci is $true.

Do you invoke MSbuild with combined builds steps, i.e. /p:Restore=true /p:Build=true /bl:Build.binlog e.g. as build.ps1 does it? If so would msbuild still OOM if you split it into two invocations, i.e. MSBuild /p:Restore=true /bl:Restore.binlog and MSBuild /p:Build=true /bl:Build.binlog?

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

If so would msbuild still OOM if you split it into two invocations, i.e. MSBuild /p:Restore=true /bl:Restore.binlog and MSBuild /p:Build=true /bl:Build.binlog?

This may help. But, I've found doing that breaks some functional tests because global properties (anything specified on the command line) can't be overridden.

I've also been looking at using dotnet msbuild instead of msbuild for most of our projects. The main blocker there is the managed projects (those I'd like to build w/ dotnet msbuild) have references to native project. Determining and using the relevant asset locations is painful. I'm continuing to push on this.

Other big opportunities include work @mmitche is doing on improving build performance in general and work I'm doing on using Roslyn to generate ref/ assemblies. That second bit allows us to remove all of the ref/ projects, simplifying the project dependency graph significantly.

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

@wtgodbe I disabled Windows binary logs in a draft PR: c8652df You might want to start w/ that and extend it to cover Linux and macOS builds.

@JunTaoLuo
Copy link
Contributor

So we are planning to take #21487 instead?

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

So we are planning to take #21487 instead?

No. However, the second commit there may be something we want to merge into Arcade and our CI builds.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 5, 2020

@wtgodbe
Copy link
Member Author

wtgodbe commented May 5, 2020

It's likely other folks rely on the behavior that passing -ci ensures creation of a binary log - I'd prefer adding a switch to Arcade's build scripts that allows one to opt out of creating a binlog even if -ci is passed

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

@wtgodbe that sounds good to me.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 5, 2020

I'll send a PR to arcade

@tmat
Copy link
Member

tmat commented May 5, 2020

This may help. But, I've found doing that breaks some functional tests because global properties (anything specified on the command line) can't be overridden.

I don't understand why this would be. The current implementation of build.ps1 calls msbuild with: /p:Restore=true /p:Build=true /bl:Build.binlog. This can be replaced by two calls each setting one of the properties and have the same semantics but two distinct binlogs.

I'd prefer this approach over adding a new switch. This has also the benefit of making the binlog easier to analyze when you're not interested in restore phase (and vice versa). Some folks have asked for this.

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

This can be replaced by two calls each setting one of the properties and have the same semantics but two distinct binlogs.

As I said and at least for dotnet/aspnetcore, this works unless we also pass -test. I also seem to remember previous attempts to break the build into stages (something dotnet/aspnetcore did before Arcadification) worsened performance because msbuild or dotnet msbuild could no longer reason over the entire process. No, I don't recall the details.

@dougbu
Copy link
Contributor

dougbu commented May 5, 2020

Should also mention we've had guidance for ages about Arcade providing reasonable defaults and conventions but allowing repos to opt out. Why are you pushing back on enabling this particular opt-out @tmat

@tmat
Copy link
Member

tmat commented May 6, 2020

Because I believe there is an alternative solution that addresses the issue, does not require another switch and also addresses another feedback that we have received.

worsened performance because msbuild or dotnet msbuild could no longer reason over the entire process

It would be good to check if this is actually true and why. Since the output of restore is stored in artifacts\obj files that are input to build it seems odd that that should be a problem. But I might be missing something.

@dougbu
Copy link
Contributor

dougbu commented May 6, 2020

Because I believe there is an alternative solution that addresses the issue

A much more complicated solution that would maintain the hacks we have in our build.ps1 and build.sh scripts to enable us to use /bl:{name} on our command lines.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 6, 2020

I agree with Doug that it's easier/less disruptive to add an opt-out switch to Arcade than to continue to compound hacks in this repo to work around the lack of one - @markwilkie @chcosta what do you think?

@tmat
Copy link
Member

tmat commented May 6, 2020

Adding a new switch to build.ps1 is adding new public API to workaround a bug in msbuild. I don't think that's a good approach.

@dougbu
Copy link
Contributor

dougbu commented May 6, 2020

@tmat even if we didn't want to disable producing binary logs in this repo, the new switch enables us removing hacks like this:

# Workaround Arcade check which asserts BinaryLog is true on CI.
# We always use binlogs on CI, but we customize the name of the log file
$tmpBinaryLog = $BinaryLog
if ($CI) {
    $BinaryLog = $true
}

@tmat
Copy link
Member

tmat commented May 6, 2020

What is the reason why you want to customize the name of the binlog? I am aware that people wanted separate binlogs for restore and build phases. If build.ps1 does that by default (separate binlogs) then wouldn't that remove the need to custom name the binlogs and also remove the OOM issue?

@tmat
Copy link
Member

tmat commented May 6, 2020

BTW, I think I heard someone mentioning that we might want to do analysis/telemetry on binlogs produced by our builds. If repos use custom names for their binlogs such analysis tool won't be able to find the right logs.

@dougbu
Copy link
Contributor

dougbu commented May 6, 2020

What is the reason why you want to customize the name of the binlog?

Many of our builds require multiple steps e.g. to handle projects that must be built after the shared framework like our installers. We're not going to hack together such things in our msbuild infrastructure to avoid this.

We also use distinct (per-platform) names for our binary logs across our CI jobs to make it easy to download multiple logs without creating meaningless Build (1).binlog files.

@tmat
Copy link
Member

tmat commented May 6, 2020

OK. Well, then I guess different repos will just have different binlogs and we won't be able to do analysis on them. Oh well.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 6, 2020

Why can't the analysis just run on anything that ends with .binlog? Why does the name specifically have to be build.binlog? Customizing binlog names seems very desirable to me for the reasons Doug listed above.

In any case, it's clear you're very opposed to this - we need to be able to disable the creation of binlogs in CI while we wait for dotnet/msbuild#3579 to be fixed, but I don't see any way to do that without modifying our local eng/common/ scripts, which will just get overwritten by the next arcade update. Do you have a suggestion for how we can fix the issue without adding an opt-out? Just removing the error condition from arcade's tools.ps1 won't work, since Arcade sets binaryLog to true if CI is true

@tmat
Copy link
Member

tmat commented May 6, 2020

I'm not gonna block this change as I understand the need to workaround the bug. I just said I don't like it as it seems very ad-hoc to me. As I suggested on the other thread we could call msbuild twice to separate restore and build logs.

Why can't the analysis just run on anything that ends with .binlog

We can but it might result in more complexity. Some repos might build more things in addition to the main restore/build and have more logs there. Depending on what the analysis is doing it might need to somehow figure out which log is supposed to have what. Maybe not, IDK.

Also, from a perspective of someone looking frequently at builds of many different repos, I prefer the log names to follow some common convention - sure, there can be other logs specific to the repo but it's useful when I can immediately see which is the main build log. The same reasoning as having common anything.

@markwilkie
Copy link
Member

The value of a shared infra diminishes everytime one of us doesn't actually share. If we start renaming and/or putting binlogs in different places, it means that we've diminished our ability to "grok" our builds. My hope is that we're normalizing - not the opposite.

Also....OOM's are a thing, and doing unnatural acts to keep binlogs normal is also wonky.

Question: could we turn this into a point-in-time situation where we acknowledge what the "right thing" is? For example, is there further build refactoring that is goodness, and also takes care of the binlog issue?

@wtgodbe
Copy link
Member Author

wtgodbe commented May 6, 2020

Question: could we turn this into a point-in-time situation where we acknowledge what the "right thing" is? For example, is there further build refactoring that is goodness, and also takes care of the binlog issue?

Some of this could presumably be solved by further improvements to our project files - the binlog size was recently reduced by ~30% by a change @dougbu made, and he's working with @mmitche on further improvements. Long-term, that's probably the road we want to take, we're just trying to come up with a fast interim solution to improve build flakiness until then. This is certainly a good motivator for further refactoring, but the complexity of our build means that it's not by any means a small undertaking

@dougbu
Copy link
Contributor

dougbu commented May 6, 2020

Question: could we turn this into a point-in-time situation where we acknowledge what the "right thing" is? For example, is there further build refactoring that is goodness, and also takes care of the binlog issue?

In addition, dotnet/aspnetcore hacks around eng/comon/tools.* because the binlog conventions don't work for us at all. Hacks include the bit mentioned in #21478 (comment) and reasons are covered in #21478 (comment)

@dougbu
Copy link
Contributor

dougbu commented May 7, 2020

The value of a shared infra diminishes everytime one of us doesn't actually share.

BTW I strongly agree with this. However, we've also talked a great deal in the past about exceptions from the defaults: https://github.com/dotnet/arcade/blob/master/Documentation/Policy/DefaultsGuidance.md#category-2---differences-which-are-truly-exceptions-to-what-is-otherwise-correct-for-net-core-overall Doesn't that apply here?

In the case @wtgodbe has proposed to fix in dotnet/arcade#5421, Arcade is not only providing conventions for the majority but erroring out if they're not followed to the letter. It would be enough however if the error were simply removed from eng/common/tools.* and the defaulting (for CI builds) was done only in eng/common/build.*. That change might however break partner repos that (like dotnet/aspnetcore) use eng/common/tools.* but not eng/common/build.* and (unlike dotnet/aspnetcore) want the current $binlog default.

@tmat
Copy link
Member

tmat commented May 7, 2020

I believe the requirement of binlogs on CI is the right thing to do. It is a problem now only because of a bug in MSBuild. I'm 100% fine with a workaround, ideally such that doesn't change public API of our shared build scripts since it's hopefully temporary. Removing the requirement would work as long as we file an issue to follow up and add it back.

Re setting the default for $binaryLog variable. The default can be overridden in eng\configure-toolset.ps1.

@tmat
Copy link
Member

tmat commented May 7, 2020

Re customizing naming of binlog files and why having a common name for the main build binlog doesn't work for aspnet. That's certainly worth of discussion. I believe it is valuable to have a common name for the main binlog, so perhaps we can find a way how to make it work. A separate issue would probably be the best place to follow up on this.

@tmat
Copy link
Member

tmat commented May 7, 2020

Looking at https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/ci.yml#L126:

You have a sequence of calls to build.cmd as build steps. The common build.ps1|sh are designed to be called only once and that's very intentional. Since yaml is not executable locally any logic that's in it is not reproducible on dev machine without extracting it manually from yaml first. Arcade thus strongly prefers to have a single build entry point script.
That script can then invoke msbuild (and other tools) multiple times. That's the reason for separation of common\tools.ps1|sh and common\build.ps1|sh. The former can be used from custom build.ps1|sh scripts that implement repo-specific build entry point. common\build.ps1|sh is intended only for repositories that do not need multiple invocations of msbuild.

In fact, you have build.ps1 in the root of aspnetcore repo. I'd suggest moving the logic from YAML to this build script, where you can invoke the MSBuild function defined by common\tools.ps1 multiple times with different parameters (including setting /bl to whatever name you want for the builds that follow the main build step).

@markwilkie
Copy link
Member

In addition, dotnet/aspnetcore hacks around eng/comon/tools.* because the binlog conventions don't work for us at all.

This is what I mean by "what's the right thing". As tmat said, let's work-around where needed. ALSO, let's be agreed on what we really want. By "don't work", is there more than the msbuild bug / size issue?

BTW I strongly agree with this. However, we've also talked a great deal in the past about exceptions from the defaults: https://github.com/dotnet/arcade/blob/master/Documentation/Policy/DefaultsGuidance.md#category-2---differences-which-are-truly-exceptions-to-what-is-otherwise-correct-for-net-core-overall Doesn't that apply here?

Yes it does - precisely in fact. Let me know if there's an aspect of the conversation which you feel is not following the guidance. My view is that we're having a discussion about which category of thing this is.

I want to be sure we're in agreement on what the longer term right thing is. That should inform the next step, and give context as to whether it's a work-around for an msbuild issue, or a placeholder for something else.

@markwilkie
Copy link
Member

It sounds like there are follow up questions/discussions to be had. Let's start those on separate threads and not "clog" this urgent one up.

@wtgodbe - would you mind firing up another issue or two to track that? I presume one would be "if msbuild could....", and the other might be "big log name customization" in general.

Is there agreement on what the most expedient and best work-around is for right now?

I'm fine with that, assuming we're following up on what we really want to do afterwards.

@wtgodbe
Copy link
Member Author

wtgodbe commented May 7, 2020

I believe dotnet/msbuild#3577 tracks the msbuild OOM, though it's pretty stale - @rainersigwald should I open a new one?

As for binlogs - dotnet/arcade#5436

@wtgodbe
Copy link
Member Author

wtgodbe commented May 8, 2020

Closing in favor of #21630

@rainersigwald
Copy link
Member

I believe microsoft/msbuild#3577 tracks the msbuild OOM, though it's pretty stale - @rainersigwald Rainer Sigwald FTE should I open a new one?

No, that's the best bet to track oom due to runaway log queueing.

@wtgodbe wtgodbe closed this May 11, 2020
@wtgodbe wtgodbe deleted the wtgodbe/NoBinLogs branch May 11, 2020 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants