[Incremental] Interleave waves by adding llbuild rules dynamically #351

Merged
merged 16 commits into swiftlang:main on Nov 12, 2020

Conversation

davidungar
Contributor

Redo the interaction between llbuild and IncrementalBuildState in order to allow 2nd wave compilations to interleave with first wave compilations. As each first wave job finishes, add rules for discovered 2nd wave jobs.
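
To make the scheme concrete, here is a minimal, self-contained sketch of the interleaving idea. The names (Job, Scheduler, discoverSecondWave) are illustrative stand-ins, not the driver's or llbuild's real API; in the actual PR, "enqueueing a discovered job" corresponds to adding an input to the ExecuteAllJobs llbuild rule as each first-wave job finishes.

struct Job: Hashable {
    let name: String
}

final class Scheduler {
    private var queue: [Job]          // jobs known to be needed but not yet run
    private var seen: Set<Job>        // guards against scheduling a job twice
    private let discoverSecondWave: (Job) -> [Job]

    init(firstWave: [Job], discoverSecondWave: @escaping (Job) -> [Job]) {
        self.queue = firstWave
        self.seen = Set(firstWave)
        self.discoverSecondWave = discoverSecondWave
    }

    func run(tertiary: [Job], execute: (Job) -> Void) {
        while let job = queue.first {
            queue.removeFirst()
            execute(job)
            // As each first-wave job finishes, any second-wave jobs it reveals are
            // enqueued immediately rather than waiting for the whole first wave.
            for newJob in discoverSecondWave(job) {
                if seen.insert(newJob).inserted {
                    queue.append(newJob)
                }
            }
        }
        // Tertiary jobs (linking, etc.) still wait for all compilations to finish.
        tertiary.forEach(execute)
    }
}

// Usage: "compile b.swift" is only discovered after "compile a.swift" finishes,
// yet it still runs before the link step.
let scheduler = Scheduler(firstWave: [Job(name: "compile a.swift")]) { job in
    job.name == "compile a.swift" ? [Job(name: "compile b.swift")] : []
}
scheduler.run(tertiary: [Job(name: "link")]) { print($0.name) }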

@davidungar
Contributor Author

@swift-ci please test

@davidungar
Contributor Author

@swift-ci please test

@davidungar
Contributor Author

@swift-ci please test

@davidungar
Contributor Author

@swift-ci please test

@davidungar
Contributor Author

@swift-ci please test

Contributor

@owenv owenv left a comment

This looks like a really great improvement! I don't know enough about llbuild best practices to meaningfully review the changes to MultiJobExecutor, but I didn't see any issues there. This PR will need a simultaneous update to SPM, but it'll be a really small change

func execute(jobs: [Job],
incrementalCompilationState: IncrementalCompilationState?,
Contributor

This will require a corresponding (but small) change to SPMSwiftDriverExecutor in the SPM repo so that it still conforms to the protocol. Right now its implementation of this requirement just fatalErrors, because it only uses its executor for dependency scanning during the planning step.
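
For illustration only, a rough, self-contained sketch of the kind of change being described; the protocol shape and type names below are stand-ins, not SwiftPM's or the driver's actual declarations.

struct Job {}
final class IncrementalCompilationState {}

protocol DriverExecutor {
    // Hypothetical shape of the requirement after it gains the new parameter.
    func execute(jobs: [Job],
                 incrementalCompilationState: IncrementalCompilationState?) throws
}

// A conformer that, like SPMSwiftDriverExecutor, only uses its executor for
// dependency scanning during planning can keep stubbing the requirement.
final class PlanningOnlyExecutor: DriverExecutor {
    func execute(jobs: [Job],
                 incrementalCompilationState: IncrementalCompilationState?) throws {
        fatalError("multi-job execution is not supported by this executor")
    }
}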

Contributor Author

Thanks! I haven't been into that repo, but will check it out.

Contributor Author

Correction: I haven't been there lately.

guard let postCompileJobs = incrementalCompilationState.postCompileJobs
else {
fatalError("planning must have finished by now")
var numParallelJobs: Int {
Contributor

Tiny nit: I think the intent would be a little clearer if this were just a constant.

Contributor Author

Thanks! Nice catch.
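
A hypothetical before/after for the nit above; the fallback expression is made up, not the executor's actual logic.

import Foundation

// Before: a computed property, re-evaluated on every access.
struct ExecutorSettingsBefore {
    let requestedParallelism: Int?
    var numParallelJobs: Int {
        requestedParallelism ?? ProcessInfo.processInfo.activeProcessorCount
    }
}

// After: a constant fixed at initialization, which makes the intent clearer.
struct ExecutorSettingsAfter {
    let numParallelJobs: Int
    init(requestedParallelism: Int?) {
        numParallelJobs = requestedParallelism ?? ProcessInfo.processInfo.activeProcessorCount
    }
}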

@davidungar
Contributor Author

This looks like a really great improvement! I don't know enough about llbuild best practices to meaningfully review the changes to MultiJobExecutor, but I didn't see any issues there. This PR will need a simultaneous update to SPM, but it'll be a really small change

Thank you. As always your help is much appreciated.

@davidungar
Contributor Author

@swift-ci please test

@davidungar
Contributor Author

Will need to be synchronized with swiftlang/swift-package-manager#3033

@BenchR267 BenchR267 self-requested a review November 9, 2020 07:50
@CodaFi
Contributor

CodaFi commented Nov 10, 2020

Just for my edification, let's clarify some terminology:

  • Primary Jobs
  • Secondary Jobs
  • Tertiary Jobs

It looks like primary jobs are the initial jobs gathered from files with mod-time changes, modified external deps, etc. Secondary jobs are gathered during the swiftdeps reload step after a job finishes. Tertiary jobs are sort of a mystery to me.

@davidungar
Contributor Author

Just for my edification, let's clarify some terminology:

  • Primary Jobs
  • Secondary Jobs
  • Tertiary Jobs

It looks like primary jobs are the initial jobs gathered from files with mod-time changes, modified external deps, etc. Secondary jobs are gathered during the swiftdeps reload step after a job finishes. Tertiary jobs are sort of a mystery to me.

Thanks much for looking at this! I will add some comments. You are absolutely right. Tertiary jobs include linking, etc.; these are the jobs that must wait until all primary and secondary jobs finish. I'd be happy to brainstorm better names if these seem obscure.

@davidungar davidungar requested a review from nkcsgexi November 10, 2020 19:02
@davidungar
Contributor Author

@swift-ci please test

Contributor

@artemcm artemcm left a comment

This looks good to me! The code is quite a bit easier to follow now, too, versus the previous approach. 👍
It would be nice to have someone who knows llbuild a bit better take a look.

if finishedJob.kind == .compile {
  finishedJob.primaryInputs.forEach {
    if pendingInputs.remove($0) == nil {
      fatalError("should have been pending")
Contributor

This error could be a little bit more descriptive and show which input is involved.

Contributor Author

Good point! I'll fix it.
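
A small sketch of the more descriptive error, using a stand-in Input type rather than the driver's real path types.

struct Input: Hashable {
    let path: String
}

func noteFinished(_ input: Input, pending: inout Set<Input>) {
    guard pending.remove(input) != nil else {
        // Naming the offending input makes the failure much easier to diagnose.
        fatalError("\(input.path) should have been pending, but was not")
    }
}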

incrementalCompilationState?.addPreOrCompileJobGroups(preAndCompileJobGroups)
incrementalCompilationState?.addPostCompileJobs(postCompileJobs)
incrementalCompilationState?.addPrimaryOrSkippedJobGroups(preAndCompileJobGroups)
incrementalCompilationState?.addTertiaryJobs(postCompileJobs)
Contributor

This isn't a change request, but rather my 2¢: I preferred the name postCompileJobs, as it is completely self-explanatory. Is there a semantic difference that necessitated the change?

Contributor Author

This sort of thing is another reason I appreciate you and @CodaFi taking the time to look at this PR. It's a great sanity check on clarity. I'll change it to the other kind of name.

Contributor

@CodaFi CodaFi left a comment

The cleanup here is quite nice. I agree with Artem's assessment, though, considering that even reading the comments that were there was not enough for me to grasp the naming change for the abstractions here.


/// All the compilation job (groups) known to be required, which have not finished yet.
/// (Changes as jobs complete.)
private var unfinishedPrimaryJobs = Set<Job>()
Contributor

Getting this correct is obviously the first step, but after that we might want to look at how the Set performs. #103 was able to get potentially huge performance wins in the past by not doing Job equality checks (or hashing jobs). That is something that scales with the command line, and the test cases used as a benchmark for that pull request had an impractically large number of parameters, but similar concerns may apply here.

If we have to, we can probably use a similar approach where we use an index into the primaryJobsInOrder array as a job ID instead of using the full job itself.
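
A sketch of the index-as-ID idea, assuming the jobs live in a stable array (the names here are illustrative, not the PR's actual code). Tracking a Set<Int> avoids hashing and comparing whole Job values, whose command lines can be very large.

struct Job {
    let commandLine: [String]   // imagine hundreds of arguments here
}

struct UnfinishedJobTracker {
    let jobsInOrder: [Job]      // stable order; an index doubles as a job ID
    private var unfinished: Set<Int>

    init(jobsInOrder: [Job]) {
        self.jobsInOrder = jobsInOrder
        self.unfinished = Set(jobsInOrder.indices)
    }

    mutating func markFinished(jobAt index: Int) {
        unfinished.remove(index)
    }

    var remainingJobs: [Job] {
        unfinished.sorted().map { jobsInOrder[$0] }
    }
}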

Contributor Author

@davidungar davidungar Nov 11, 2020

Yes, that's a valid concern. I'm reluctant to go for indices unless completely necessary because then the invariants spread out a bit. But I will take a look at the code again with an eye peeled for performance issues.

And I appreciate your time and trouble here!

Contributor Author

Yes, this complexity is O(N log N) in incremental mode. But the driver is also reading and integrating the .swiftdeps file for each job, so it's unclear how much this would matter in practice. Something to watch out for, though, yes.

producerMap: [VirtualPath: Int],
jobs: [Job],
nonincrementalJobs: [Job],
incrementalCompilationState: IncrementalCompilationState?,
Contributor

Seems as though this could be described as

enum {
  case normal([Job])
  case incremental(IncrementalCompilationState)
}

Not sure if that helps readability or cleans up the API, but it's definitely more explicit. (Naming could be significantly better.)
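
One possible fleshed-out form of the suggestion, with placeholder names and stand-in types so it reads on its own; the accessor on the incremental state is hypothetical.

struct Job {}
final class IncrementalCompilationState {
    var mandatoryJobsInOrder: [Job] = []   // hypothetical accessor, for the sketch only
}

enum CompilationMode {
    case normal(jobs: [Job])
    case incremental(IncrementalCompilationState)
}

// Callers must then be explicit about which mode they are in.
func jobsToRun(for mode: CompilationMode) -> [Job] {
    switch mode {
    case .normal(let jobs):
        return jobs
    case .incremental(let state):
        return state.mandatoryJobsInOrder
    }
}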

Contributor Author

Yes, I had thought of it. IIRC, there is an error case where the nonincrementalJobs are needed even in the incremental case. I'll go back and look again. IIRC, in the case of a serious compiler failure, for instance no .swiftdeps file written out by a frontend job, the legacy driver attempts to compile every input. But I want to double-check that. And then there is the question of whether to preserve that behavior, if I'm right.

Contributor Author

OK, did it!

private static func addProducts(of job: Job, index: Int, to producerMap: inout [VirtualPath: Int]) {
  for output in job.outputs {
    if let _ = producerMap.updateValue(index, forKey: output.file) {
      fatalError("multiple producers for output \(output): \(job) \(producerMap[output.file]!)")
Contributor

Suggested change:
- fatalError("multiple producers for output \(output): \(job) \(producerMap[output.file]!)")
+ fatalError("multiple producers for output \(output): \(job) \(jobs[producerMap[output.file]!])")

If you want a direct comparison.

Contributor Author

Ah, yes--great catch!! Thank you.

/// Records the task build engine for the `ExecuteAllJobs` rule (and task) so that when a
/// primary job finishes, and secondaries are discovered, inputs can be added to that rule for
/// any required secondary. Set only once.
private(set) var executeAllJobsTaskBuildEngine: LLTaskBuildEngine? = nil
Contributor

Does this need to be weak?

Contributor Author

Could well be! Thank you again. I'll look at it.

Contributor Author

I think I like nilling it out in the deinit for Context better. What do you think?

Contributor

Does that even work if there is a cycle? If there were a cycle, deinit for Context wouldn't be called?

Just thinking out loud about whether there is even a cycle here:
Context -> Engine -> Job -> Context
I think it should be fine, because the Engine should release the Job when it's done, right?

Contributor

Oh looks like it's actually
Engine -> Delegate -> Context -> Engine
Those are all strong links
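
A self-contained sketch of that cycle with stand-in classes (the real objects are llbuild's task build engine, the executor delegate, and the executor's Context); making the context's back-reference weak is one way to break it.

final class Engine {
    var delegate: Delegate?                 // strong: Engine -> Delegate
}

final class Delegate {
    let context: Context                    // strong: Delegate -> Context
    init(context: Context) { self.context = context }
}

final class Context {
    // weak back-reference: Context -> Engine no longer keeps the trio alive.
    weak var executeAllJobsTaskBuildEngine: Engine?
}

let context = Context()
let engine = Engine()
engine.delegate = Delegate(context: context)
context.executeAllJobsTaskBuildEngine = engine
// Once the last outside reference to `engine` goes away, all three objects
// can be deallocated, because the only remaining link back to Engine is weak.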

@davidungar
Contributor Author

The cleanup here is quite nice. I agree with Artem's assessment, though, considering that even reading the comments that were there was not enough for me to grasp the naming change for the abstractions here.

Thank you. I really appreciate the time and thought from you and all the other reviewers. I'll go for the other names.

@davidungar
Contributor Author

@swift-ci please test

@davidungar davidungar merged commit 07d23c0 into swiftlang:main Nov 12, 2020
@davidungar
Contributor Author

@swift-ci please test

@davidungar davidungar deleted the incremental-11-6 branch January 29, 2021 17:34