Skip to content

[5.5, Incremental] Harness immutability to make inputDependencySourceMap robust #712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

davidungar
Copy link
Contributor

@davidungar davidungar commented Jun 11, 2021

Partial cherry-pick of #675

Follow-on to #710. Suggest not reviewing this until that PR is merged and this one is rebased.
Does all of the below, except it does not change the format or contents of the saved prior ModuleDependencyGraph, in order to minimize risk.

Make the inputDependencyMap immutable, solely reflecting the OutputFileMap.
Do not include it in the priors.

In a previous PR, @CodaFi uncovered and explained issues with the `inputDependencySourceMap:

   // FIXME: The map between swiftdeps and swift files is absolutely *not*
   // a bijection. In particular, more than one swiftdeps file can be encountered
   // in the course of deserializing priors *and* reading the output file map
   // *and* re-reading swiftdeps files after frontends complete
   // that correspond to the same swift file. These cause two problems:
   // - overwrites in this data structure that lose data and
   // - cache misses in `getInput(for:)` that cause the incremental build to
   // turn over when e.g. entries in the output file map change. This should be
   // replaced by a multi-map from swift files to dependency sources,
   // and a regular map from dependency sources to swift files -
   // since that direction really is one-to-one.

This PR solves the problem in a simpler way. The key insight is that the output file map contains the information linking a source file to the swiftdeps file for any build. Thus, the output file map should just determine the entries in the inputDependencySourceMap, just as it did before the introduction of incremental imports. Then the map in question does become a true bijection. It can also be made immutable, which greatly eases reasoning about the code.

It will also make it easy to cull removed file nodes from priors in a future PR.

@davidungar davidungar requested a review from a team as a code owner June 11, 2021 22:03
@davidungar davidungar marked this pull request as draft June 11, 2021 22:06
@davidungar davidungar marked this pull request as ready for review June 21, 2021 18:23
@davidungar
Copy link
Contributor Author

rdar://79211265

@davidungar
Copy link
Contributor Author

@swift-ci please test

@davidungar davidungar requested a review from artemcm June 21, 2021 18:38
Copy link
Contributor

@CodaFi CodaFi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in the serialization format requires a bump in the format version number. Let's just do that and drop the key instead of leaving behind an needless branch in the deserialization code.

///
/// - Returns: the map, or nil if error
init?(_ info: IncrementalCompilationState.IncrementalDependencyAndInputSetup) {
self.simulateGetInputFailure = info.simulateGetInputFailure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this still? We know how to reproduce the issue in the old code, and the new code should not be subject to the same problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simulateGetInputFailure is gone on the main branch. It remains here in the interest of minimal changes to this branch. If you insist, I'm happy with removing it. Whatever the folks in charge of this branch prefer.

import Foundation
import TSCBasic

@_spi(Testing) public struct InputDependencySourceMap: Equatable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer we not build custom data structures to solve this problem. It's unclear what invariants this structure is encapsulating and whether those are worth the maintenance burden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your preference, and the custom type does add one more term to the vocabulary here. Here's why I think that this type eases maintenance:

Because it:

  1. Provides a focus for an overall comment,
  2. Moves the invariants into a dedicated initializer,
  3. Replaces subscript invocations with domain-specific, intention-revealing names such as sourceIfKnown(for:)
  4. Provides a separate place for an extension to OutputFileMap that is specific to this data structure
    I believe that it enhances the maintainability of the code by making it easier for someone to understand what's going on.

The invariants are spelled out in the comment: ```
/// Maps input files (e.g. .swift) to and from the DependencySource object.
///
// This map caches the same information as in the OutputFileMap, but it
// optimizes the reverse lookup, and includes path interning via `DependencySource`.
// Once created, it does not change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Provides a focus for an overall comment,

Every decl does...

  1. Moves the invariants into a dedicated initializer

You are sequestering invariants, but it's unclear what they actually are.

  1. Replaces subscript invocations with domain-specific, intention-revealing names such as sourceIfKnown(for:)

I have articulated in the past that Swift does not usually use intention-revealing selectors because signatures and labels can communicate far more information at the point of use so you have an expanded vocabulary at your fingertips that you are not using.

  1. Provides a separate place for an extension to OutputFileMap that is specific to this data structure
    I believe that it enhances the maintainability of the code by making it easier for someone to understand what's going on.

Maintaining an entire other data structure for that purpose seems like a huge hammer to throw at that particular nail.

The invariants are spelled out in the comment:

Are you verifying them anywhere?

@_spi(Testing) public func enumerateToSerializePriors(
_ eachFn: (TypedVirtualPath, DependencySource) -> Void
) {
biMap.forEach(eachFn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would very much prefer we not conflate the point of use with the names of these APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate? I want to be sure I understand the point here. What would you suggest instead?
If you feel that enumerateToSerializePriors is too specific, then we probably disagree about the value of intention-revealing names.

If that is the source of disagreement, here is a nice quote I found that explains what I am trying to accomplish:

If a developer must consider the implementation of a component in order to use it, the value of encapsulation is lost. If someone other than the original developer must infer the purpose of an object or operation based on its implementation, that new developer may infer a purpose that the operation or class fulfills only by chance. If that was not the intent, the code may work for the moment, but the conceptual basis of the design will have been corrupted, and the two developers will be working at cross-purposes.

Therefore,

Name classes and operations to describe their effect and purpose, without reference to the means by which they do what they promise. This relieves the client developer of the need to understand the internals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 21, 2021

Does this refactoring need to go into Beta 3? Do we have enough time to qualify the new implementation that will follow on after this PR before then?

@davidungar
Copy link
Contributor Author

@CodaFi Welcome aboard! Didn't know you'd have time, or else would have invited you up front. Happy to address your review.

Comment on lines +74 to +85
public mutating func updateValue(_ newValue: T2, forKey key: T1) -> T2? {
let oldValue = map1.updateValue(newValue, forKey: key)
_ = oldValue.map {map2.removeValue(forKey: $0)}
map2[newValue] = key
return oldValue
}
public mutating func updateValue(_ newValue: T1, forKey key: T2) -> T1? {
let oldValue = map2.updateValue(newValue, forKey: key)
_ = oldValue.map {map1.removeValue(forKey: $0)}
map1[newValue] = key
return oldValue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicating the implementation of the subscripts above. Please define one in terms of the other.

Copy link
Contributor Author

@davidungar davidungar Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems tricky because of how the functions rely on the relationship between the two maps and the two types. Can you suggest something?

@davidungar
Copy link
Contributor Author

The change in the serialization format requires a bump in the format version number. Let's just do that and drop the key instead of leaving behind an needless branch in the deserialization code.

That suggestion is perfect for the "main" branch! And that is what I did, modulo the format version number. (I had thought about what would happen without a format number change, but will go back and do a PR on main.)

For for this PR, because we want to minimize risk, I decided to leave the format alone. For example, were there to be a bug that resulted in a new priors not being written when the version number changed, a user could be stuck in never-incremental-import land.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 21, 2021

For example, were there to be a bug that resulted in a new priors not being written when the version number changed

We have no evidence of such a bug though. The same qualification that this code should be put through will detect such a case anyhow. Do you have additional regression tests that you could commit to verify this doesn't happen?

@davidungar
Copy link
Contributor Author

Does this refactoring need to go into Beta 3? Do we have enough time to qualify the new implementation that will follow on after this PR before then?

Great question! The refactoring lets me cherry-pick the immutability in a fashion that reduces risk by following what we've already got in main. It is my considered professional judgement that it would be best to put this into Beta 3, but it is not my ultimate decision. I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

@davidungar
Copy link
Contributor Author

For example, were there to be a bug that resulted in a new priors not being written when the version number changed

We have no evidence of such a bug though. The same qualification that this code should be put through will detect such a case anyhow. Do you have additional regression tests that you could commit to verify this doesn't happen?

It is true that we have no evidence, that is why I used the subjunctive. I plan to create a PR for main, to both change the version number, as I should have before, and to add a test to ensure correct behavior when the number changes.

@davidungar
Copy link
Contributor Author

davidungar commented Jun 21, 2021

I see you have also commented on some of the same changes in in #710.
I'll copy my responses over there.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 22, 2021

I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

That is manifestly not what I said, either now or in the past. The refactorings present in this patchset don't make sense to include in 5.5 unless you intend to actually make a behavior changing patch based on them, and it is that one that I am afraid of getting merged into the release branch.

@davidungar
Copy link
Contributor Author

davidungar commented Jun 22, 2021

I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

That is manifestly not what I said, either now or in the past. The refactorings present in this patchset don't make sense to include in 5.5 unless you intend to actually make a behavior changing patch based on them, and it is that one that I am afraid of getting merged into the release branch.

I am confused. This patchset does include a behavior-changing patch, making the inputSourceDependencyMap immutable.

Which of your review comments expressed that you found a bug, or a risk of a bug? I see that you have expressed disagreements on naming and structure, but not functional risks. Did I miss a comment? Which one?

@davidungar davidungar closed this Jun 25, 2021
@davidungar davidungar deleted the 5.5-catchup-2 branch July 2, 2021 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants