[5.5, Incremental] Harness immutability to make inputDependencySourceMap robust #712

davidungar · 2021-06-11T22:03:18Z

Partial cherry-pick of #675

Follow-on to #710. Suggest not reviewing this until that PR is merged and this one is rebased.
Does all of the below, except it does not change the format or contents of the saved prior ModuleDependencyGraph, in order to minimize risk.

Make the inputDependencySourceMap immutable, solely reflecting the OutputFileMap.
Do not include it in the priors.

In a previous PR, @CodaFi uncovered and explained issues with the `inputDependencySourceMap`:

   // FIXME: The map between swiftdeps and swift files is absolutely *not*
   // a bijection. In particular, more than one swiftdeps file can be encountered
   // in the course of deserializing priors *and* reading the output file map
   // *and* re-reading swiftdeps files after frontends complete
   // that correspond to the same swift file. These cause two problems:
   // - overwrites in this data structure that lose data and
   // - cache misses in `getInput(for:)` that cause the incremental build to
   // turn over when e.g. entries in the output file map change. This should be
   // replaced by a multi-map from swift files to dependency sources,
   // and a regular map from dependency sources to swift files -
   // since that direction really is one-to-one.
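For comparison, the multi-map that this FIXME proposes might look roughly like the following sketch. `MultiSourceMap` and its methods are hypothetical, and `String` stands in for the driver's path and `DependencySource` types.

```swift
/// Hypothetical sketch of the FIXME's proposal: a multi-map from swift files
/// to dependency sources, plus a one-to-one map from sources back to files.
struct MultiSourceMap {
  /// swift file -> every swiftdeps path seen for it (priors, output file map, frontend output)
  private(set) var sourcesByInput: [String: Set<String>] = [:]
  /// swiftdeps path -> the single swift file it belongs to
  private(set) var inputBySource: [String: String] = [:]

  /// Records that `source` belongs to `input` without discarding other sources for `input`.
  mutating func record(input: String, source: String) {
    if let previousInput = inputBySource[source], previousInput != input {
      // The swiftdeps path now belongs to a different file; detach it from the old one.
      _ = sourcesByInput[previousInput]?.remove(source)
    }
    inputBySource[source] = input
    sourcesByInput[input, default: []].insert(source)
  }

  func input(for source: String) -> String? { inputBySource[source] }
  func sources(for input: String) -> Set<String> { sourcesByInput[input] ?? [] }
}
```

Because a swift file may accumulate several swiftdeps entries, lookups by swift file return a set; only the reverse direction stays one-to-one.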

This PR solves the problem in a simpler way. The key insight is that the output file map contains the information linking a source file to the swiftdeps file for any build. Thus, the output file map should just determine the entries in the inputDependencySourceMap, just as it did before the introduction of incremental imports. Then the map in question does become a true bijection. It can also be made immutable, which greatly eases reasoning about the code.
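A minimal sketch of that idea, assuming plain `String` paths stand in for `TypedVirtualPath` and `DependencySource`: the map's entries come only from an output-file-map-like dictionary, it is built once and never mutated, and its failable initializer rejects a mapping that is not one-to-one. `InputSourceBijection` and `OutputFileMapSketch` are hypothetical names, not the driver's API.

```swift
/// Hypothetical stand-in for the output file map: swift file -> swiftdeps path.
typealias OutputFileMapSketch = [String: String]

/// Hypothetical immutable bijection derived solely from the output file map.
struct InputSourceBijection: Equatable {
  private let sourceByInput: [String: String]
  private let inputBySource: [String: String]

  /// Returns nil if two inputs claim the same swiftdeps file,
  /// i.e. the output file map does not describe a bijection.
  init?(_ outputFileMap: OutputFileMapSketch) {
    var reverse: [String: String] = [:]
    for (input, source) in outputFileMap {
      // updateValue returns the previous input for this source, if any.
      guard reverse.updateValue(input, forKey: source) == nil else { return nil }
    }
    self.sourceByInput = outputFileMap
    self.inputBySource = reverse
  }

  func source(for input: String) -> String? { sourceByInput[input] }
  func input(for source: String) -> String? { inputBySource[source] }
}
```

Since every property is a `let`, nothing observed later in the build (priors, re-read swiftdeps files) can overwrite an entry.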

It will also make it easy to cull removed file nodes from priors in a future PR.


davidungar · 2021-06-21T18:24:20Z

rdar://79211265

davidungar · 2021-06-21T18:38:10Z

@swift-ci please test

CodaFi

The change in the serialization format requires a bump in the format version number. Let's just do that and drop the key instead of leaving behind a needless branch in the deserialization code.

CodaFi · 2021-06-21T18:57:05Z

...SwiftDriver/IncrementalCompilation/ModuleDependencyGraphParts/InputDependencySourceMap.swift

+  ///
+  /// - Returns: the map, or nil if error
+  init?(_ info: IncrementalCompilationState.IncrementalDependencyAndInputSetup) {
+    self.simulateGetInputFailure = info.simulateGetInputFailure


Why do we need this still? We know how to reproduce the issue in the old code, and the new code should not be subject to the same problem.

simulateGetInputFailure is gone on the main branch. It remains here in the interest of minimal changes to this branch. If you insist, I'm happy with removing it. Whatever the folks in charge of this branch prefer.

CodaFi · 2021-06-21T18:59:35Z

...SwiftDriver/IncrementalCompilation/ModuleDependencyGraphParts/InputDependencySourceMap.swift

+import Foundation
+import TSCBasic
+
+@_spi(Testing) public struct InputDependencySourceMap: Equatable {


I would still prefer we not build custom data structures to solve this problem. It's unclear what invariants this structure is encapsulating and whether those are worth the maintenance burden.

I understand your preference, and the custom type does add one more term to the vocabulary here. Here's why I think that this type eases maintenance:

Because it:

- Provides a focus for an overall comment,
- Moves the invariants into a dedicated initializer,
- Replaces subscript invocations with domain-specific, intention-revealing names such as `sourceIfKnown(for:)`, and
- Provides a separate place for an extension to OutputFileMap that is specific to this data structure.

I believe that it enhances the maintainability of the code by making it easier for someone to understand what's going on.

The invariants are spelled out in the comment:

```swift
/// Maps input files (e.g. .swift) to and from the DependencySource object.
///
// This map caches the same information as in the OutputFileMap, but it
// optimizes the reverse lookup, and includes path interning via `DependencySource`.
// Once created, it does not change.
```

> Provides a focus for an overall comment,

Every decl does...

> Moves the invariants into a dedicated initializer

You are sequestering invariants, but it's unclear what they actually are.

> Replaces subscript invocations with domain-specific, intention-revealing names such as sourceIfKnown(for:)

I have articulated in the past that Swift does not usually use intention-revealing selectors, because signatures and labels can communicate far more information at the point of use. You have an expanded vocabulary at your fingertips that you are not using.

> Provides a separate place for an extension to OutputFileMap that is specific to this data structure
> I believe that it enhances the maintainability of the code by making it easier for someone to understand what's going on.

Maintaining an entire other data structure for that purpose seems like a huge hammer to throw at that particular nail.

> The invariants are spelled out in the comment:

Are you verifying them anywhere?
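One way to verify the invariants stated in that comment is a debug-time check. A minimal sketch, assuming the map keeps a forward table (input to swiftdeps) and a reverse table, with `String` standing in for the real path types; `verifyInputSourceInvariants` is a hypothetical helper, not part of the PR.

```swift
/// Hypothetical check of the documented invariants:
/// (1) the forward table holds exactly the output file map's entries, and
/// (2) the reverse table is the exact inverse of the forward table.
func verifyInputSourceInvariants(
  forward: [String: String],
  reverse: [String: String],
  outputFileMap: [String: String]
) -> Bool {
  guard forward == outputFileMap, forward.count == reverse.count else { return false }
  // With equal counts, every forward pair round-tripping implies an exact inverse
  // (two inputs sharing a source would make one of them fail this test).
  return forward.allSatisfy { entry in reverse[entry.value] == entry.key }
}
```

Called inside `assert(...)` right after construction, such a check would cost nothing in optimized builds.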

CodaFi · 2021-06-21T19:00:19Z

...SwiftDriver/IncrementalCompilation/ModuleDependencyGraphParts/InputDependencySourceMap.swift

+  @_spi(Testing) public func enumerateToSerializePriors(
+    _ eachFn: (TypedVirtualPath, DependencySource) -> Void
+  ) {
+    biMap.forEach(eachFn)


Would very much prefer we not conflate the point of use with the names of these APIs.

Could you please elaborate? I want to be sure I understand the point here. What would you suggest instead?
If you feel that enumerateToSerializePriors is too specific, then we probably disagree about the value of intention-revealing names.

If that is the source of disagreement, here is a nice quote I found that explains what I am trying to accomplish:

> If a developer must consider the implementation of a component in order to use it, the value of encapsulation is lost. If someone other than the original developer must infer the purpose of an object or operation based on its implementation, that new developer may infer a purpose that the operation or class fulfills only by chance. If that was not the intent, the code may work for the moment, but the conceptual basis of the design will have been corrupted, and the two developers will be working at cross-purposes.
>
> Therefore,
>
> Name classes and operations to describe their effect and purpose, without reference to the means by which they do what they promise. This relieves the client developer of the need to understand the internals.

From http://ddd.fed.wiki.org/view/intention-revealing-interfaces

https://swift.org/documentation/api-design-guidelines/#naming
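As a point of comparison for the naming question, a purpose-neutral alternative to `enumerateToSerializePriors` is to conform the type to `Sequence`, leaving the purpose to the call site. A sketch with a hypothetical `InputSourcePairs` type and `String` paths, not the PR's code:

```swift
/// Hypothetical read-only map that exposes its entries through `Sequence`
/// rather than through an enumeration method named after one caller.
struct InputSourcePairs: Sequence {
  private let sourceByInput: [String: String]

  init(_ sourceByInput: [String: String]) { self.sourceByInput = sourceByInput }

  /// Yields (input, source) pairs; like Dictionary, the order is unspecified.
  func makeIterator() -> AnyIterator<(input: String, source: String)> {
    var base = sourceByInput.makeIterator()
    return AnyIterator {
      guard let entry = base.next() else { return nil }
      return (input: entry.key, source: entry.value)
    }
  }
}
```

The priors serializer would then use an ordinary `for (input, source) in map` loop, so the intent lives at the point of use rather than in the API name.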

CodaFi · 2021-06-21T19:01:41Z

Does this refactoring need to go into Beta 3? Do we have enough time to qualify the new implementation that will follow on after this PR before then?

davidungar · 2021-06-21T19:04:43Z

@CodaFi Welcome aboard! Didn't know you'd have time, or else would have invited you up front. Happy to address your review.

CodaFi · 2021-06-21T19:23:07Z

Sources/SwiftDriver/IncrementalCompilation/BidirectionalMap.swift

+  public mutating func updateValue(_ newValue: T2, forKey key: T1) -> T2? {
+    let oldValue = map1.updateValue(newValue, forKey: key)
+    _ = oldValue.map {map2.removeValue(forKey: $0)}
+    map2[newValue] = key
+    return oldValue
+  }
+  public mutating func updateValue(_ newValue: T1, forKey key: T2) -> T1? {
+    let oldValue = map2.updateValue(newValue, forKey: key)
+   _ = oldValue.map {map1.removeValue(forKey: $0)}
+    map1[newValue] = key
+    return oldValue
+  }


This is duplicating the implementation of the subscripts above. Please define one in terms of the other.

Hmm, it seems tricky because of how the functions rely on the relationship between the two maps and the two types. Can you suggest something?
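One way to address the duplication is to route every mutation (both subscripts and both `updateValue` overloads) through a single private primitive. The sketch below does that; it is not the PR's code, and the `link` helper is hypothetical. It also drops a stale pairing when the new value was already linked to a different key, which the quoted `updateValue` does not do, so the two dictionaries stay exact inverses.

```swift
/// Hypothetical bidirectional map whose mutators all funnel through `link`.
/// As in the quoted diff, overloading on T1 vs T2 requires the types to differ.
struct BidirectionalMap<T1: Hashable, T2: Hashable> {
  private var map1: [T1: T2] = [:]
  private var map2: [T2: T1] = [:]

  /// Pairs `a` with `b`, removing any old pairing of either one.
  /// Returns the T2 previously paired with `a`.
  @discardableResult
  private mutating func link(_ a: T1, _ b: T2) -> T2? {
    let oldB = map1.updateValue(b, forKey: a)
    if let oldB = oldB { map2.removeValue(forKey: oldB) }
    if let oldA = map2.updateValue(a, forKey: b), oldA != a { map1.removeValue(forKey: oldA) }
    return oldB
  }

  subscript(_ key: T1) -> T2? {
    get { map1[key] }
    set {
      if let newValue = newValue { link(key, newValue) }
      else if let old = map1[key] { map1[key] = nil; map2[old] = nil }
    }
  }

  subscript(_ key: T2) -> T1? {
    get { map2[key] }
    set {
      if let newValue = newValue { link(newValue, key) }
      else if let old = map2[key] { map2[key] = nil; map1[old] = nil }
    }
  }

  @discardableResult
  mutating func updateValue(_ newValue: T2, forKey key: T1) -> T2? {
    link(key, newValue)
  }

  @discardableResult
  mutating func updateValue(_ newValue: T1, forKey key: T2) -> T1? {
    let old = map2[key]  // capture before `link` rewrites it
    link(newValue, key)
    return old
  }
}
```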

davidungar · 2021-06-21T19:23:32Z

> The change in the serialization format requires a bump in the format version number. Let's just do that and drop the key instead of leaving behind a needless branch in the deserialization code.

That suggestion is perfect for the "main" branch! And that is what I did, modulo the format version number. (I had thought about what would happen without a format number change, but will go back and do a PR on main.)

For this PR, because we want to minimize risk, I decided to leave the format alone. For example, were there to be a bug that resulted in new priors not being written when the version number changed, a user could be stuck in never-incremental-import land.

CodaFi · 2021-06-21T19:30:39Z

> For example, were there to be a bug that resulted in new priors not being written when the version number changed

We have no evidence of such a bug though. The same qualification that this code should be put through will detect such a case anyhow. Do you have additional regression tests that you could commit to verify this doesn't happen?

davidungar · 2021-06-21T19:53:00Z

> Does this refactoring need to go into Beta 3? Do we have enough time to qualify the new implementation that will follow on after this PR before then?

Great question! The refactoring lets me cherry-pick the immutability in a fashion that reduces risk by following what we've already got in main. It is my considered professional judgement that it would be best to put this into Beta 3, but it is not my ultimate decision. I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

davidungar · 2021-06-21T19:56:52Z

>> For example, were there to be a bug that resulted in new priors not being written when the version number changed

> We have no evidence of such a bug though. The same qualification that this code should be put through will detect such a case anyhow. Do you have additional regression tests that you could commit to verify this doesn't happen?

It is true that we have no evidence; that is why I used the subjunctive. I plan to create a PR for main that both changes the version number, as I should have done before, and adds a test to ensure correct behavior when the number changes.
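Such a test would guard against one specific failure: after a format-version bump, old priors are discarded but new ones are never written, so every later build is non-incremental. A toy model of that property, with hypothetical names (`PriorsStore`, `simulateBuild`) and a dictionary standing in for the serialized graph; this is not the driver's serialization code.

```swift
/// Toy model (assumption): priors on "disk" are a format version plus a payload.
struct PriorsStore {
  static let currentVersion = 2
  var onDisk: (version: Int, payload: [String: String])?

  /// Priors are usable only when written in the current format.
  func load() -> [String: String]? {
    guard let saved = onDisk, saved.version == Self.currentVersion else { return nil }
    return saved.payload
  }

  /// Every build, incremental or not, rewrites priors in the current format.
  mutating func save(_ payload: [String: String]) {
    onDisk = (version: Self.currentVersion, payload: payload)
  }
}

/// Returns true if the simulated build could use priors (i.e. ran incrementally).
func simulateBuild(_ store: inout PriorsStore, graph: [String: String]) -> Bool {
  let wasIncremental = store.load() != nil
  store.save(graph)
  return wasIncremental
}
```

The property under test is that at most one build after a version change is non-incremental.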

davidungar · 2021-06-21T19:59:42Z

I see you have also commented on some of the same changes in #710.
I'll copy my responses over there.

CodaFi · 2021-06-22T01:46:43Z

> I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

That is manifestly not what I said, either now or in the past. The refactorings present in this patchset don't make sense to include in 5.5 unless you intend to actually make a behavior changing patch based on them, and it is that one that I am afraid of getting merged into the release branch.

davidungar · 2021-06-22T05:36:41Z

>> I am heartened by the fact that you have looked this over and have not (AFAICT) found any bugs or risks in it.

> That is manifestly not what I said, either now or in the past. The refactorings present in this patchset don't make sense to include in 5.5 unless you intend to actually make a behavior changing patch based on them, and it is that one that I am afraid of getting merged into the release branch.

I am confused. This patchset does include a behavior-changing patch: making the inputDependencySourceMap immutable.

Which of your review comments expressed that you found a bug, or a risk of a bug? I see that you have expressed disagreements on naming and structure, but not functional risks. Did I miss a comment? Which one?

David Ungar added 3 commits June 10, 2021 23:30

[NFC, Incremental] Create a domain-specific type for the inputDependencySourceMap
Will facilitate making it immutable

fe7100a

[Incremental] Harness immutability to make inputDependencySourceMap robust
The output file map is the source of truth.

142daba

Write inputDependencySourceMap to priors as before to minimize risk

445905a

davidungar added the swift 5.5 label Jun 11, 2021

davidungar requested a review from a team as a code owner June 11, 2021 22:03

davidungar marked this pull request as draft June 11, 2021 22:06

davidungar marked this pull request as ready for review June 21, 2021 18:23

davidungar requested a review from artemcm June 21, 2021 18:38

CodaFi requested changes Jun 21, 2021

View reviewed changes

CodaFi reviewed Jun 21, 2021

View reviewed changes

davidungar mentioned this pull request Jun 21, 2021

[5.5, NFC, Incremental] Create a domain-specific type for the inputDependencySourceMap #710

Closed

davidungar closed this Jun 25, 2021

davidungar deleted the 5.5-catchup-2 branch July 2, 2021 22:02
