[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

artemcm · 2022-12-19T18:59:25Z

Apologies for the size of this PR, the changes here are somewhat-intertwined refactoring of how the scanner works overall.

This patch changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).

The ModuleDependencyInfo objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.

Previously, the scanner's internal state would cache a module dependnecy as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports to what kind of module they are (Swift, Clang, etc), which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.

Moreover, looking up a given module dependency by name (via ASTContext's getModuleDependencies)(either for the first time, or to qualify its kind later) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding modules would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem.

This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module foo and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
---> (1) Swift Module "Loader" checks caches for a Swift module foo and doesn't find one, so it searches the filesystem for foo and fails to find one there also.
---> (2) Clang Module "Loader" checks caches for a Clang module foo, finds one and returns it to the client.

This means that we were always searching the filesystem in (1) even if we knew all the information required to deem that to be futile.
With this change, queries to ASTContext's getModuleDependencies will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.

To handle cases in the scanner where we must only lookup either a Swift-only module or a Clang-only module, this patch splits getModuleDependencies into an already-existing getSwiftModuleDependencies and a newly-added getClangModuleDependencies.

Note, in expectation this change does not alter the scanner's functionality. The test changes here are mostly to make the tests more resilient to the ordering in which dependency information is emitted.

artemcm · 2022-12-20T01:00:55Z

@swift-ci test

artemcm · 2022-12-20T17:55:47Z

@swift-ci test

artemcm · 2022-12-20T18:46:01Z

I have a test project with a large target with 1665 .swift source files and a total of 166 modules in the dependency graph, and scanning the full target with this patch is, on average, 30% faster.

nkcsgexi

Thanks! Hoisting the cache query above all importers makes a lot of sense.

tshortli · 2022-12-20T22:45:21Z

include/swift/AST/ModuleDependencies.h


  /// Record dependencies for the given module.
-  const ModuleDependencyInfo *recordDependencies(StringRef moduleName,
+  const ModuleDependencyInfo *recordDependency(StringRef moduleName,


One thing I'm not quite following is the de-pluralization of these method names. Aren't there potentially multiple dependencies of moduleID in the given ModuleDependencyInfo collection?

While there are, indeed, multiple dependencies of moduleID in the given ModuleDependencyInfo, the thing being recorded here is the existence of a ModuleDependencyInfo itself for this moduleID. Which does contain a collection of its own dependencies' IDs, each of which will then correspond to a ModuleDependencyInfo to be queried via these same API, but also a bunch of other metadata we need to know about this dependency.

I can see the logic behind the pluralization but it's been bugging me for a while and it seems to be clearer to be explicit about the fact that these API are vending these info values that describe a particular module, not just that module's dependencies.

For example, before this patch hasDependencies("foo") would indicate whether or not a module "foo" has been discovered as a dependency during this scan. To your point, it does also indicate that we know dependencies of foo in its respective info, but the use of these queries is broader than querying dependencies of this module as it also covers getting a dependency info about this module for other purposes as well. It seems conceptually more-accurate to have some thing like findDependencies("foo") return an actual list of dependencies of foo, rather than an info describing foo as a discovered dependency itself.

I'm having to really make an argument here so I can see it either way and can be convinced otherwise, but this seemed more appropriate as I was already restructuring things!

artemcm · 2022-12-21T00:18:05Z

@swift-ci test

artemcm · 2022-12-21T18:33:41Z

@swift-ci test

artemcm · 2022-12-21T21:48:27Z

@swift-ci test macOS platform

artemcm · 2022-12-22T01:55:34Z

@swift-ci test macOS platform

artemcm · 2022-12-22T17:19:36Z

@swift-ci test macOS platform

artemcm · 2022-12-22T21:23:40Z

@swift-ci test

…ule imports This changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.). The 'ModuleDependencyInfo' objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies. Previously, the scanner's internal state would cache a module dependnecy as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports, which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info. Moreover, looking up a given module dependency by name (via `ASTContext`'s `getModuleDependencies`) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding modules would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem. This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows: Suppose we had previously discovered a Clang module "foo" and cached its dependency info. -> ASTContext.getModuleDependencies("foo") --> (1) Swift Module "Loader" checks caches for a Swift module "foo" and doesn't find one, so it searches the filesystem for "foo" and fails to find one. --> (2) Clang Module "Loader" checks caches for a Clang module "foo", finds one and returns it to the client. This means that we were always searching the filesystem in (1) even if we knew that to be futile. With this change, queries to `ASTContext`'s `getModuleDependencies` will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents. To handle cases in the scanner where we must only lookup either a Swift-only module or a Clang-only module, this patch splits 'getModuleDependencies' into an alrady-existing 'getSwiftModuleDependencies' and a newly-added 'getClangModuleDependencies'.

artemcm · 2023-01-05T19:44:53Z

@swift-ci test

artemcm force-pushed the BetterScannerDependencyResolution branch 10 times, most recently from 0ccb933 to 5334af0 Compare December 20, 2022 00:54

artemcm force-pushed the BetterScannerDependencyResolution branch from 5334af0 to 6f15a3d Compare December 20, 2022 17:37

artemcm marked this pull request as ready for review December 20, 2022 18:50

artemcm requested review from zoecarver, hyp and egorzhdan as code owners December 20, 2022 18:50

artemcm requested review from nkcsgexi, xymus and tshortli December 20, 2022 18:50

nkcsgexi approved these changes Dec 20, 2022

View reviewed changes

artemcm requested a review from elsh December 20, 2022 22:16

tshortli approved these changes Dec 20, 2022

View reviewed changes

artemcm force-pushed the BetterScannerDependencyResolution branch from 6f15a3d to 5ee17bc Compare December 21, 2022 00:18

artemcm force-pushed the BetterScannerDependencyResolution branch from 5ee17bc to 311df57 Compare December 21, 2022 17:55

artemcm force-pushed the BetterScannerDependencyResolution branch from 311df57 to 1343254 Compare December 22, 2022 20:44

artemcm force-pushed the BetterScannerDependencyResolution branch from 1343254 to 12477b7 Compare January 5, 2023 19:44

artemcm merged commit b205a9c into swiftlang:main Jan 6, 2023

artemcm deleted the BetterScannerDependencyResolution branch January 6, 2023 01:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

Uh oh!

artemcm commented Dec 19, 2022 •

edited

Loading

Uh oh!

artemcm commented Dec 20, 2022

Uh oh!

artemcm commented Dec 20, 2022

Uh oh!

artemcm commented Dec 20, 2022 •

edited

Loading

Uh oh!

nkcsgexi left a comment

Uh oh!

tshortli Dec 20, 2022

Uh oh!

artemcm Dec 20, 2022

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Jan 5, 2023

Uh oh!

Uh oh!

[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

Uh oh!

Conversation

artemcm commented Dec 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artemcm commented Dec 20, 2022

Uh oh!

artemcm commented Dec 20, 2022

Uh oh!

artemcm commented Dec 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nkcsgexi left a comment

Choose a reason for hiding this comment

Uh oh!

tshortli Dec 20, 2022

Choose a reason for hiding this comment

Uh oh!

artemcm Dec 20, 2022

Choose a reason for hiding this comment

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 21, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Dec 22, 2022

Uh oh!

artemcm commented Jan 5, 2023

Uh oh!

Uh oh!

artemcm commented Dec 19, 2022 •

edited

Loading

artemcm commented Dec 20, 2022 •

edited

Loading