Skip to content

[Dependency Scanning] Refactor the scanner to resolve unqualified module imports #62696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 6, 2023

Conversation

artemcm
Copy link
Contributor

@artemcm artemcm commented Dec 19, 2022

Apologies for the size of this PR, the changes here are somewhat-intertwined refactoring of how the scanner works overall.

This patch changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).

The ModuleDependencyInfo objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.

Previously, the scanner's internal state would cache a module dependnecy as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports to what kind of module they are (Swift, Clang, etc), which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.

Moreover, looking up a given module dependency by name (via ASTContext's getModuleDependencies)(either for the first time, or to qualify its kind later) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding modules would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem.

This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module foo and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
---> (1) Swift Module "Loader" checks caches for a Swift module foo and doesn't find one, so it searches the filesystem for foo and fails to find one there also.
---> (2) Clang Module "Loader" checks caches for a Clang module foo, finds one and returns it to the client.

This means that we were always searching the filesystem in (1) even if we knew all the information required to deem that to be futile.
With this change, queries to ASTContext's getModuleDependencies will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.

To handle cases in the scanner where we must only lookup either a Swift-only module or a Clang-only module, this patch splits getModuleDependencies into an already-existing getSwiftModuleDependencies and a newly-added getClangModuleDependencies.

Note, in expectation this change does not alter the scanner's functionality. The test changes here are mostly to make the tests more resilient to the ordering in which dependency information is emitted.

@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch 10 times, most recently from 0ccb933 to 5334af0 Compare December 20, 2022 00:54
@artemcm
Copy link
Contributor Author

artemcm commented Dec 20, 2022

@swift-ci test

@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch from 5334af0 to 6f15a3d Compare December 20, 2022 17:37
@artemcm
Copy link
Contributor Author

artemcm commented Dec 20, 2022

@swift-ci test

@artemcm
Copy link
Contributor Author

artemcm commented Dec 20, 2022

I have a test project with a large target with 1665 .swift source files and a total of 166 modules in the dependency graph, and scanning the full target with this patch is, on average, 30% faster.

@artemcm artemcm marked this pull request as ready for review December 20, 2022 18:50
Copy link
Contributor

@nkcsgexi nkcsgexi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Hoisting the cache query above all importers makes a lot of sense.

@artemcm artemcm requested a review from elsh December 20, 2022 22:16

/// Record dependencies for the given module.
const ModuleDependencyInfo *recordDependencies(StringRef moduleName,
const ModuleDependencyInfo *recordDependency(StringRef moduleName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm not quite following is the de-pluralization of these method names. Aren't there potentially multiple dependencies of moduleID in the given ModuleDependencyInfo collection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there are, indeed, multiple dependencies of moduleID in the given ModuleDependencyInfo, the thing being recorded here is the existence of a ModuleDependencyInfo itself for this moduleID. Which does contain a collection of its own dependencies' IDs, each of which will then correspond to a ModuleDependencyInfo to be queried via these same API, but also a bunch of other metadata we need to know about this dependency.

I can see the logic behind the pluralization but it's been bugging me for a while and it seems to be clearer to be explicit about the fact that these API are vending these info values that describe a particular module, not just that module's dependencies.

For example, before this patch hasDependencies("foo") would indicate whether or not a module "foo" has been discovered as a dependency during this scan. To your point, it does also indicate that we know dependencies of foo in its respective info, but the use of these queries is broader than querying dependencies of this module as it also covers getting a dependency info about this module for other purposes as well. It seems conceptually more-accurate to have some thing like findDependencies("foo") return an actual list of dependencies of foo, rather than an info describing foo as a discovered dependency itself.

I'm having to really make an argument here so I can see it either way and can be convinced otherwise, but this seemed more appropriate as I was already restructuring things!

@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch from 6f15a3d to 5ee17bc Compare December 21, 2022 00:18
@artemcm
Copy link
Contributor Author

artemcm commented Dec 21, 2022

@swift-ci test

@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch from 5ee17bc to 311df57 Compare December 21, 2022 17:55
@artemcm
Copy link
Contributor Author

artemcm commented Dec 21, 2022

@swift-ci test

@artemcm
Copy link
Contributor Author

artemcm commented Dec 21, 2022

@swift-ci test macOS platform

1 similar comment
@artemcm
Copy link
Contributor Author

artemcm commented Dec 22, 2022

@swift-ci test macOS platform

@artemcm
Copy link
Contributor Author

artemcm commented Dec 22, 2022

@swift-ci test macOS platform

@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch from 311df57 to 1343254 Compare December 22, 2022 20:44
@artemcm
Copy link
Contributor Author

artemcm commented Dec 22, 2022

@swift-ci test

…ule imports

This changes the scanner's behavior to "resolve" a discovered module's dependencies to a set of Module IDs: module name + module kind (swift textual, swift binary, clang, etc.).

The 'ModuleDependencyInfo' objects that are stored in the dependency scanner's cache now carry a set of kind-qualified ModuleIDs for their dependencies, in addition to unqualified imported module names of their dependencies.

Previously, the scanner's internal state would cache a module dependnecy as having its own set of dependencies which were stored as names of imported modules. This led to a design where any time we needed to process the dependency downstream from its discovery (e.g. cycle detection, graph construction), we had to query the ASTContext to resolve this dependency's imports, which shouldn't be necessary. Now, upon discovery, we "resolve" a discovered dependency by executing a lookup for each of its imported module names (this operation happens regardless of this patch) and store a fully-resolved set of dependencies in the dependency module info.

Moreover, looking up a given module dependency by name (via `ASTContext`'s `getModuleDependencies`) would result in iterating over the scanner's module "loaders" and querying each for the module name. The corresponding modules would then check the scanner's cache for a respective discovered module, and if no such module is found the "loader" would search the filesystem.

This meant that in practice, we searched the filesystem on many occasions where we actually had cached the required dependency, as follows:
Suppose we had previously discovered a Clang module "foo" and cached its dependency info.
-> ASTContext.getModuleDependencies("foo")
--> (1) Swift Module "Loader" checks caches for a Swift module "foo" and doesn't find one, so it searches the filesystem for "foo" and fails to find one.
--> (2) Clang Module "Loader" checks caches for a Clang module "foo", finds one and returns it to the client.

This means that we were always searching the filesystem in (1) even if we knew that to be futile.
With this change, queries to `ASTContext`'s `getModuleDependencies` will always check all the caches first, and only delegate to the scanner "loaders" if no cached dependency is found. The loaders are then no longer in the business of checking the cached contents.

To handle cases in the scanner where we must only lookup either a Swift-only module or a Clang-only module, this patch splits 'getModuleDependencies' into an alrady-existing 'getSwiftModuleDependencies' and a newly-added 'getClangModuleDependencies'.
@artemcm artemcm force-pushed the BetterScannerDependencyResolution branch from 1343254 to 12477b7 Compare January 5, 2023 19:44
@artemcm
Copy link
Contributor Author

artemcm commented Jan 5, 2023

@swift-ci test

@artemcm artemcm merged commit b205a9c into swiftlang:main Jan 6, 2023
@artemcm artemcm deleted the BetterScannerDependencyResolution branch January 6, 2023 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants