Skip to content

[Serialization] Improve module loading performance #40155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Nov 12, 2021

When looking for a Swift module on disk, we were scanning all module search paths if they contain the module we are searching for. In a setup where each module is contained in its own framework search path, this scaled quadratically with the number of modules being imported. E.g. a setup with 100 modules being imported form 100 module search paths could cause on the order of 10,000 checks of FileSystem::exists. While these checks are fairly fast (~10µs), they add up to ~100ms.

To improve this, perform a first scan of all module search paths and list the files they contain. From this, create a lookup map that maps filenames to the search paths they can be found in. E.g. for

searchPath1/
  Module1.framework

searchPath2/
  Module1.framework
  Module2.swiftmodule

we create the following lookup table

Module1.framework -> [searchPath1, searchPath2]
Module2.framework -> [searchPath2]

The initial scan is fairly fast (in O(files inside module search paths)) and afterwards finding a module in the framework search paths is in O(1) in the situation described above.

@ahoppen ahoppen requested a review from bnbarham November 12, 2021 10:15
@ahoppen
Copy link
Member Author

ahoppen commented Nov 12, 2021

@swift-ci Please smoke test

@ahoppen ahoppen requested a review from xymus November 12, 2021 10:17
@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch from f9327ac to 6e95156 Compare November 12, 2021 15:07
@ahoppen
Copy link
Member Author

ahoppen commented Nov 12, 2021

@swift-ci Please smoke test

Copy link
Contributor

@xymus xymus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like an overdue improvement, thanks!

Copy link
Contributor

@bnbarham bnbarham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Those microseconds really add up 😆

@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch from 6e95156 to 3a0106e Compare November 16, 2021 18:07
@ahoppen
Copy link
Member Author

ahoppen commented Nov 16, 2021

OK, I needed to do a pretty big refactoring of the PR because before each module loader maintained its own lookup table. Thus, if one module loader caused a search path to get added, the others wouldn’t know about it. So I moved the entire lookup logic down to SearchPathOpts, which has a better overview of any search path modifications anyway. For that I needed to make sure that all modifications to the module search paths are made through dedicated setters that can update/invalidate the lookup.

@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch 2 times, most recently from 738e9b7 to 4acc427 Compare November 16, 2021 20:52
!Error && Dir != llvm::vfs::directory_iterator(); Dir.increment(Error)) {
StringRef Filename = llvm::sys::path::filename(Dir->path());
LookupTable[Filename].push_back(
{SearchPath.str(), Kind, IsSystem, SearchPathIndex});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we store the results of getDarwinImplicitFrameworkSearchPaths in SearchPathOptions this could then be a StringRef (I think). Possibly worth it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current design, we only decide whether we need these paths (i.e. whether we are on a Darwin platform) when enumerating the search paths. So if we want to store the Darwin implicit framework search paths in SearchPathOptions, we would need to always compute them when setting SDK path, regardless of the platform (or have a slightly more complicated caching method). I’m not sure if it’s worth it. My current gut feeling is that this function will be called <10 times per compilation/SourceKit operation.

The clean solution would be that SearchPathOptions knows whether it’s on a Darwin platform, but I think that’s out of scope for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just seemed like a relatively large map to me, but I guess there's not that many file -> search paths. I was mostly hoping we could move the platform check into ParseSearchPathArgs and just add the extra paths directly to the framework search paths rather than have separate field, but I haven't looked into that at all. If that's not possible then I don't really have a preference either way 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. I misunderstood you. I though you wanted to avoid the computation of getDarwinImplicitFrameworkSearchPaths. Making ModuleSearchPath only contain a StringRef does seem like a worthwhile improvement 👍

I was mostly hoping we could move the platform check into ParseSearchPathArgs and just add the extra paths directly to the framework search paths rather than have separate field, but I haven't looked into that at all. If that's not possible then I don't really have a preference either way 🤷

We can’t do that because framework search paths may be added after the compiler invocation is created and those should come before the Darwin implicit search paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yeah the computation didn't really bother me at all - from a quick look this just seemed to be the only part that was preventing using StringRefs.

for (auto path : searchPathOpts.ImportSearchPaths) {
for (auto path : searchPathOpts.getImportSearchPaths()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you only changed to the getter here, but it's always nice to be explicit with const auto &. There's a few others where the same applies.

isFramework = false;

// On Apple platforms, we can assume that the runtime libraries use
// target-specifi module files wihtin a `.swiftmodule` directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't actually change any of this but:
target-specifi -> target-specific
wihtin -> within

Comment on lines 76 to 91
// Gather all search paths that include a file whose name is in Filenames.
// To make sure that we don't include search paths twice, keep track of which
// search paths have already been added to Result by their kind and Index
// in ResultIds.
llvm::SmallVector<const ModuleSearchPath *, 4> Result;
llvm::SmallSet<std::pair<ModuleSearchPathKind, unsigned>, 4> ResultIds;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still add a search path twice now right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if it’s specified twice on the command line (and not cleaned up earlier). Is that good or bad or are you just wondering? Search paths could also exist twice as different kinds, e.g. once as framework search path and once as import search path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly just pointing out To make sure that we don't include search paths twice but it is clarified later in that sentence so 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. Good catch.

@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch 5 times, most recently from 513885a to 67efb25 Compare November 22, 2021 20:09
@ahoppen
Copy link
Member Author

ahoppen commented Nov 22, 2021

swiftlang/llvm-project#3546

@swift-ci Please smoke test

@ahoppen
Copy link
Member Author

ahoppen commented Dec 1, 2021

@swift-ci Please test source compatibility

@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch from 67efb25 to e6634da Compare December 8, 2021 21:55
@ahoppen
Copy link
Member Author

ahoppen commented Dec 8, 2021

@swift-ci Please smoke test

@ahoppen
Copy link
Member Author

ahoppen commented Dec 10, 2021

swiftlang/llvm-project#3546

@swift-ci Please smoke test

@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch 2 times, most recently from 89180c6 to bb55152 Compare December 13, 2021 14:30
@ahoppen
Copy link
Member Author

ahoppen commented Dec 13, 2021

swiftlang/llvm-project#3546

@swift-ci Please smoke test

When looking for a Swift module on disk, we were scanning all module search paths if they contain the module we are searching for. In a setup where each module is contained in its own framework search path, this scaled quadratically with the number of modules being imported. E.g. a setup with 100 modules being imported form 100 module search paths could cause on the order of 10,000 checks of `FileSystem::exists`. While these checks are fairly fast (~10µs), they add up to ~100ms.

To improve this, perform a first scan of all module search paths and list the files they contain. From this, create a lookup map that maps filenames to the search paths they can be found in. E.g. for
```
searchPath1/
  Module1.framework

searchPath2/
  Module1.framework
  Module2.swiftmodule
```
we create the following lookup table
```
Module1.framework -> [searchPath1, searchPath2]
Module2.swiftmodule -> [searchPath2]
```
@ahoppen ahoppen force-pushed the pr/improve-module-search-path-lookup branch from bb55152 to fe7878e Compare December 14, 2021 11:44
@ahoppen
Copy link
Member Author

ahoppen commented Dec 14, 2021

swiftlang/llvm-project#3546

@swift-ci Please smoke test

@ahoppen ahoppen merged commit 669e3f3 into swiftlang:main Dec 20, 2021
@ahoppen ahoppen deleted the pr/improve-module-search-path-lookup branch December 20, 2021 17:09
@nkcsgexi
Copy link
Contributor

@ahoppen sorry that I have to revert it despite this is a very nice change: #40663

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants