Skip to content

[clang][deps] Overload Filesystem::exists in DependencyScanningFilesystem to have it use cached status #88152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

artemcm
Copy link
Contributor

@artemcm artemcm commented Apr 9, 2024

As-is, calls to exists() fallback on the implementation in ProxyFileSystem::exists which explicitly calls out to the underlying FS, which for the DependencyScanningFilesystem (overlay) is the real underlying filesystem.

Instead, directly overloading exists allows us to have it rely on the cached status behavior used elsewhere by the DependencyScanningFilesystem.

Copy link

github-actions bot commented Apr 9, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 9, 2024

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-clang

Author: Artem Chikin (artemcm)

Changes

As-is, calls to exists() fallback on the implementation in ProxyFileSystem::exists which explicitly calls out to the underlying FS, which for the DependencyScanningFilesystem (overlay) is the real underlying filesystem.

Instead, directly overloading exists allows us to have it rely on the cached status behavior used elsewhere by the DependencyScanningFilesystem.


Full diff: https://github.com/llvm/llvm-project/pull/88152.diff

2 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+4)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+6)
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index 9a522a3e2fe252..4fdfebada5b7f7 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -365,6 +365,10 @@ class DependencyScanningWorkerFilesystem
     return LocalCache.insertEntryForFilename(Filename, Entry);
   }
 
+  /// Check whether \p Path exists. By default checks cached result of \c status(),
+  /// and falls back on FS if unable to do so.
+  bool exists(const Twine &Path) override;
+
   /// Returns entry associated with the filename in the shared cache if there is
   /// some. Otherwise, constructs new one with the given error code, associates
   /// it with the filename and returns the result.
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 9b7812a1adb9e3..174edc98da5877 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -270,6 +270,12 @@ DependencyScanningWorkerFilesystem::status(const Twine &Path) {
   return Result->getStatus();
 }
 
+bool
+DependencyScanningWorkerFilesystem::exists(const Twine &Path) {
+  auto Status = status(Path);
+  return Status && Status->exists();
+}
+
 namespace {
 
 /// The VFS that is used by clang consumes the \c CachedFileSystemEntry using

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch from 8be194a to 0ab9e02 Compare April 9, 2024 18:12
@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch 2 times, most recently from e73e72a to c094e57 Compare April 9, 2024 19:40
Copy link
Member

@aganea aganea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change! Quite interestingly I was just looking precisely at this issue today 😄 Things are worst on Windows, where mini-filter drivers kick-in and bring performance to its knees. I'll try your patch, see how that improves my usage!

image

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch from c094e57 to 1ee923c Compare April 9, 2024 20:35
@aganea
Copy link
Member

aganea commented Apr 9, 2024

This patch doesn`t improve my usage, it seems I'm hitting a different codepath than you do. I'll investigate.

@aganea
Copy link
Member

aganea commented Apr 10, 2024

This patch doesn`t improve my usage, it seems I'm hitting a different codepath than you do. I'll investigate.

Ah I see, the commit b768a8c didn't make it in time for the 18.x branch. The issue I'm seeing here is with regular C++ code, no PCH, no modules. It is related to shouldCacheStatFailures() which currently in release/18.x happens to return false for directories. The searching behavior for #include files with clang-cl (not sure about plain clang) is essentially to query every include path passed on the command-line. That creates lots of OS status() calls, like shown in my profile above. Going back to the previous behavior as on main solves my issue, ie:

static bool shouldCacheStatFailures(StringRef Filename) {
  StringRef Ext = llvm::sys::path::extension(Filename);
  if (Ext.empty())
    return false; // This may be the module cache directory.
  return true;
}

As an order of magnitude, the Unreal Engine target I'm testing with clang-scan-deps consists of 17,801 TUs, generating 1,2 billion status() calls in total (by counting the calls to FileManager::getStatValue()) out of which 8,3 million are real OS calls after the change above (by counting the calls to llvm::sys::windows::status()).

Not sure @jansvoboda11 perhaps if we want to cherry pick b768a8c on release/18.x? Or should we land just a simple PR with just the function change above?

Copy link
Member

@aganea aganea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise this PR LGTM.

@jansvoboda11
Copy link
Contributor

Not sure @jansvoboda11 perhaps if we want to cherry pick b768a8c on release/18.x? Or should we land just a simple PR with just the function change above?

I can try pulling b768a8c into the release branch.

@jansvoboda11
Copy link
Contributor

I'd like to see a unit test specific to DependencyScanningFilesystem, similar to what I have here: #68645.

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch from 1ee923c to 8a2c67f Compare April 10, 2024 22:22
@artemcm
Copy link
Contributor Author

artemcm commented Apr 10, 2024

I'd like to see a unit test specific to DependencyScanningFilesystem, similar to what I have here: #68645.

Done! Added one in DependencyScanningFilesystemTest.cpp. Thank you.

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would be nice to land Ben's change separately.

@jansvoboda11
Copy link
Contributor

Seems like some tests failed on Linux.


DepFS.exists("/foo");
DepFS.exists("/bar");
EXPECT_EQ(InstrumentingFS->NumStatCalls, 2u);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, doesn't this test pass even prior to this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, was silly to use a ProxyFilesystem as the underlying FS. Fixed, thank you very much for catching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to explicitly check that we're not calling exists() instead of relying on the fact that InMemoryFileSystem keeps implementing exists() in terms of status().

@aganea
Copy link
Member

aganea commented Apr 11, 2024

Not sure @jansvoboda11 perhaps if we want to cherry pick b768a8c on release/18.x? Or should we land just a simple PR with just the function change above?

I can try pulling b768a8c into the release branch.

Sorry I've been misleading, pushing b768a8c on 18.x doesn't fix the issue. The problem is around the directory queries and the function in b768a8c returns false for that case (thus no caching of errors).

Running with:

static bool shouldCacheStatFailures(StringRef Filename) {
  StringRef Ext = llvm::sys::path::extension(Filename);
  if (Ext.empty())
    return false; // This may be the module cache directory.
  return true;
}

Takes 4 min 8 sec.

Then running with:

static bool shouldCacheStatFailures(StringRef Filename) {
  return true;
}

Takes 1 min 47 sec.

I think something more involved would be needed here. @jansvoboda11 in which case we don't want to consider the file system immutable during the execution of clang-scan-deps?

@jansvoboda11
Copy link
Contributor

@aganea Ah, got it. Unfortunately, caching stat failures for all directories doesn't work for modules. Clang is supposed to create the modules cache directory if one doesn't exist. But if we first cache its non-existence, Clang will never see it again, even after Clang itself created it.

I think we could turn off caching of stat failures only for the modules cache directory by giving the build system an API to configure DependencyScanningService with the modules cache directory, wire up all DependencyScanningWorkerFilesystem to only avoid caching stat failures for that path, and have all DependencyScanningWorker overwrite the -fmodules-cache-path=<path> arguments with the configured path.

In the meantime, are you able to work around this for your non-modular use-case by applying your change when building your own compiler downstream?

Also, just to be sure, this is not a regression, correct?

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch 3 times, most recently from 1b27958 to 2c94305 Compare April 11, 2024 17:20
@aganea
Copy link
Member

aganea commented Apr 11, 2024

In the meantime, are you able to work around this for your non-modular use-case by applying your change when building your own compiler downstream?

Yes I fixed it in our downstream. I am more worried about other users, since the initial version that @hyp committed a while ago in e1f4c4a didn't suffer of these issues.

Also, just to be sure, this is not a regression, correct?

Not per se, not from the latest commits. It seems the issue was introduced when module support was added in 9ab6d82. Unfortunately there are users like us with big C++ codebases and stuck with regular #includes, no modules for now.

Copy link

github-actions bot commented Apr 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jansvoboda11
Copy link
Contributor

Please apply the code formatting suggestions.

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch from 2c94305 to f1c4624 Compare April 12, 2024 16:50
@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch 2 times, most recently from 8e98fd0 to 24e869d Compare April 12, 2024 20:50
@jansvoboda11
Copy link
Contributor

Clang test now looks good to me. Might be nice to drop InstrumentingInMemoryFilesystem in favor of the existing InstrumentingFilesystem (that I added just moments ago, sorry!) wrapped around a normal InMemoryFileSystem, but I'm happy to do that myself in a follow-up commit.

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch 2 times, most recently from e010a76 to 8164aaf Compare April 12, 2024 21:06
@artemcm
Copy link
Contributor Author

artemcm commented Apr 12, 2024

Clang test now looks good to me. Might be nice to drop InstrumentingInMemoryFilesystem in favor of the existing InstrumentingFilesystem (that I added just moments ago, sorry!) wrapped around a normal InMemoryFileSystem, but I'm happy to do that myself in a follow-up commit.

I had to update the commit anyway. Unified on InstrumentingFilesystem.

@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch 2 times, most recently from 1dd15b3 to b7c6930 Compare April 12, 2024 21:21
…esystem' to have it use cached `status`

As-is, calls to `exists()` fallback on the implementation in 'ProxyFileSystem::exists' which explicitly calls out to the underlying `FS`, which for the 'DependencyScanningFilesystem' (overlay) is the real underlying filesystem.

Instead, directly overloading 'exists' allows us to have it rely on the cached `status` behavior used elsewhere by the 'DependencyScanningFilesystem'.
@artemcm artemcm force-pushed the artemcm/DependencyScanningFilesystemExistsOverload branch from b7c6930 to 1989bbb Compare April 12, 2024 21:26
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

@jansvoboda11 jansvoboda11 merged commit 779ba60 into llvm:main Apr 12, 2024
Copy link

@artemcm Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

jansvoboda11 pushed a commit to swiftlang/llvm-project that referenced this pull request Apr 12, 2024
…esystem` to have it use cached `status` (llvm#88152)

As-is, calls to `exists()` fallback on the implementation in
`ProxyFileSystem::exists` which explicitly calls out to the underlying
`FS`, which for the `DependencyScanningFilesystem` (overlay) is the real
underlying filesystem.

Instead, directly overloading `exists` allows us to have it rely on the
cached `status` behavior used elsewhere by the
`DependencyScanningFilesystem`.

(cherry picked from commit 779ba60)
bazuzi pushed a commit to bazuzi/llvm-project that referenced this pull request Apr 15, 2024
…esystem` to have it use cached `status` (llvm#88152)

As-is, calls to `exists()` fallback on the implementation in
`ProxyFileSystem::exists` which explicitly calls out to the underlying
`FS`, which for the `DependencyScanningFilesystem` (overlay) is the real
underlying filesystem.

Instead, directly overloading `exists` allows us to have it rely on the
cached `status` behavior used elsewhere by the
`DependencyScanningFilesystem`.
artemcm added a commit to swiftlang/llvm-project that referenced this pull request Apr 16, 2024
…esystem` to have it use cached `status` (llvm#88152)

As-is, calls to `exists()` fallback on the implementation in
`ProxyFileSystem::exists` which explicitly calls out to the underlying
`FS`, which for the `DependencyScanningFilesystem` (overlay) is the real
underlying filesystem.

Instead, directly overloading `exists` allows us to have it rely on the
cached `status` behavior used elsewhere by the
`DependencyScanningFilesystem`.

(cherry picked from commit 779ba60)
aganea added a commit that referenced this pull request Apr 25, 2024
…n hot code paths in `FileManager`. (#88427)

`FileManager::getDirectoryRef()` and `FileManager::getFileRef()` are hot code paths in `clang-scan-deps`. These functions are updating on every call a few atomics related to printing statistics, which causes contention on high core count machines.

![Screenshot 2024-04-10
214123](https://github.com/llvm/llvm-project/assets/37383324/5756b1bc-cab5-4612-8769-ee7e03a66479)

![Screenshot 2024-04-10
214246](https://github.com/llvm/llvm-project/assets/37383324/3d560e89-61c7-4fb9-9330-f9e660e8fc8b)

![Screenshot 2024-04-10
214315](https://github.com/llvm/llvm-project/assets/37383324/006341fc-49d4-4720-a348-7af435c21b17)

After this patch we make the variables local to the `FileManager`.

In our test case, this saves about 49 sec over 1 min 47 sec of `clang-scan-deps` run time (1 min 47 sec before, 58 sec after). These figures are after applying my suggestion in #88152 (comment), that is:
```
static bool shouldCacheStatFailures(StringRef Filename) {
  return true;
}
```
Without the above, there's just too much OS noise from the high volume of `status()` calls with regular non-modules C++ code. Tested on Windows with clang-cl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants