-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang][modules] Timestamp PCM files when writing #112452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-modules Author: Jan Svoboda (jansvoboda11) ChangesClang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case current scheme too, I'm not too concerned about that. I've seen this speed up Full diff: https://github.com/llvm/llvm-project/pull/112452.diff 6 Files Affected:
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 30e7f6b3e57bd8..29d3354d07a129 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -15,6 +15,7 @@
#define LLVM_CLANG_SERIALIZATION_MODULEFILE_H
#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LLVM.h"
#include "clang/Basic/Module.h"
#include "clang/Basic/SourceLocation.h"
#include "clang/Serialization/ASTBitCodes.h"
@@ -144,8 +145,8 @@ class ModuleFile {
/// The base directory of the module.
std::string BaseDirectory;
- std::string getTimestampFilename() const {
- return FileName + ".timestamp";
+ static std::string getTimestampFilename(StringRef FileName) {
+ return (FileName + ".timestamp").str();
}
/// The original source file name that was used to build the
diff --git a/clang/lib/Serialization/ASTCommon.cpp b/clang/lib/Serialization/ASTCommon.cpp
index ab4923de6346f8..ec18e84255ca8e 100644
--- a/clang/lib/Serialization/ASTCommon.cpp
+++ b/clang/lib/Serialization/ASTCommon.cpp
@@ -15,7 +15,10 @@
#include "clang/AST/DeclObjC.h"
#include "clang/Basic/IdentifierTable.h"
#include "clang/Serialization/ASTDeserializationListener.h"
+#include "clang/Serialization/ModuleFile.h"
#include "llvm/Support/DJB.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
using namespace clang;
@@ -503,3 +506,15 @@ bool serialization::needsAnonymousDeclarationNumber(const NamedDecl *D) {
return false;
return isa<TagDecl, FieldDecl>(D);
}
+
+void serialization::updateModuleTimestamp(StringRef ModuleFilename) {
+ // Overwrite the timestamp file contents so that file's mtime changes.
+ std::error_code EC;
+ llvm::raw_fd_ostream OS(ModuleFile::getTimestampFilename(ModuleFilename), EC,
+ llvm::sys::fs::OF_TextWithCRLF);
+ if (EC)
+ return;
+ OS << "Timestamp file\n";
+ OS.close();
+ OS.clear_error(); // Avoid triggering a fatal error.
+}
diff --git a/clang/lib/Serialization/ASTCommon.h b/clang/lib/Serialization/ASTCommon.h
index 0230908d3e0528..2a765eafe08951 100644
--- a/clang/lib/Serialization/ASTCommon.h
+++ b/clang/lib/Serialization/ASTCommon.h
@@ -15,6 +15,7 @@
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclFriend.h"
+#include "clang/Basic/LLVM.h"
#include "clang/Serialization/ASTBitCodes.h"
namespace clang {
@@ -100,6 +101,8 @@ inline bool isPartOfPerModuleInitializer(const Decl *D) {
return false;
}
+void updateModuleTimestamp(StringRef ModuleFilename);
+
} // namespace serialization
} // namespace clang
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 1b2473f2457344..f6e584197e2cf7 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4415,19 +4415,6 @@ bool ASTReader::isGlobalIndexUnavailable() const {
!hasGlobalIndex() && TriedLoadingGlobalIndex;
}
-static void updateModuleTimestamp(ModuleFile &MF) {
- // Overwrite the timestamp file contents so that file's mtime changes.
- std::string TimestampFilename = MF.getTimestampFilename();
- std::error_code EC;
- llvm::raw_fd_ostream OS(TimestampFilename, EC,
- llvm::sys::fs::OF_TextWithCRLF);
- if (EC)
- return;
- OS << "Timestamp file\n";
- OS.close();
- OS.clear_error(); // Avoid triggering a fatal error.
-}
-
/// Given a cursor at the start of an AST file, scan ahead and drop the
/// cursor into the start of the given block ID, returning false on success and
/// true on failure.
@@ -4706,7 +4693,7 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef FileName, ModuleKind Type,
ImportedModule &M = Loaded[I];
if (M.Mod->Kind == MK_ImplicitModule &&
M.Mod->InputFilesValidationTimestamp < HSOpts.BuildSessionTimestamp)
- updateModuleTimestamp(*M.Mod);
+ updateModuleTimestamp(M.Mod->FileName);
}
}
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 938d7b525cb959..c0fb7a0591be02 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -4905,6 +4905,10 @@ ASTFileSignature ASTWriter::WriteAST(Sema &SemaRef, StringRef OutputFile,
this->BaseDirectory.clear();
WritingAST = false;
+
+ if (WritingModule)
+ updateModuleTimestamp(OutputFile);
+
if (ShouldCacheASTInMemory) {
// Construct MemoryBuffer and update buffer manager.
ModuleCache.addBuiltPCM(OutputFile,
diff --git a/clang/lib/Serialization/ModuleManager.cpp b/clang/lib/Serialization/ModuleManager.cpp
index e74a16b6368028..ba78c9ef5af67f 100644
--- a/clang/lib/Serialization/ModuleManager.cpp
+++ b/clang/lib/Serialization/ModuleManager.cpp
@@ -170,7 +170,8 @@ ModuleManager::addModule(StringRef FileName, ModuleKind Type,
NewModule->InputFilesValidationTimestamp = 0;
if (NewModule->Kind == MK_ImplicitModule) {
- std::string TimestampFilename = NewModule->getTimestampFilename();
+ std::string TimestampFilename =
+ ModuleFile::getTimestampFilename(NewModule->FileName);
llvm::vfs::Status Status;
// A cached stat value would be fine as well.
if (!FileMgr.getNoncachedStatValue(TimestampFilename, Status))
|
@llvm/pr-subscribers-clang Author: Jan Svoboda (jansvoboda11) ChangesClang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case current scheme too, I'm not too concerned about that. I've seen this speed up Full diff: https://github.com/llvm/llvm-project/pull/112452.diff 6 Files Affected:
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 30e7f6b3e57bd8..29d3354d07a129 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -15,6 +15,7 @@
#define LLVM_CLANG_SERIALIZATION_MODULEFILE_H
#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LLVM.h"
#include "clang/Basic/Module.h"
#include "clang/Basic/SourceLocation.h"
#include "clang/Serialization/ASTBitCodes.h"
@@ -144,8 +145,8 @@ class ModuleFile {
/// The base directory of the module.
std::string BaseDirectory;
- std::string getTimestampFilename() const {
- return FileName + ".timestamp";
+ static std::string getTimestampFilename(StringRef FileName) {
+ return (FileName + ".timestamp").str();
}
/// The original source file name that was used to build the
diff --git a/clang/lib/Serialization/ASTCommon.cpp b/clang/lib/Serialization/ASTCommon.cpp
index ab4923de6346f8..ec18e84255ca8e 100644
--- a/clang/lib/Serialization/ASTCommon.cpp
+++ b/clang/lib/Serialization/ASTCommon.cpp
@@ -15,7 +15,10 @@
#include "clang/AST/DeclObjC.h"
#include "clang/Basic/IdentifierTable.h"
#include "clang/Serialization/ASTDeserializationListener.h"
+#include "clang/Serialization/ModuleFile.h"
#include "llvm/Support/DJB.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
using namespace clang;
@@ -503,3 +506,15 @@ bool serialization::needsAnonymousDeclarationNumber(const NamedDecl *D) {
return false;
return isa<TagDecl, FieldDecl>(D);
}
+
+void serialization::updateModuleTimestamp(StringRef ModuleFilename) {
+ // Overwrite the timestamp file contents so that file's mtime changes.
+ std::error_code EC;
+ llvm::raw_fd_ostream OS(ModuleFile::getTimestampFilename(ModuleFilename), EC,
+ llvm::sys::fs::OF_TextWithCRLF);
+ if (EC)
+ return;
+ OS << "Timestamp file\n";
+ OS.close();
+ OS.clear_error(); // Avoid triggering a fatal error.
+}
diff --git a/clang/lib/Serialization/ASTCommon.h b/clang/lib/Serialization/ASTCommon.h
index 0230908d3e0528..2a765eafe08951 100644
--- a/clang/lib/Serialization/ASTCommon.h
+++ b/clang/lib/Serialization/ASTCommon.h
@@ -15,6 +15,7 @@
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclFriend.h"
+#include "clang/Basic/LLVM.h"
#include "clang/Serialization/ASTBitCodes.h"
namespace clang {
@@ -100,6 +101,8 @@ inline bool isPartOfPerModuleInitializer(const Decl *D) {
return false;
}
+void updateModuleTimestamp(StringRef ModuleFilename);
+
} // namespace serialization
} // namespace clang
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 1b2473f2457344..f6e584197e2cf7 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4415,19 +4415,6 @@ bool ASTReader::isGlobalIndexUnavailable() const {
!hasGlobalIndex() && TriedLoadingGlobalIndex;
}
-static void updateModuleTimestamp(ModuleFile &MF) {
- // Overwrite the timestamp file contents so that file's mtime changes.
- std::string TimestampFilename = MF.getTimestampFilename();
- std::error_code EC;
- llvm::raw_fd_ostream OS(TimestampFilename, EC,
- llvm::sys::fs::OF_TextWithCRLF);
- if (EC)
- return;
- OS << "Timestamp file\n";
- OS.close();
- OS.clear_error(); // Avoid triggering a fatal error.
-}
-
/// Given a cursor at the start of an AST file, scan ahead and drop the
/// cursor into the start of the given block ID, returning false on success and
/// true on failure.
@@ -4706,7 +4693,7 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef FileName, ModuleKind Type,
ImportedModule &M = Loaded[I];
if (M.Mod->Kind == MK_ImplicitModule &&
M.Mod->InputFilesValidationTimestamp < HSOpts.BuildSessionTimestamp)
- updateModuleTimestamp(*M.Mod);
+ updateModuleTimestamp(M.Mod->FileName);
}
}
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 938d7b525cb959..c0fb7a0591be02 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -4905,6 +4905,10 @@ ASTFileSignature ASTWriter::WriteAST(Sema &SemaRef, StringRef OutputFile,
this->BaseDirectory.clear();
WritingAST = false;
+
+ if (WritingModule)
+ updateModuleTimestamp(OutputFile);
+
if (ShouldCacheASTInMemory) {
// Construct MemoryBuffer and update buffer manager.
ModuleCache.addBuiltPCM(OutputFile,
diff --git a/clang/lib/Serialization/ModuleManager.cpp b/clang/lib/Serialization/ModuleManager.cpp
index e74a16b6368028..ba78c9ef5af67f 100644
--- a/clang/lib/Serialization/ModuleManager.cpp
+++ b/clang/lib/Serialization/ModuleManager.cpp
@@ -170,7 +170,8 @@ ModuleManager::addModule(StringRef FileName, ModuleKind Type,
NewModule->InputFilesValidationTimestamp = 0;
if (NewModule->Kind == MK_ImplicitModule) {
- std::string TimestampFilename = NewModule->getTimestampFilename();
+ std::string TimestampFilename =
+ ModuleFile::getTimestampFilename(NewModule->FileName);
llvm::vfs::Status Status;
// A cached stat value would be fine as well.
if (!FileMgr.getNoncachedStatValue(TimestampFilename, Status))
|
@@ -4905,6 +4905,10 @@ ASTFileSignature ASTWriter::WriteAST(Sema &SemaRef, StringRef OutputFile, | |||
this->BaseDirectory.clear(); | |||
|
|||
WritingAST = false; | |||
|
|||
if (WritingModule) | |||
updateModuleTimestamp(OutputFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this here is a bit annoying, since it will create timestamp files even for explicitly-built PCM files. I might move it somewhere into CompilerInstance
where it specifically handle implicit builds. That will increase the delay between accessing the input files and writing the timestamp file. What do people think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd like to make such changes doesn't affect C++20 modules if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have a strong opinion about the location of updating the timestamp.
To be consistent with another usage of updateModuleTimestamp
shouldn't you check
HeaderSearchOptions &HSOpts = PP.getHeaderSearchInfo().getHeaderSearchOpts();
if (HSOpts.ModulesValidateOncePerBuildSession)
? And I hope it can help with the explicitly-built .pcm files.
How should the current approach with the timestamp files work? I.e. when the file is written, when it is used to short cut the validation. From the description it's unclear it the current approach saves any work at all. If there is a bug in the current approach, it can be better to fix it instead of adding a new mechanism on top of it. |
The issue with the current state of things in Clang is that if you have N instances waiting for a PCM file to be built, they all load the PCM file as soon as the lock is released by the writer. They all find out that there's no timestamp file and attempt to write it after they validate the input files. This is what causes lots of concurrent IO. If you instead write the timestamp file once while still holding the writer lock, the N waiting Clang instances don't attempt to write the timestamp file, because one has already been written in the current build session. This ends up being much faster by alleviating lock contention in the kernel. The logic that avoids unnecessary validation of system headers when the timestamp file exists works just fine before and after this patch. It's just the process of creating the timestamp file that's the issue here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change makes sense. For example, for just-built .o files we assume they are valid and up-to-date, and don't need any extra verification.
As far as I understand, your change helps with the clean builds. And I believe incremental builds can still hit contention with updating timestamps (I can be wrong, though). Have you checked the performance for incremental builds?
@@ -4905,6 +4905,10 @@ ASTFileSignature ASTWriter::WriteAST(Sema &SemaRef, StringRef OutputFile, | |||
this->BaseDirectory.clear(); | |||
|
|||
WritingAST = false; | |||
|
|||
if (WritingModule) | |||
updateModuleTimestamp(OutputFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have a strong opinion about the location of updating the timestamp.
To be consistent with another usage of updateModuleTimestamp
shouldn't you check
HeaderSearchOptions &HSOpts = PP.getHeaderSearchInfo().getHeaderSearchOpts();
if (HSOpts.ModulesValidateOncePerBuildSession)
? And I hope it can help with the explicitly-built .pcm files.
I haven't, since I believe this change doesn't affect incremental builds at all. |
Fair enough. I was thinking that you can achieve the same improvement by using some contentionless-update-mtime function. But I haven't found any and |
Clang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With `-fbuild-session-{file,timestamp}=` and `-fmodules-validate-once-per-build-session` this reduces the number of times a PCM file is checked per "build session". The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case current scheme too, I'm not too concerned about that. I've seen this speed up `clang-scan-deps` by ~27%.
4aacee4
to
27abdf7
Compare
Clang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With `-fbuild-session-{file,timestamp}=` and `-fmodules-validate-once-per-build-session` this reduces the number of times a PCM file is checked per "build session". The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case current scheme too, I'm not too concerned about that. I've seen this speed up `clang-scan-deps` by ~27%. (cherry picked from commit 0ffa29f)
) In the past, timestamps used for `-fmodules-validate-once-per-build-session` were found to be a source of contention in the dependency scanner ([D149802](https://reviews.llvm.org/D149802), #112452). This PR is yet another attempt to optimize these. We now make use of the new `ModuleCache` interface to implement the in-process version in terms of atomic `std::time_t` variables rather the mtime attribute on `.timestamp` files.
… PCMs (#137363) In the past, timestamps used for `-fmodules-validate-once-per-build-session` were found to be a source of contention in the dependency scanner ([D149802](https://reviews.llvm.org/D149802), llvm/llvm-project#112452). This PR is yet another attempt to optimize these. We now make use of the new `ModuleCache` interface to implement the in-process version in terms of atomic `std::time_t` variables rather the mtime attribute on `.timestamp` files.
…#137363) In the past, timestamps used for `-fmodules-validate-once-per-build-session` were found to be a source of contention in the dependency scanner ([D149802](https://reviews.llvm.org/D149802), llvm#112452). This PR is yet another attempt to optimize these. We now make use of the new `ModuleCache` interface to implement the in-process version in terms of atomic `std::time_t` variables rather the mtime attribute on `.timestamp` files.
…#137363) In the past, timestamps used for `-fmodules-validate-once-per-build-session` were found to be a source of contention in the dependency scanner ([D149802](https://reviews.llvm.org/D149802), llvm#112452). This PR is yet another attempt to optimize these. We now make use of the new `ModuleCache` interface to implement the in-process version in terms of atomic `std::time_t` variables rather the mtime attribute on `.timestamp` files.
…#137363) In the past, timestamps used for `-fmodules-validate-once-per-build-session` were found to be a source of contention in the dependency scanner ([D149802](https://reviews.llvm.org/D149802), llvm#112452). This PR is yet another attempt to optimize these. We now make use of the new `ModuleCache` interface to implement the in-process version in terms of atomic `std::time_t` variables rather the mtime attribute on `.timestamp` files.
Clang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With
-fbuild-session-{file,timestamp}=
and-fmodules-validate-once-per-build-session
this reduces the number of times a PCM file is checked per "build session".The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO.
This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session.
I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case current scheme too, I'm not too concerned about that.
I've seen this speed up
clang-scan-deps
by ~27%.