-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang-tidy] Avoid processing declarations in system headers #128150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-tools-extra Author: Carlos Galvez (carlosgalvezp) ChangesCurrently, clang-tidy processes the entire TranslationUnit, including declarations in system headers. However, the work done in system headers is discarded at the very end when presenting results, unless the SystemHeaders option is active. This is a lot of wasted work, and makes clang-tidy very slow. In comparison, clangd only processes declarations in the main file, and it's claimed to be 10x faster than clang-tidy: https://github.com/lljbash/clangd-tidy To solve this problem, we can apply a similar solution done in clangd into clang-tidy. We do this by changing the traversal scope from the default TranslationUnitDecl, to only contain the top-level declarations that are not part of system headers. We do this by prepending a new ASTConsumer to the list of consumers: this new consumer sets the traversal scope in the ASTContext, which is later used by the MatchASTConsumer. Note: this behavior is not active if the user requests warnings from system headers via the SystemHeaders option. Note2: out of all the unit tests, only one of them fails: readability/identifier-naming-anon-record-fields.cpp This is because the limited traversal scope no longer includes the "IndirectFieldDecl" that appears in the AST when having a global scope anonymous union. I have not found a way to make this one work. However, it does seem like a very niche use case, and the benefits of a 10x faster clang-tidy largely outweigh the false negative now introduced by this patch. This use case is therefore removed from the unit test to make it pass. Note3: I have purposely decided to make this new feature enabled by default, instead of adding a new "opt-in/opt-out" flag. Having a new flag would mean duplicating all our tests to ensure they work in both modes, which would be infeasible. Having it enabled by default allow people to get the benefits immediately. Given that all unit tests pass, the risk for regressions is low. Even if that's the case, the only issue would be false negatives (fewer things are detected), which are much more tolerable than false positives. Credits: original implementation by @njames93, here: https://reviews.llvm.org/D150126 This implementation is simpler in the sense that it does not consider HeaderFilterRegex to filter even further. A follow-up patch could include the functionality if wanted. Fixes #52959 Full diff: https://github.com/llvm/llvm-project/pull/128150.diff 5 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/ClangTidy.cpp b/clang-tools-extra/clang-tidy/ClangTidy.cpp
index 959b11777e88d..6edaea30768b1 100644
--- a/clang-tools-extra/clang-tidy/ClangTidy.cpp
+++ b/clang-tools-extra/clang-tidy/ClangTidy.cpp
@@ -339,6 +339,56 @@ class ClangTidyASTConsumer : public MultiplexConsumer {
void anchor() override {};
};
+/// ASTConsumer that filters top-level declarations that are in system headers,
+/// and sets the AST traversal scope to only cover the declarations in user
+/// headers. This makes all clang-tidy checks avoid spending time processing
+/// declarations in system headers. The results are discarded anyway when
+/// presenting the results.
+class IgnoreSystemHeadersConsumer : public ASTConsumer {
+public:
+ void Initialize(ASTContext &Context) override {
+ // Make sure the main file ID always gets included.
+ Cache.insert(
+ std::make_pair(Context.getSourceManager().getMainFileID(), true));
+ }
+
+ bool HandleTopLevelDecl(DeclGroupRef DG) override {
+ for (Decl *D : DG) {
+ if (shouldKeepDecl(D))
+ Decls.push_back(D);
+ }
+ return true;
+ }
+
+ void HandleTranslationUnit(ASTContext &Ctx) override {
+ Ctx.setTraversalScope(Decls);
+ }
+
+private:
+ llvm::DenseMap<clang::FileID, bool> Cache;
+ std::vector<Decl *> Decls;
+
+ bool shouldKeepDecl(Decl *D) {
+ auto &SM = D->getASTContext().getSourceManager();
+ FileID FID = SM.getDecomposedExpansionLoc(D->getLocation()).first;
+
+ // Invalid file, keep the declaration to be on the safe side
+ if (FID.isInvalid() || FID == FileID::getSentinel())
+ return true;
+
+ // Check the cache
+ auto [Item, Inserted] = Cache.try_emplace(FID, true);
+ if (!Inserted)
+ return Item->second;
+
+ // If not in the cache, check and update
+ SrcMgr::SLocEntry Entry =
+ SM.getLocalSLocEntry(static_cast<unsigned>(FID.getHashValue()));
+ Item->second = !SrcMgr::isSystem(Entry.getFile().getFileCharacteristic());
+ return Item->second;
+ }
+};
+
} // namespace
ClangTidyASTConsumerFactory::ClangTidyASTConsumerFactory(
@@ -449,6 +499,10 @@ ClangTidyASTConsumerFactory::createASTConsumer(
}
std::vector<std::unique_ptr<ASTConsumer>> Consumers;
+
+ if (!Context.getOptions().SystemHeaders.value_or(false))
+ Consumers.push_back(std::make_unique<IgnoreSystemHeadersConsumer>());
+
if (!Checks.empty())
Consumers.push_back(Finder->newASTConsumer());
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst
index 41ff1c1016f25..48f4cd4823733 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -88,6 +88,10 @@ Improvements to clang-query
Improvements to clang-tidy
--------------------------
+- It no longer processes declarations from system headers by default, greatly
+ improving performance (up to 10x speed-up). This behavior is disabled if the
+ `SystemHeaders` option is enabled.
+
New checks
^^^^^^^^^^
diff --git a/clang-tools-extra/test/clang-tidy/checkers/readability/identifier-naming-anon-record-fields.cpp b/clang-tools-extra/test/clang-tidy/checkers/readability/identifier-naming-anon-record-fields.cpp
index 1b4d4e924a721..2604c88a30efb 100644
--- a/clang-tools-extra/test/clang-tidy/checkers/readability/identifier-naming-anon-record-fields.cpp
+++ b/clang-tools-extra/test/clang-tidy/checkers/readability/identifier-naming-anon-record-fields.cpp
@@ -33,24 +33,6 @@
// RUN: readability-identifier-naming.LocalConstantPointerPrefix: 'lc_', \
// RUN: }}'
-static union {
- int global;
-// CHECK-MESSAGES: :[[@LINE-1]]:7: warning: invalid case style for global variable 'global'
-// CHECK-FIXES: {{^}} int g_global;{{$}}
-
- const int global_const;
-// CHECK-MESSAGES: :[[@LINE-1]]:13: warning: invalid case style for global constant 'global_const'
-// CHECK-FIXES: {{^}} const int GLOBAL_CONST;{{$}}
-
- int *global_ptr;
-// CHECK-MESSAGES: :[[@LINE-1]]:8: warning: invalid case style for global pointer 'global_ptr'
-// CHECK-FIXES: {{^}} int *GlobalPtr_Ptr;{{$}}
-
- int *const global_const_ptr;
-// CHECK-MESSAGES: :[[@LINE-1]]:14: warning: invalid case style for global constant pointer 'global_const_ptr'
-// CHECK-FIXES: {{^}} int *const GLOBAL_CONST_PTR_Ptr;{{$}}
-};
-
namespace ns {
static union {
diff --git a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter.cpp b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter.cpp
index 448ef9ddf166c..a7956b4599b4f 100644
--- a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter.cpp
+++ b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter.cpp
@@ -66,19 +66,12 @@ class A { A(int); };
// CHECK4-NOT: warning:
// CHECK4-QUIET-NOT: warning:
-// CHECK: Suppressed 3 warnings (3 in non-user code)
// CHECK: Use -header-filter=.* to display errors from all non-system headers.
// CHECK-QUIET-NOT: Suppressed
-// CHECK2: Suppressed 1 warnings (1 in non-user code)
-// CHECK2: Use -header-filter=.* {{.*}}
// CHECK2-QUIET-NOT: Suppressed
-// CHECK3: Suppressed 2 warnings (2 in non-user code)
-// CHECK3: Use -header-filter=.* {{.*}}
// CHECK3-QUIET-NOT: Suppressed
// CHECK4-NOT: Suppressed {{.*}} warnings
-// CHECK4-NOT: Use -header-filter=.* {{.*}}
// CHECK4-QUIET-NOT: Suppressed
-// CHECK6: Suppressed 2 warnings (2 in non-user code)
// CHECK6: Use -header-filter=.* {{.*}}
int x = 123;
diff --git a/clang-tools-extra/test/clang-tidy/infrastructure/system-headers.cpp b/clang-tools-extra/test/clang-tidy/infrastructure/system-headers.cpp
index 9fa990b6aac8c..a25480e9aa39c 100644
--- a/clang-tools-extra/test/clang-tidy/infrastructure/system-headers.cpp
+++ b/clang-tools-extra/test/clang-tidy/infrastructure/system-headers.cpp
@@ -11,9 +11,9 @@
// RUN: clang-tidy -help | FileCheck -check-prefix=CHECK-OPT-PRESENT %s
// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -system-headers=true %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-SYSTEM-HEADERS %s
-// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -system-headers=false %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-NO-SYSTEM-HEADERS %s
+// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -system-headers=false %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-NO-SYSTEM-HEADERS --allow-empty %s
// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -config='SystemHeaders: true' %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-SYSTEM-HEADERS %s
-// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -config='SystemHeaders: false' %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-NO-SYSTEM-HEADERS %s
+// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='.*' -config='SystemHeaders: false' %s -- -isystem %S/Inputs/system-headers 2>&1 | FileCheck -check-prefix=CHECK-NO-SYSTEM-HEADERS --allow-empty %s
#include <system_header.h>
// CHECK-SYSTEM-HEADERS: system_header.h:1:13: warning: single-argument constructors must be marked explicit
|
7961d8a
to
18a676b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be better to use decorator pattern instead of ad preprocess consumer.
Most of clang-tidy check does not reply on ast matcher result in system header. but it is maybe not suitable for static analyzer, especially for non-std system header.
Not sure I understand, can you explain in more detail what you mean by "decorator pattern"? What drawback do you see with the current implementation? |
Friendly ping. |
3357df5
to
2cc1690
Compare
@PiotrZSL @5chmidti @HerrCai0907 Do you have any objections to this patch? I would like to go ahead and land it now that all tests pass. The more we delay this, the higher the risk that new checks are introduced that no longer work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. but i am not sure whether ignoring header will hurt static analyzer
That's a good point. @haoNoQ @Xazax-hun @steakhal Do you see any implications on this patch when running clang static analyzer via clang-tidy? Any checks that would stop detecting issues? TLDR: we are setting the traversal scope to only contain top-level declarations which are not in system headers. |
Not analyzing system headers is the desired behavior for the Static Analyzer too. But please note that the |
Thank you for the quick answer! I will then proceed to rebase against latest trunk and merge :) We can look into applying the same change to the static analyzer in a separate patch. |
2cc1690
to
fef6bb4
Compare
The static analyzer handles this pretty well already. I haven't heard of any problems in this area. I think it makes sense to use the same logic by default in other tools unless you do have specific concerns. Also please take a note of the test case check-deserialization.cpp which demonstrates that the static analyzer not only skips the analysis of system headers - but it also skips deserialization of that code when PCHs/modules are involved. Which might be another source of performance gains for your patch. It might be a good idea to generalize this test case to clang-tidy.
I feel a sudden urge to channel my inner Mr. Heckles. It's probably true that most of the existing clang-tidy checks are unaffected by this change. That said, I suspect that your patch alters the underlying contract of clang-tidy checker API quite significantly. For example, with the current design, the checker is allowed to confirm the presence of a certain declaration in the included system headers before making other decisions. For example, a checker may refuse to emit a fix-it hint that suggests So I think your patch should be seen as more than a performance optimization; it's a somewhat significant change of contract between the checker developer and the tool's engine. I see a few conversations about that in the previous discussions of that patch too:
Do we have some kind of consensus on this change of direction? I'm personally on board with this either way but I'm not a clang-tidy maintainer. |
I do not like this approach. Would it be possible to add this feature to the existing
I agree with @haoNoQ, this could be a huge performance improvement. It is worth checking what happens in this scenario and whether we could avoid deserialisation if it is happening. That being said, I'd also prefer that to be part of the ASTMatchers library instead of a Clang Tidy local solution. Regarding the failing test:
Why is this happening? I think it would be worth diving deeper here. Is this a bug that we could fix?
That is a good point. I see a couple ways out of this:
I am not that concerned about this change, I think in most cases authors could find an alternative way to make this work. But I agree that we should call this out to the authors and provide some examples how to change their checks to accommodate this performance improvement. |
I agree with these statements. It would be awesome if we could generalize the scope reduction to only accept decls spelled in the main file. For instance, if a tool would want to only analyze a specific file, because that is the only file they have control over. W.r.t. ways forward, I highly discourage implicit slowdowns, for example if a wrong-doing check is enabled that accidentally does some rough AST traversal. It's too tempting already to do an AST traversal from the TU decl - breaking the PCH deserialization test in CSA. I've been there, done that. What I'm saying is that it may seem as an opportunistic performance optimization at first, but then once we get used to it, it will become a silent and serious slowdown regression under specific cases. |
Thank you all for the valuable input! There's a lot to process :) I will summarize the feedback, hopefully I got it right:
My intention is to take an iterative development approach, providing immediate value to users (who have been suffering this issue since a long time) in small increments, with possibility for easy revert if things don't work out well. I'm of the opinion that "perfect is the enemy of the good". For that reason, it's not the scope of this patch to enable further optimizations, nor to enable this for other tools (which would require further acquaintance, investigation, implementation, testing, review, and consensus from owners). All of that can be done in follow-up patches, and I'm happy to help out with that given support from tool owners. There's an RFC around this topic here. I believe there's consensus on skipping system headers, but the implementation is a bit more open for discussion. This is the only concrete implementation I've seen available, however. Regarding concrete open questions:
|
This does not sound controversial or hard. I am okay doing work that requires further consensus or a lot of investigation as a follow-up. But if that is not the case why not address it now? |
Those checks might be downstream. LLVM is usually OK breaking downstream users but we should still make some effort helping them out documenting the rationale and suggesting workarounds. |
Could you summarize what you tried and why it did not work? It is not just about fixing one check but potentially losing a capability. If we can no longer write certain kinds of checks in tidy that is a very big trade off. |
It is already controversial for clang-tidy, surely it will be even more controversial if we involve many other tools? Besides, this request involves a much larger scope (in terms of modified behaviors, not lines of code) than intended. The scope of this patch is only to improve clang-tidy. Everything can always be improved, but it does not need to happen in one single commit. It's good practice to keep commits small, focused on doing one single thing, especially if there is risk of revert. The risk of revert is a lot higher if N tools are modified than if only one tool is modified. It wouldn't be good to revert changes to clang-tidy just because this change breaks another unrelated tool. Like I said, I'm not against improving the other tools, simply that it should be done in follow-up patches, one step at a time. What would be the drawback of that?
Sure, makes sense. Do you have any concrete suggestion in mind? Personally I can't think of a way to enable individual checks to regain access to the "full analysis". Like I said above, that would be great, but I don't know how to achieve that in practice. If we want to protect downstream users, I suppose we can add a configuration option to revert back to the "full analysis" behavior. Would that lower the risk?
I diffed which declarations were picked up with "full analysis" and with this patch, on this example code. It appears that I searched for functionality to handle these special declarations in the |
True. But doing this in another component will not make the patch larger. So this is not an argument against. Moreover, the scope is not bigger than this patch. It would not change the behavior of other tools by default. Only introduce an option that other tools can adopt on their own accord. |
I prefer to start from this patch. |
I see. I think we should the very least investigate this. Open a ticket if you plan to land this change to make sure it is not forgotten, have a FIXME and so on. Manage it properly because this approach introduces technical debt. My problem is, statements like "way beyond my knowledge" makes me feel that you have no plans to address these in the future which makes me less OK to be on board with a change that introduces debt that needs to be cleaned up later.
E.g., in case of the
It would make sense if it was really incremental, but it is not. Currently, there is a consensus that it would be better to have this functionality in the ASTMatchers. Landing code only to throw it out is not incremental. Incremental is when the next patch builds on the previous one. What is described here is more iterative. I still fail to understand what is the push back from moving the code to the proper component. I heard no technical arguments, only vague claims that it would be more code or there would be more pushback from the owners. Could we at least try? If it is really more code and if there is really a push back we can always revert back to this approach. But what prevents us from at least trying?
If that is the case that is an even stronger argument to not land it but let it cook a bit more.
I don't understand this. Tidy is the most popular user of matchers. If this is not good enough for the matchers, it should not be good enough for tidy either. And again, what is the technical argument here? Are we just lazy trying to move this code to the ast matchers or is there any particular reason why tidy is better suited for "incubation"? Also, in the matcher library we would have an option. So turning it on only for tidy would give the same results. Am I missing something here? I think at the point we spent more time arguing here than it would have been to just move the code to the ASTMatcher library. Arguments like "we already have this code" are not technical in nature. |
and
If I'm not mistaken, currently the PR does not propose an alternative "full analysis" mode that we would silently switch to. So, this makes my concern void. What I wanted to highlight was that we should definitely not silently switch to doing something a lot more expensive. If we wanted to do something like that, we should have a clear way delivering log message or diagnostic to the user that we are falling back which will likely be a lot slower than it could be and this switch was caused by XYZ check. That said, it would be probably nice to mention in the docs that enabling the I hope I clarified the misunderstanding that my comment may have caused. Sorry about that. |
Currently, clang-tidy processes the entire TranslationUnit, including declarations in system headers. However, the work done in system headers is discarded at the very end when presenting results, unless the SystemHeaders option is active. This is a lot of wasted work, and makes clang-tidy very slow. In comparison, clangd only processes declarations in the main file, and it's claimed to be 10x faster than clang-tidy: https://github.com/lljbash/clangd-tidy To solve this problem, we can apply a similar solution done in clangd into clang-tidy. We do this by changing the traversal scope from the default TranslationUnitDecl, to only contain the top-level declarations that are _not_ part of system headers. We do this in the MatchASTConsumer class, so the logic can be reused by other tools. This behavior is currently off by default, and only clang-tidy enables skipping system headers. If wanted, this behavior can be activated by other tools in follow-up patches. I had to move MatchFinderOptions out of the MatchFinder class, because otherwise I could not set a default value for the "bool SkipSystemHeaders" member otherwise. The compiler error message was "default member initializer required before the end of its enclosing class". Note: this behavior is not active if the user requests warnings from system headers via the SystemHeaders option. Note2: out of all the unit tests, only one of them fails: readability/identifier-naming-anon-record-fields.cpp This is because the limited traversal scope no longer includes the "CXXRecordDecl" of the global anonymous union, see: llvm#130618 I have not found a way to make this work. For now, document the technical debt introduced. Note3: I have purposely decided to make this new feature enabled by default, instead of adding a new "opt-in/opt-out" flag. Having a new flag would mean duplicating all our tests to ensure they work in both modes, which would be infeasible. Having it enabled by default allow people to get the benefits immediately. Given that all unit tests pass, the risk for regressions is low. Even if that's the case, the only issue would be false negatives (fewer things are detected), which are much more tolerable than false positives. Credits: original implementation by @njames93, here: https://reviews.llvm.org/D150126 This implementation is simpler in the sense that it does not consider HeaderFilterRegex to filter even further. A follow-up patch could include the functionality if wanted. Fixes llvm#52959
4e72bdd
to
a4f57f4
Compare
I've rebased on latest main now and intend to land it as soon as the CI checks pass. Let me know if I should do any last-minute changes before that! Thank you all for the review :) |
This is huge. @carlosgalvezp thank you so much for all the hard work on the development and getting this merged. I love clang-tidy, but the build time has made it very difficult to use (and socialize) on large projects, especially for CICD. |
MatchFinderOptions moved in llvm/llvm-project#128150 Bug: 403519834 Change-Id: Ifa90934e90e9626fb3f946a0253efc190d79f19e Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6355808 Commit-Queue: Reid Kleckner <[email protected]> Auto-Submit: Hans Wennborg <[email protected]> Reviewed-by: Reid Kleckner <[email protected]> Cr-Commit-Position: refs/heads/main@{#1432805}
…8150) [clang-tidy] Avoid processing declarations in system headers Currently, clang-tidy processes the entire TranslationUnit, including declarations in system headers. However, the work done in system headers is discarded at the very end when presenting results, unless the SystemHeaders option is active. This is a lot of wasted work, and makes clang-tidy very slow. In comparison, clangd only processes declarations in the main file, and it's claimed to be 10x faster than clang-tidy: https://github.com/lljbash/clangd-tidy To solve this problem, we can apply a similar solution done in clangd into clang-tidy. We do this by changing the traversal scope from the default TranslationUnitDecl, to only contain the top-level declarations that are _not_ part of system headers. We do this in the MatchASTConsumer class, so the logic can be reused by other tools. This behavior is currently off by default, and only clang-tidy enables skipping system headers. If wanted, this behavior can be activated by other tools in follow-up patches. I had to move MatchFinderOptions out of the MatchFinder class, because otherwise I could not set a default value for the "bool SkipSystemHeaders" member otherwise. The compiler error message was "default member initializer required before the end of its enclosing class". Note: this behavior is not active if the user requests warnings from system headers via the SystemHeaders option. Note2: out of all the unit tests, only one of them fails: readability/identifier-naming-anon-record-fields.cpp This is because the limited traversal scope no longer includes the "CXXRecordDecl" of the global anonymous union, see: llvm#130618 I have not found a way to make this work. For now, document the technical debt introduced. Note3: I have purposely decided to make this new feature enabled by default, instead of adding a new "opt-in/opt-out" flag. Having a new flag would mean duplicating all our tests to ensure they work in both modes, which would be infeasible. Having it enabled by default allow people to get the benefits immediately. Given that all unit tests pass, the risk for regressions is low. Even if that's the case, the only issue would be false negatives (fewer things are detected), which are much more tolerable than false positives. Credits: original implementation by @njames93, here: https://reviews.llvm.org/D150126 This implementation is simpler in the sense that it does not consider HeaderFilterRegex to filter even further. A follow-up patch could include the functionality if wanted. Fixes llvm#52959 Co-authored-by: Carlos Gálvez <[email protected]>
Hi folks, we're seeing some regressions on our internal tidy checks after this patch. The regression is similar to what @haoNoQ raised in #128150 (comment), the blast radius much wider than just checks that build an internal database though.
I wanted to make sure we're still OK with this change (looking at the thread, that sounds like the case, but I wanted to emphasize that the blast radius is much wider than checks analyzing declarations inside system headers. it's also affecting checks that analyzes declarations that refer to system headers). ATM, this behavior is coupled with emitting findings for system headers. Can we at least introduce a new option to control this new behavior? |
This is unfortunate. I think |
I think the naive expectation is that these would be able to reach nodes in system headers, so I agree that it would be good to see if we can improve the behavior here. CC @carlosgalvezp @PiotrZSL @HerrCai0907 |
Thanks for the info, sorry this created trouble for you! An option to turn this behavior off should be easy to add. It's however problematic if it lasts for more than one release because it's harder to deprecate later. It would be good to have a unit test of this problem, would the one in the CERT check mentioned above be sufficient? |
I think it's pretty reasonable. |
So I looked into the problem, and it seems that issue is that the ParentMapContext::ParentMap::ParentMap(ASTContext &Ctx) {
+ std::vector<Decl *> const& OldScope = Ctx.getTraversalScope2();
+ Ctx.setTraversalScope({Ctx.getTranslationUnitDecl()});
ASTVisitor(*this).TraverseAST(Ctx);
+ Ctx.setTraversalScope(OldScope);
} Essentially restore the full traversal scope for the sake of fetching all the parents, and then restoring it. This pattern is not uncommon, I have seen similar usages with void visitMatch(const BoundNodes& BoundNodesView) override {
TraversalKindScope RAII(*Context, Callback->getCheckTraversalKind());
CurBoundScope RAII2(State, BoundNodesView);
Callback->run(MatchFinder::MatchResult(BoundNodesView, Context));
} So I could create similar Let me know if this is an acceptable solution. Otherwise I'm happy to add an escape hatch. I'm a bit hesitant to adding it to the config file, would a compile-time flag be preferable (similar to If none of these options are satisfactory I can of course revert and go back to the drawing board :) |
Building a full parent map makes sense to me, but I have mixed feelings about modifying the |
Btw I found this commit which seemed to be intended at setting the basis for running clang-tidy on a reduced AST, and where the @sam-mccall Perhaps you have good ideas on how this problem can be solved best? |
I believe I found something promising:
bool MatchASTVisitor::TraverseDecl(Decl *DeclNode) {
if (!DeclNode) {
return true;
}
+ if (Options.SkipSystemHeaders && isInSystemHeader(DeclNode)) {
+ return true;
+ } Pros:
Cons:
What do you think @AaronBallman @Xazax-hun , does this sound reasonable? If so I can put up a PR right away. |
Hmm, it does not remove much
|
I'd prefer to go ahead with the proposal and continue the discussion once that PR landed. This would unbreak the checks/resolve the backward compatibility issues while providing some of the benefits (even if the perf gain is smaller). At least we will be in a state that does not break anything for anyone. And we can iterate from there. |
Ok! Will fix sometime today. I think this topic needs some more careful iteration and we should unblock. Worth noting is that even though a lot of warnings still remain the performance gain was still significant, i.e. going from 10 seconds (baseline) to 3 seconds (proposal). Trunk takes 1 second. |
…atchFinder This was introduced here with the purpose of speed up clang-tidy, making it not process system headers: llvm@e4a8969 However it was later realized that this change is too aggressive: the reduced traversal scope also impacts the ParentMapContext, which in turn means that matchers like "hasParent" or "hasAncestor" are not able to find parents in system headers, even if the declaration at hand is in a user header. This causes regressions for downstream users writing custom clang-tidy checks: llvm#128150 (comment) This patch fixes this problem, as follows: - Revert the changes to the clang-tidy unit tests. - Revert setting the TraversalScope in MatchASTConsumer. - Move the logic to MatchASTVisitor::TraverseDecl instead. Pros: - Leaves the ASTContext and ParentMapContext untouched, so hasParent and similar matchers should work again. - Keeps avoiding matching and checking decls outside of system headers. - The broken unit test in readability now works, because we are no longer processing only TopLevelDecls. - The changes to the CERT test can be reverted. - Most likely we don't lose any functionality or introduce false negatives. Cons: - We still have to traverse decls in system headers. I did try to return false; instead of return true; in the proposed code but that made around 60 clang-tidy tests to fail. - We still process many nodes (not Decls) that are in system headers. As a benchmark, I tried running all clang-tidy checks on a .cpp file that includes all the standard C++ headers. This leads to: * Baseline (without any optimizations). Suppressed 196093 warnings (196093 in non-user code) 10 seconds to run. * Trunk (aggressive optimizations). Suppressed 8050 warnings (8050 in non-user code). 1 second to run. * This patch (conservative optimizations). Suppressed 141779 warnings (141779 in non-user code). 3 seconds to run. This patch thus gives some performance improvement while keeping backwards compatibility. There's still room for improvement which shall be tackled in a follow-up patch after unbreaking clang-tidy. Fixes llvm#130618
…lvm#128150)" This was too aggressive, and leads to problems for downstream users: llvm#128150 (comment) Let's revert and reland it once we have addressed the problems. This reverts commit e4a8969.
#132743) …(#128150)" This was too aggressive, and leads to problems for downstream users: #128150 (comment) Let's revert and reland it once we have addressed the problems. This reverts commit e4a8969. Co-authored-by: Carlos Gálvez <[email protected]>
…em headers … (#132743) …(#128150)" This was too aggressive, and leads to problems for downstream users: llvm/llvm-project#128150 (comment) Let's revert and reland it once we have addressed the problems. This reverts commit e4a8969. Co-authored-by: Carlos Gálvez <[email protected]>
This reverts commit b8833a8. Reason for revert: Upstream change was reverted, see https://crbug.com/406271686 Original change's description: > Update FindBadRawPtrPatterns.cpp for Clang API change > > MatchFinderOptions moved in llvm/llvm-project#128150 > > Bug: 403519834 > Change-Id: Ifa90934e90e9626fb3f946a0253efc190d79f19e > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6355808 > Commit-Queue: Reid Kleckner <[email protected]> > Auto-Submit: Hans Wennborg <[email protected]> > Reviewed-by: Reid Kleckner <[email protected]> > Cr-Commit-Position: refs/heads/main@{#1432805} Bug: 403519834,406271686 Change-Id: If72b2520db4ac60e65290c81dc69b77b9b65acdc Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6395695 Auto-Submit: Nico Weber <[email protected]> Reviewed-by: Hans Wennborg <[email protected]> Commit-Queue: Hans Wennborg <[email protected]> Cr-Commit-Position: refs/heads/main@{#1437975}
[clang-tidy] Avoid processing declarations in system headers
Currently, clang-tidy processes the entire TranslationUnit, including
declarations in system headers. However, the work done in system
headers is discarded at the very end when presenting results, unless
the SystemHeaders option is active.
This is a lot of wasted work, and makes clang-tidy very slow.
In comparison, clangd only processes declarations in the main file,
and it's claimed to be 10x faster than clang-tidy:
https://github.com/lljbash/clangd-tidy
To solve this problem, we can apply a similar solution done in clangd
into clang-tidy. We do this by changing the traversal scope from the
default TranslationUnitDecl, to only contain the top-level declarations
that are not part of system headers. We do this in the
MatchASTConsumer class, so the logic can be reused by other tools.
This behavior is currently off by default, and only clang-tidy
enables skipping system headers. If wanted, this behavior can be
activated by other tools in follow-up patches.
I had to move MatchFinderOptions out of the MatchFinder class,
because otherwise I could not set a default value for the
"bool SkipSystemHeaders" member otherwise. The compiler error message
was "default member initializer required before the end of its
enclosing class".
Note: this behavior is not active if the user requests warnings from
system headers via the SystemHeaders option.
Note2: out of all the unit tests, only one of them fails:
readability/identifier-naming-anon-record-fields.cpp
This is because the limited traversal scope no longer includes the
"CXXRecordDecl" of the global anonymous union, see:
#130618
I have not found a way to make this work. For now, document the
technical debt introduced.
Note3: I have purposely decided to make this new feature enabled by
default, instead of adding a new "opt-in/opt-out" flag. Having a new
flag would mean duplicating all our tests to ensure they work in both
modes, which would be infeasible. Having it enabled by default allow
people to get the benefits immediately. Given that all unit tests pass,
the risk for regressions is low. Even if that's the case, the only
issue would be false negatives (fewer things are detected), which
are much more tolerable than false positives.
Credits: original implementation by @njames93, here:
https://reviews.llvm.org/D150126
This implementation is simpler in the sense that it does not consider
HeaderFilterRegex to filter even further. A follow-up patch could
include the functionality if wanted.
Fixes #52959