Skip to content

[NFCI] Avoid adding duplicated SpecialCaseList::Sections. #140821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

qinkunbao
Copy link
Member

#140127 converts SpecialCaseList::Sections
from StringMap to vector. However, the previous StringMap ensures that only a new
section is created when the SectionStr is different. We should keep the same behavior.

Created using spr 1.3.6
@llvmbot
Copy link
Member

llvmbot commented May 20, 2025

@llvm/pr-subscribers-llvm-support

Author: Qinkun Bao (qinkunbao)

Changes

#140127 converts SpecialCaseList::Sections
from StringMap to vector. However, the previous StringMap ensures that only a new
section is created when the SectionStr is different. We should keep the same behavior.


Full diff: https://github.com/llvm/llvm-project/pull/140821.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Support/SpecialCaseList.h (+1)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+13-6)
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index fc6dc93651f38..baa5c917220e3 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -138,6 +138,7 @@ class SpecialCaseList {
     std::unique_ptr<Matcher> SectionMatcher;
     SectionEntries Entries;
     std::string SectionStr;
+    unsigned LineNo;
   };
 
   std::vector<Section> Sections;
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 5145cccc91e3b..e8dac4680f96f 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -137,18 +137,25 @@ bool SpecialCaseList::createInternal(const MemoryBuffer *MB,
 Expected<SpecialCaseList::Section *>
 SpecialCaseList::addSection(StringRef SectionStr, unsigned LineNo,
                             bool UseGlobs) {
-  Sections.emplace_back();
-  auto &Section = Sections.back();
-  Section.SectionStr = SectionStr;
-
-  if (auto Err = Section.SectionMatcher->insert(SectionStr, LineNo, UseGlobs)) {
+  auto it =
+      std::find_if(Sections.begin(), Sections.end(), [&](const Section &s) {
+        return s.SectionStr == SectionStr && s.LineNo == LineNo;
+      });
+  if (it == Sections.end()) {
+    Sections.emplace_back();
+    auto &sec = Sections.back();
+    sec.SectionStr = SectionStr;
+    sec.LineNo = LineNo;
+    it = std::prev(Sections.end());
+  }
+  if (auto Err = it->SectionMatcher->insert(SectionStr, LineNo, UseGlobs)) {
     return createStringError(errc::invalid_argument,
                              "malformed section at line " + Twine(LineNo) +
                                  ": '" + SectionStr +
                                  "': " + toString(std::move(Err)));
   }
 
-  return &Section;
+  return &(*it);
 }
 
 bool SpecialCaseList::parse(const MemoryBuffer *MB, std::string &Error) {

Created using spr 1.3.6
@qinkunbao qinkunbao requested a review from vitalybuka May 21, 2025 00:29
Created using spr 1.3.6
@vitalybuka
Copy link
Collaborator

But why?
We don't use uniqueness any how. We will need to O(N_globs) lookups either way.
I propose to keep all sections as-is, in original order (or reversed), so we can interate them in reverse (or straight) order up to the first match.

Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-request review with reply, if it's still needed.

@qinkunbao
Copy link
Member Author

Hi Vitaly,

Sorry for the late reply. I am thinking about a good solution for #139772 in the past two days.

At the moment, I am thinking only the order of Globs and RegExes (Or Pattern) matters. The order of Section, Prefix and Category does not matter.

Without this PR, considering the following example.

[sec1]
src:a.txt
src:b.txt
[sec1]
src:b.txt
[sec1]
src:b.txt

We have to iterate the all the sections (all entry) to know if a.txt should be matched or not.

This pull request (or by reverting https://github.com/llvm/llvm-project/pull/140127) simplifies the process to finding the last matching Pattern for a given Prefix and Category by jumping to sec1 and performing a reverse walk. Each matched pattern provides a line number and a file number. Entries with higher line numbers take precedence over those with lower line numbers. I'm interested in your feedback and we can discuss this further in our meeting.

@qinkunbao qinkunbao requested a review from vitalybuka May 22, 2025 18:23
@vitalybuka
Copy link
Collaborator

Hi Vitaly,

Sorry for the late reply. I am thinking about a good solution for #139772 in the past two days.

At the moment, I am thinking only the order of Globs and RegExes (Or Pattern) matters. The order of Section, Prefix and Category does not matter.

Without this PR, considering the following example.

[sec1]
src:a.txt
src:b.txt
[sec1]
src:b.txt
[sec1]
src:b.txt

We have to iterate the all the sections (all entry) to know if a.txt should be matched or not.

This pull request (or by reverting https://github.com/llvm/llvm-project/pull/140127) simplifies the process to finding the last matching Pattern for a given Prefix and Category by jumping to sec1 and performing a reverse walk. Each matched pattern provides a line number and a file number. Entries with higher line numbers take precedence over those with lower line numbers. I'm interested in your feedback and we can discuss this further in our meeting.

Not sure I understand the issue.

  1. Make Glob a vector added in parsing order
  2. Make sections a vector added in parsing order

Scan all in reverse order - done

duplicate entries is not a problem as they should not be common

@qinkunbao
Copy link
Member Author

qinkunbao commented May 22, 2025

Make Glob a vector added in parsing order

Yeah, that is needed.

Make sections a vector added in parsing order

It is not necessary.

duplicate entries is not a problem as they should not be common

Yes, it is not common but we need to iterate all the sections every time to ensure the correctness.

[sec1]
src:a.txt
src:b.txt
[sec1]
src:b.txt
[sec1]
src:b.txt

Suppose we have the function query inSectionBlame(Section="sec1", Prefix="src", Query="a.txt", we need to iterate all sections to find the entry and get the correct line number 2.

@qinkunbao
Copy link
Member Author

qinkunbao commented May 22, 2025

Discussed with @vitalybuka offline. It turns out that Section name can be a regular expression so the order of Section needs to be tracked (with a vector).

@qinkunbao qinkunbao closed this May 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants