[NFCI] Avoid adding duplicated SpecialCaseList::Sections. #140821

qinkunbao · 2025-05-20T23:53:01Z

#140127 converted SpecialCaseList::Sections
from a StringMap to a vector. However, the previous StringMap ensured that a new
section was created only when the SectionStr was different. We should keep the same behavior.

Created using spr 1.3.6

llvmbot · 2025-05-20T23:53:37Z

@llvm/pr-subscribers-llvm-support

Author: Qinkun Bao (qinkunbao)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/140821.diff

2 Files Affected:

(modified) llvm/include/llvm/Support/SpecialCaseList.h (+1)
(modified) llvm/lib/Support/SpecialCaseList.cpp (+13-6)

diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index fc6dc93651f38..baa5c917220e3 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -138,6 +138,7 @@ class SpecialCaseList {
     std::unique_ptr<Matcher> SectionMatcher;
     SectionEntries Entries;
     std::string SectionStr;
+    unsigned LineNo;
   };
 
   std::vector<Section> Sections;
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 5145cccc91e3b..e8dac4680f96f 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -137,18 +137,25 @@ bool SpecialCaseList::createInternal(const MemoryBuffer *MB,
 Expected<SpecialCaseList::Section *>
 SpecialCaseList::addSection(StringRef SectionStr, unsigned LineNo,
                             bool UseGlobs) {
-  Sections.emplace_back();
-  auto &Section = Sections.back();
-  Section.SectionStr = SectionStr;
-
-  if (auto Err = Section.SectionMatcher->insert(SectionStr, LineNo, UseGlobs)) {
+  auto it =
+      std::find_if(Sections.begin(), Sections.end(), [&](const Section &s) {
+        return s.SectionStr == SectionStr && s.LineNo == LineNo;
+      });
+  if (it == Sections.end()) {
+    Sections.emplace_back();
+    auto &sec = Sections.back();
+    sec.SectionStr = SectionStr;
+    sec.LineNo = LineNo;
+    it = std::prev(Sections.end());
+  }
+  if (auto Err = it->SectionMatcher->insert(SectionStr, LineNo, UseGlobs)) {
     return createStringError(errc::invalid_argument,
                              "malformed section at line " + Twine(LineNo) +
                                  ": '" + SectionStr +
                                  "': " + toString(std::move(Err)));
   }
 
-  return &Section;
+  return &(*it);
 }
 
 bool SpecialCaseList::parse(const MemoryBuffer *MB, std::string &Error) {

vitalybuka · 2025-05-21T06:59:42Z

But why?
We don't use the uniqueness anyhow. We will need O(N_globs) lookups either way.
I propose keeping all sections as-is, in their original order (or reversed), so we can iterate over them in reverse (or forward) order up to the first match.

vitalybuka

Please re-request review with reply, if it's still needed.

qinkunbao · 2025-05-22T18:23:12Z

Hi Vitaly,

Sorry for the late reply. I have been thinking about a good solution for #139772 over the past two days.

At the moment, I think only the order of Globs and RegExes (or Patterns) matters. The order of Sections, Prefixes and Categories does not matter.

Without this PR, consider the following example.

[sec1]
src:a.txt
src:b.txt
[sec1]
src:b.txt
[sec1]
src:b.txt

We have to iterate over all the sections (and all their entries) to know whether a.txt should be matched or not.

This pull request (or reverting https://github.com/llvm/llvm-project/pull/140127) reduces the process to finding the last matching Pattern for a given Prefix and Category by jumping to sec1 and performing a reverse walk. Each matched pattern provides a line number and a file number. Entries with higher line numbers take precedence over those with lower line numbers. I'm interested in your feedback, and we can discuss this further in our meeting.
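
The reverse walk described above can be sketched roughly as follows. This is a simplified, hypothetical model rather than the actual SpecialCaseList code: `Entry`, `MergedSections`, `lastMatchLine`, `exampleList` and `checkExample` are names made up for illustration, patterns are compared by exact string equality instead of globs or regexes, and only a single prefix (`src`) is modeled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified entry: an exact-match pattern plus its line number.
struct Entry {
  std::string Pattern; // e.g. "a.txt"
  unsigned LineNo;     // line of the entry in the special case list file
};

// Entries of all sections that share a name, merged and kept in parsing order.
using MergedSections = std::map<std::string, std::vector<Entry>>;

// Returns the line number of the last entry in `Section` that matches
// `Query`, or 0 if there is none.
unsigned lastMatchLine(const MergedSections &Sections,
                       const std::string &Section, const std::string &Query) {
  auto It = Sections.find(Section); // jump straight to the merged section
  if (It == Sections.end())
    return 0;
  // Reverse walk: the first hit is the entry with the highest line number.
  for (auto E = It->second.rbegin(); E != It->second.rend(); ++E)
    if (E->Pattern == Query)
      return E->LineNo;
  return 0;
}

// The example list from above, with its three [sec1] blocks merged.
MergedSections exampleList() {
  MergedSections M;
  M["sec1"] = {Entry{"a.txt", 2}, Entry{"b.txt", 3}, Entry{"b.txt", 5},
               Entry{"b.txt", 7}};
  return M;
}

// Usage: a.txt is only listed on line 2; the last b.txt entry is on line 7.
void checkExample() {
  MergedSections M = exampleList();
  assert(lastMatchLine(M, "sec1", "a.txt") == 2);
  assert(lastMatchLine(M, "sec1", "b.txt") == 7);
}
```

With merged sections, a single lookup by section name is followed by one reverse scan. Whether that is worth the extra bookkeeping is exactly the open question in this thread.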

vitalybuka · 2025-05-22T18:29:16Z

Not sure I understand the issue.

Make Glob a vector added in parsing order
Make sections a vector added in parsing order

Scan everything in reverse order, and we are done.

Duplicate entries are not a problem, as they should not be common.
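
A minimal sketch of that proposal, assuming exact-match patterns and a single prefix. `ParsedGlob`, `ParsedSection`, `scanInReverse` and `checkReverseScan` are hypothetical names used only for illustration, not part of SpecialCaseList:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the parsed data, both kept in parsing order.
struct ParsedGlob {
  std::string Pattern;
  unsigned LineNo;
};
struct ParsedSection {
  std::string Name;
  std::vector<ParsedGlob> Globs;
};

// Scans sections, and the globs inside each section, from last to first.
// Returns the line number of the first hit, or 0 if nothing matches.
unsigned scanInReverse(const std::vector<ParsedSection> &Sections,
                       const std::string &SectionName,
                       const std::string &Query) {
  for (auto S = Sections.rbegin(); S != Sections.rend(); ++S) {
    if (S->Name != SectionName)
      continue; // duplicate section names are simply visited one by one
    for (auto G = S->Globs.rbegin(); G != S->Globs.rend(); ++G)
      if (G->Pattern == Query)
        return G->LineNo;
  }
  return 0;
}

// Usage with three un-merged [sec1] blocks.
void checkReverseScan() {
  std::vector<ParsedSection> Sections = {
      ParsedSection{"sec1", {ParsedGlob{"a.txt", 2}, ParsedGlob{"b.txt", 3}}},
      ParsedSection{"sec1", {ParsedGlob{"b.txt", 5}}},
      ParsedSection{"sec1", {ParsedGlob{"b.txt", 7}}},
  };
  assert(scanInReverse(Sections, "sec1", "b.txt") == 7);
  assert(scanInReverse(Sections, "sec1", "a.txt") == 2);
}
```

In this model a query for b.txt stops in the last block, while a query for a.txt has to pass through the two later blocks before reaching line 2. That extra work only shows up when section names are actually duplicated.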

qinkunbao · 2025-05-22T18:59:11Z

Make Glob a vector added in parsing order

Yeah, that is needed.

Make sections a vector added in parsing order

It is not necessary.

Duplicate entries are not a problem, as they should not be common.

Yes, they are not common, but we need to iterate over all the sections every time to ensure correctness.

[sec1]
src:a.txt
src:b.txt
[sec1]
src:b.txt
[sec1]
src:b.txt

Suppose we have the query inSectionBlame(Section="sec1", Prefix="src", Query="a.txt"). We need to iterate over all sections to find the entry and get the correct line number, 2.

qinkunbao · 2025-05-22T21:01:56Z

Discussed with @vitalybuka offline. It turns out that a Section name can be a regular expression, so the order of Sections needs to be tracked (with a vector).
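
To illustrate why de-duplicating by SectionStr is not enough once section names are patterns, here is a rough sketch. Everything in it is an assumption made for illustration: `starMatch` is a toy matcher that supports only `*`, and `Rule`, `PatternSection`, `blameLine` and `checkSectionOrder` are hypothetical names, not the real SpecialCaseList API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy wildcard matcher: '*' matches any run of characters, everything else
// must match literally. Not the matcher LLVM uses.
bool starMatch(const std::string &P, const std::string &S) {
  const std::size_t None = std::string::npos;
  std::size_t p = 0, s = 0, Star = None, Mark = 0;
  while (s < S.size()) {
    if (p < P.size() && P[p] == '*') {
      Star = p++; // remember the star and try matching zero characters
      Mark = s;
    } else if (p < P.size() && P[p] == S[s]) {
      ++p;
      ++s;
    } else if (Star != None) {
      p = Star + 1; // backtrack: let the last star absorb one more character
      s = ++Mark;
    } else {
      return false;
    }
  }
  while (p < P.size() && P[p] == '*')
    ++p;
  return p == P.size();
}

struct Rule {
  std::string Pattern;
  unsigned LineNo;
};
struct PatternSection {
  std::string NamePattern; // the section header itself is a pattern
  std::vector<Rule> Rules;
};

// Walks sections in reverse parsing order. Several differently named
// sections can match the same query, so only their order decides the winner.
unsigned blameLine(const std::vector<PatternSection> &Sections,
                   const std::string &SectionName, const std::string &Query) {
  for (auto S = Sections.rbegin(); S != Sections.rend(); ++S) {
    if (!starMatch(S->NamePattern, SectionName))
      continue;
    for (auto R = S->Rules.rbegin(); R != S->Rules.rend(); ++R)
      if (starMatch(R->Pattern, Query))
        return R->LineNo;
  }
  return 0;
}

// Usage: [sec*] on line 1 and [sec1] on line 3 both apply to "sec1".
void checkSectionOrder() {
  std::vector<PatternSection> Sections = {
      PatternSection{"sec*", {Rule{"a.txt", 2}}},
      PatternSection{"sec1", {Rule{"a.txt", 4}}},
  };
  assert(blameLine(Sections, "sec1", "a.txt") == 4);
  assert(blameLine(Sections, "sec2", "a.txt") == 2);
}
```

The two section headers are different strings, so a name-keyed map would keep them apart without recording which one came later. A vector in parsing order preserves that information.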

[𝘀𝗽𝗿] initial version

21352cb

llvmbot added the llvm:support label May 20, 2025

Remove lineno.

bb0e2f0

qinkunbao requested a review from vitalybuka May 21, 2025 00:29

Add unit tests.

6c30ad8

vitalybuka requested changes May 22, 2025

qinkunbao requested a review from vitalybuka May 22, 2025 18:23

qinkunbao closed this May 22, 2025
