Skip to content

Implement src:*=sanitize for UBSan #139772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

qinkunbao
Copy link
Member

@qinkunbao qinkunbao commented May 13, 2025

It is a draft implementation for "src:*=sanitize". It should be applied to all sanitizers.

Any srcs assigned to the sanitize category will have their sanitizer instrumentation remained ignored by "src:". For example,

src:*
src:*/test1.c=sanitize

test1.c will still have the UBSan instrumented.

However

type:int
src:*/test1.c=sanitize

test1.c does not have the int type check.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 13, 2025
@llvmbot
Copy link
Member

llvmbot commented May 13, 2025

@llvm/pr-subscribers-clang

Author: Qinkun Bao (qinkunbao)

Changes

It is a draft implementation for "src:*=sanitize". It should be applied to all sanitizers.

Any srcs assigned to the sanitize category will have their sanitizer instrumentation remained ignored by "src:". For example,

src:*
src:*/test1.c=sanitize

test1.c will still have the UBSan instrumented.

However

type:int
src:*/test1.c=sanitize

test1.c does not have the int type check.


Full diff: https://github.com/llvm/llvm-project/pull/139772.diff

2 Files Affected:

  • (modified) clang/lib/Basic/NoSanitizeList.cpp (+2-1)
  • (added) clang/test/CodeGen/ubsan-src-ignorelist-category.test (+23)
diff --git a/clang/lib/Basic/NoSanitizeList.cpp b/clang/lib/Basic/NoSanitizeList.cpp
index e7e63c1f419e6..c4db5a5a211d2 100644
--- a/clang/lib/Basic/NoSanitizeList.cpp
+++ b/clang/lib/Basic/NoSanitizeList.cpp
@@ -44,7 +44,8 @@ bool NoSanitizeList::containsFunction(SanitizerMask Mask,
 
 bool NoSanitizeList::containsFile(SanitizerMask Mask, StringRef FileName,
                                   StringRef Category) const {
-  return SSCL->inSection(Mask, "src", FileName, Category);
+  bool allowList = SSCL->inSection(Mask, "src", FileName, "sanitize");
+  return SSCL->inSection(Mask, "src", FileName, Category) && !allowList;
 }
 
 bool NoSanitizeList::containsMainFile(SanitizerMask Mask, StringRef FileName,
diff --git a/clang/test/CodeGen/ubsan-src-ignorelist-category.test b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
new file mode 100644
index 0000000000000..5242c10bdeec7
--- /dev/null
+++ b/clang/test/CodeGen/ubsan-src-ignorelist-category.test
@@ -0,0 +1,23 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test1.c -o - | FileCheck %s -check-prefix=CHECK-ALLOWLIST
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow -fsanitize-ignorelist=%t/src.ignorelist -emit-llvm %t/test2.c -o - | FileCheck %s -check-prefix=CHECK-IGNORELIST
+
+
+// Verify ubsan only emits checks for files in the allowlist
+
+//--- src.ignorelist
+src:*
+src:*/test1.c=sanitize
+
+//--- test1.c
+int add1(int a, int b) {
+// CHECK-ALLOWLIST: llvm.sadd.with.overflow.i32
+    return a+b;
+}
+
+//--- test2.c
+int add2(int a, int b) {
+// CHECK-IGNORELIST-NOT: llvm.sadd.with.overflow.i32
+    return a+b;
+}

@qinkunbao qinkunbao requested a review from vitalybuka May 13, 2025 18:09
@vitalybuka
Copy link
Collaborator

Reminder to @thurstond as well :)

Please figure out https://llvm.org/docs/GitHub.html#stacked-pull-requests

// Verify ubsan only emits checks for files in the allowlist

//--- src.ignorelist
src:*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have a problem with:

src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have some ideas how to fix this:

  1. Right now order does not matter, if we say that it's matter now, we don't break existing stuff
  2. Let's make =sanitize reset all above and =
  3. So inSection need to return linenum in file which will compare with linenum of =sanitize. There is a version of function for that

Problem: SanitizerSpecialCaseList tracks linenum, but if you check parsing code, it can represent multiple files!

I suggest to start with updating SpecialCaseList to track (fileindex, linenum).

But, SpecialCaseList::Sections for whatever reasons is StringMap,
so it's ordered by name, and it's not used anyway.

Let's first PR to make SpecialCaseList::Sections -> vector

similar SanitizerSpecialCaseList?

Copy link
Member Author

@qinkunbao qinkunbao May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have a problem with:

src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc

For this example, I think /mylib/test.cc will be instrumented because the file /mylib/test.cc will fall into two categories (= and =sanitize) with this PR. And it should follow the =sanitizer category. I think it should match the current doc. (I have updated the test to cover the situation. )

I was a little hesitated about the change during implementing the feature. For example, if we have a ignore list file

type:int
src:test.cc=sanitize

If a file test.cc has the type int, shall we still instrument the type int variable in the file test.cc?

On the other hand,

src:test.cc
type:int=sanitize

Shall we still instrument the type int variable in the file test.cc?

I think the both answers are yes. Let me know if it makes sense to you.

I will create a new PR with using user branches in llvm/llvm-project to update SpecialCaseList::Sections. Thank you for the feedback.

Copy link
Member Author

@qinkunbao qinkunbao May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, SpecialCaseList::Sections for whatever reasons is StringMap,
so it's ordered by name, and it's not used anyway.

Oh, I find the key of the StringMap Sections is actually used. Here is the Link. Here are some options. Let me know which option is better.

  1. How about creating a vector<Section*> to track the order?
StringMap<Section> Sections;
vector<Section*> SectionsVec;

  1. We can use the key from Globs
    auto [It, DidEmplace] = Globs.try_emplace(Pattern);

qinkunbao added a commit that referenced this pull request May 16, 2025
… vector.

As discussed in #139772, SpecialCaseList::Sections can keep the order of Sections when parsing the case list.

Reviewers: thurstond, vitalybuka

Reviewed By: vitalybuka

Pull Request: #140127
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 16, 2025
…tringMap to vector.

As discussed in llvm/llvm-project#139772, SpecialCaseList::Sections can keep the order of Sections when parsing the case list.

Reviewers: thurstond, vitalybuka

Reviewed By: vitalybuka

Pull Request: llvm/llvm-project#140127
qinkunbao added a commit that referenced this pull request May 24, 2025
…140964)

As discussed in #139772 and
#140529, Matcher::Globs can
keep the order when parsing the case list.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 24, 2025
… vector. (#140964)

As discussed in llvm/llvm-project#139772 and
llvm/llvm-project#140529, Matcher::Globs can
keep the order when parsing the case list.
@qinkunbao
Copy link
Member Author

Close this PR as the implementation goes to other PRs.

@qinkunbao qinkunbao closed this May 28, 2025
@vitalybuka vitalybuka reopened this May 28, 2025
@vitalybuka vitalybuka closed this May 28, 2025
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
…lvm#140964)

As discussed in llvm#139772 and
llvm#140529, Matcher::Globs can
keep the order when parsing the case list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants