Skip to content

Don't pass null pointers to memcmp and memcpy in libFuzzer #96775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

davidben
Copy link
Contributor

@davidben davidben commented Jun 26, 2024

In C, it is UB to call memcmp(NULL, NULL, 0), memcpy(NULL, NULL, 0), etc. Unfortunately, (NULL, 0) is the natural representation of an empty sequence of objects and extremely common in real world code. As a result, all C code, and C++ code which calls into C functions, must carefully guard all calls to memcpy.

This is a serious, real world usability issue in C and should be fixed in the language (see #49459). In the meantime, pay the cost of the extra branch to avoid tripping UBSan in libFuzzer. Once the usability problem in C has been fixed, these checks can be removed.

Fixes #96772

@llvmbot
Copy link
Member

llvmbot commented Jun 26, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: David Benjamin (davidben)

Changes

Due to a C language bug, it is UB to call memcmp(NULL, NULL, 0), memcpy(NULL, NULL, 0), etc. This is a problem because (NULL, 0) is the natural representation of the empty slice and extremely common in real world code. As a result, all C code, and C++ code which calls into C functions, must carefully guard all calls to memcpy.

This is horrible and should be fixed in C (see #49459), but in the meantime, avoid tripping UBSan in libFuzzer.

Fixes #96772


Full diff: https://github.com/llvm/llvm-project/pull/96775.diff

2 Files Affected:

  • (modified) compiler-rt/lib/fuzzer/FuzzerDictionary.h (+3-1)
  • (modified) compiler-rt/lib/fuzzer/FuzzerLoop.cpp (+6-1)
diff --git a/compiler-rt/lib/fuzzer/FuzzerDictionary.h b/compiler-rt/lib/fuzzer/FuzzerDictionary.h
index 48f063c7ee4e6..b447f3de6a591 100644
--- a/compiler-rt/lib/fuzzer/FuzzerDictionary.h
+++ b/compiler-rt/lib/fuzzer/FuzzerDictionary.h
@@ -29,7 +29,9 @@ template <size_t kMaxSizeT> class FixedWord {
     static_assert(kMaxSizeT <= std::numeric_limits<uint8_t>::max(),
                   "FixedWord::kMaxSizeT cannot fit in a uint8_t.");
     assert(S <= kMaxSize);
-    memcpy(Data, B, S);
+    // memcpy cannot be passed empty slices when the pointers are null.
+    if (S)
+      memcpy(Data, B, S);
     Size = static_cast<uint8_t>(S);
   }
 
diff --git a/compiler-rt/lib/fuzzer/FuzzerLoop.cpp b/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
index 935dd2342e18f..8e229bd0ef322 100644
--- a/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
+++ b/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
@@ -579,6 +579,9 @@ void Fuzzer::CrashOnOverwrittenData() {
 // Compare two arrays, but not all bytes if the arrays are large.
 static bool LooseMemeq(const uint8_t *A, const uint8_t *B, size_t Size) {
   const size_t Limit = 64;
+  // memcmp cannot be passed empty slices when the pointers are null.
+  if (!Size)
+    return 1;
   if (Size <= 64)
     return !memcmp(A, B, Size);
   // Compare first and last Limit/2 bytes.
@@ -596,7 +599,9 @@ ATTRIBUTE_NOINLINE bool Fuzzer::ExecuteCallback(const uint8_t *Data,
   // We copy the contents of Unit into a separate heap buffer
   // so that we reliably find buffer overflows in it.
   uint8_t *DataCopy = new uint8_t[Size];
-  memcpy(DataCopy, Data, Size);
+  // memcpy cannot be passed empty slices when the pointers are null.
+  if (Size)
+    memcpy(DataCopy, Data, Size);
   if (EF->__msan_unpoison)
     EF->__msan_unpoison(DataCopy, Size);
   if (EF->__msan_unpoison_param)

@davidben
Copy link
Contributor Author

davidben commented Jun 26, 2024

An alternate fix would be to give up on the libc functions and instead use std::copy_n and std::equal. Those do not share the same flaw. I stuck with the libc functions because I didn't see evidence of libFuzzer using much of <algorithm>'s functions here, but happy to do that if preferred. (I usually fix C++ code by switching to the STL.)

Copy link
Contributor

@dmcardle dmcardle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an expert here, but these changes LGTM.

I was a little suspicious of the memcpy in MutateAndTestOne(), but it turns out that ChooseUnitToMutate() asserts that the returned unit is not empty.

@davidben
Copy link
Contributor Author

LLVM folks: Friendly ping. Is there anything else needed to land this fix for undefined behavior in libFuzzer?

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subject line can be named [fuzzer] xxx or [Fuzzer] xxx.

"work around a C language bug" is not a good way to describe this usability issue. The function signature is unfortunate and should be fixed, but I am not sure calling it a language bug is useful.

@davidben
Copy link
Contributor Author

"work around a C language bug" is not a good way to describe this usability issue. The function signature is unfortunate and should be fixed, but I am not sure calling it a language bug is useful.

I mean, the non-null precondition does actually come from C itself, so that is where it'd need to be fixed. I think it's also demonstrably a problem on the spec side, seeing as it's inconsistent with how anyone anywhere writes C code. (Chrome for Linux currently cannot even compile with the relevant UBSan bit turned on because some tool used during the build computes (T*)NULL + 0.) But, fair enough, I will reword the commit message. :)

In C, it is UB to call memcmp(NULL, NULL, 0), memcpy(NULL, NULL, 0),
etc. Unfortunately, (NULL, 0) is the natural representation of an empty
sequence of objects and extremely common in real world code. As a
result, all C code, and C++ code which calls into C functions, must
carefully guard all calls to memcpy.

This is a serious, real world usability issue in C and should be fixed
in the language (see llvm#49459). In the meantime, pay the cost of the extra
branch to avoid tripping UBSan in libFuzzer. Once the usability problem
in C has been fixed, these checks can be removed.

Fixes llvm#96772
@davidben davidben force-pushed the fuzzer-c-workaround branch from 7df3637 to 6ddda79 Compare July 16, 2024 16:00
@davidben davidben changed the title Work around a C language bug in libFuzzer Don't pass null pointers to memcmp and memcpy in libFuzzer Jul 16, 2024
@davidben
Copy link
Contributor Author

Comment addressed and commit message / PR description updated.

@davidben
Copy link
Contributor Author

Anything else left to do here? (I don't have commit privileges, so someone else would need to merge this for me.)

@vitalybuka
Copy link
Collaborator

It says you are a "member". I believe you can press "squash and merge" above.
If this is not enough, please ping me in chat and I merge the patch.

@davidben
Copy link
Contributor Author

Like I said above, I do not have access to that. I think "member" just came about because I had a bugtracker account. Will ping in chat.

@vitalybuka vitalybuka merged commit bde4ffe into llvm:main Aug 13, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libfuzzer passes null pointers to functions whose parameters are marked __non_null
5 participants