Skip to content

Move interceptors for libresolv functions to MSan #119071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 9, 2024

Conversation

aaronpuchert
Copy link
Member

The functions are not relevant for most sanitizers and only required for MSan to see which regions have been written to. This eliminates a link dependency for all other sanitizers and fixes #59007: while -lresolv had been added for the static runtime in 6dce56b, it wasn't added to the shared runtimes.

Instead of just moving the interceptors, we adapt them to MSan conventions:

  • We don't skip intercepting when msan_init_is_running is true, but directly call ENSURE_MSAN_INITED() like most other interceptors. It seems unlikely that these functions are called during initialization.
  • We don't unpoison errno, because none of the functions is specified to use it.

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' compiler-rt:msan Memory sanitizer compiler-rt:sanitizer labels Dec 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2024

@llvm/pr-subscribers-clang

Author: Aaron Puchert (aaronpuchert)

Changes

The functions are not relevant for most sanitizers and only required for MSan to see which regions have been written to. This eliminates a link dependency for all other sanitizers and fixes #59007: while -lresolv had been added for the static runtime in 6dce56b, it wasn't added to the shared runtimes.

Instead of just moving the interceptors, we adapt them to MSan conventions:

  • We don't skip intercepting when msan_init_is_running is true, but directly call ENSURE_MSAN_INITED() like most other interceptors. It seems unlikely that these functions are called during initialization.
  • We don't unpoison errno, because none of the functions is specified to use it.

Full diff: https://github.com/llvm/llvm-project/pull/119071.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) compiler-rt/lib/msan/msan_interceptors.cpp (+76)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc (-78)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h (-2)
  • (removed) compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp (-42)
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 7d3d7f8f03c491..03dbdc27975b42 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1410,7 +1410,7 @@ void tools::linkSanitizerRuntimeDeps(const ToolChain &TC,
   // libresolv.a, even if exists, is an empty archive to satisfy POSIX -lresolv
   // requirement.
   if (TC.getTriple().isOSLinux() && !TC.getTriple().isAndroid() &&
-      !TC.getTriple().isMusl())
+      !TC.getTriple().isMusl() && TC.getSanitizerArgs(Args).needsMsanRt())
     CmdArgs.push_back("-lresolv");
 }
 
diff --git a/compiler-rt/lib/msan/msan_interceptors.cpp b/compiler-rt/lib/msan/msan_interceptors.cpp
index f05c20618780b7..b2098d8a26d229 100644
--- a/compiler-rt/lib/msan/msan_interceptors.cpp
+++ b/compiler-rt/lib/msan/msan_interceptors.cpp
@@ -1358,6 +1358,79 @@ INTERCEPTOR(int, forkpty, int *aparent, char *name, const void *termp,
 #define MSAN_MAYBE_INTERCEPT_FORKPTY
 #endif
 
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
+            char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, srclength);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res + 1);
+  return res;
+}
+INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, internal_strlen(src) + 1);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_pton)(src, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT___B64_TO \
+    MSAN_INTERCEPT_FUNC(__b64_ntop);    \
+    COMMON_INTERCEPT_FUNCTION(__b64_pton);
+#else
+#  define MSAN_MAYBE_INTERCEPT___B64_TO
+#endif
+
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+#  if __GLIBC_PREREQ(2, 34)
+// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
+#    define DN_COMP_INTERCEPTOR_NAME dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
+#  else
+#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
+#  endif
+INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
+            unsigned char *comp_dn, int length, unsigned char **dnptrs,
+            unsigned char **lastdnptr) {
+  ENSURE_MSAN_INITED();
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
+                                           lastdnptr);
+  if (res >= 0) {
+    __msan_unpoison(comp_dn, res);
+    if (dnptrs && lastdnptr) {
+      unsigned char **p = dnptrs;
+      for (; p != lastdnptr && *p; ++p);
+      if (p != lastdnptr)
+        ++p;
+      __msan_unpoison(dnptrs, (p - dnptrs) * sizeof(*p));
+    }
+  }
+  return res;
+}
+INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
+            unsigned char const *end, unsigned char const *src, char *dest,
+            int space) {
+  ENSURE_MSAN_INITED();
+  // TODO: add read check if __dn_comp intercept added
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
+  if (res >= 0)
+    __msan_unpoison(dest, internal_strlen(dest) + 1);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND      \
+    MSAN_INTERCEPT_FUNC(DN_COMP_INTERCEPTOR_NAME); \
+    MSAN_INTERCEPT_FUNC(DN_EXPAND_INTERCEPTOR_NAME);
+#else
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND
+#endif
+
 struct MSanInterceptorContext {
   bool in_interceptor_scope;
 };
@@ -1916,6 +1989,9 @@ void InitializeInterceptors() {
   MSAN_MAYBE_INTERCEPT_OPENPTY;
   MSAN_MAYBE_INTERCEPT_FORKPTY;
 
+  MSAN_MAYBE_INTERCEPT___B64_TO;
+  MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND;
+
   inited = 1;
 }
 } // namespace __msan
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
index ba3693dbd11f63..9cf99a192dba49 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
@@ -2538,82 +2538,6 @@ INTERCEPTOR(int, glob64, const char *pattern, int flags,
 #define INIT_GLOB64
 #endif  // SANITIZER_INTERCEPT_GLOB64
 
-#if SANITIZER_INTERCEPT___B64_TO
-INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
-            char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_ntop, src, srclength, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, srclength);
-  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res + 1);
-  return res;
-}
-INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_pton, src, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, internal_strlen(src) + 1);
-  int res = REAL(__b64_pton)(src, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res);
-  return res;
-}
-#define INIT___B64_TO                      \
-    COMMON_INTERCEPT_FUNCTION(__b64_ntop); \
-    COMMON_INTERCEPT_FUNCTION(__b64_pton);
-#else  // SANITIZER_INTERCEPT___B64_TO
-#define INIT___B64_TO
-#endif  // SANITIZER_INTERCEPT___B64_TO
-
-#if SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  if __GLIBC_PREREQ(2, 34)
-// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
-#    define DN_COMP_INTERCEPTOR_NAME dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
-#  else
-#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
-#  endif
-INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
-            unsigned char *comp_dn, int length, unsigned char **dnptrs,
-            unsigned char **lastdnptr) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_COMP_INTERCEPTOR_NAME, exp_dn, comp_dn,
-                           length, dnptrs, lastdnptr);
-  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
-                                           lastdnptr);
-  if (res >= 0) {
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, comp_dn, res);
-    if (dnptrs && lastdnptr) {
-      unsigned char **p = dnptrs;
-      for (; p != lastdnptr && *p; ++p)
-        ;
-      if (p != lastdnptr)
-        ++p;
-      COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dnptrs, (p - dnptrs) * sizeof(*p));
-    }
-  }
-  return res;
-}
-INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
-            unsigned char const *end, unsigned char const *src, char *dest,
-            int space) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_EXPAND_INTERCEPTOR_NAME, base, end, src,
-                           dest, space);
-  // TODO: add read check if __dn_comp intercept added
-  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dest, internal_strlen(dest) + 1);
-  return res;
-}
-#  define INIT_DN_COMP_EXPAND                            \
-    COMMON_INTERCEPT_FUNCTION(DN_COMP_INTERCEPTOR_NAME); \
-    COMMON_INTERCEPT_FUNCTION(DN_EXPAND_INTERCEPTOR_NAME);
-#else  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  define INIT_DN_COMP_EXPAND
-#endif  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-
 #if SANITIZER_INTERCEPT_POSIX_SPAWN
 
 template <class RealSpawnPtr>
@@ -10346,8 +10270,6 @@ static void InitializeCommonInterceptors() {
   INIT_TIMESPEC_GET;
   INIT_GLOB;
   INIT_GLOB64;
-  INIT___B64_TO;
-  INIT_DN_COMP_EXPAND;
   INIT_POSIX_SPAWN;
   INIT_WAIT;
   INIT_WAIT4;
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
index 190cad7cf7c3f7..b4940b66da9717 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
@@ -266,8 +266,6 @@ SANITIZER_WEAK_IMPORT void *aligned_alloc(__sanitizer::usize __alignment,
 #define SANITIZER_INTERCEPT_TIMESPEC_GET SI_LINUX
 #define SANITIZER_INTERCEPT_GLOB (SI_GLIBC || SI_SOLARIS)
 #define SANITIZER_INTERCEPT_GLOB64 SI_GLIBC
-#define SANITIZER_INTERCEPT___B64_TO SI_LINUX_NOT_ANDROID
-#define SANITIZER_INTERCEPT_DN_COMP_EXPAND SI_LINUX_NOT_ANDROID
 #define SANITIZER_INTERCEPT_POSIX_SPAWN SI_POSIX
 #define SANITIZER_INTERCEPT_WAIT SI_POSIX
 #define SANITIZER_INTERCEPT_INET SI_POSIX
diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
deleted file mode 100644
index 96fd0138648b34..00000000000000
--- a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
+++ /dev/null
@@ -1,42 +0,0 @@
-// RUN: %clangxx %s -o %t && %run %t %p
-
-#include <assert.h>
-#include <resolv.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/types.h>
-
-int main(int iArgc, char *szArgv[]) {
-  // Check NTOP writing
-  const char *src = "base64 test data";
-  size_t src_len = strlen(src);
-  size_t dst_len = ((src_len + 2) / 3) * 4 + 1;
-  char dst[dst_len];
-  int res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                     dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "YmFzZTY0IHRlc3QgZGF0YQ==") == 0);
-
-  // Check PTON writing
-  unsigned char target[dst_len];
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  // Check NTOP writing for zero length src
-  src = "";
-  src_len = strlen(src);
-  assert(((src_len + 2) / 3) * 4 + 1 < dst_len);
-  res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                 dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "") == 0);
-
-  // Check PTON writing for zero length src
-  dst[0] = '\0';
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  return 0;
-}

@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Aaron Puchert (aaronpuchert)

Changes

The functions are not relevant for most sanitizers and only required for MSan to see which regions have been written to. This eliminates a link dependency for all other sanitizers and fixes #59007: while -lresolv had been added for the static runtime in 6dce56b, it wasn't added to the shared runtimes.

Instead of just moving the interceptors, we adapt them to MSan conventions:

  • We don't skip intercepting when msan_init_is_running is true, but directly call ENSURE_MSAN_INITED() like most other interceptors. It seems unlikely that these functions are called during initialization.
  • We don't unpoison errno, because none of the functions is specified to use it.

Full diff: https://github.com/llvm/llvm-project/pull/119071.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) compiler-rt/lib/msan/msan_interceptors.cpp (+76)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc (-78)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h (-2)
  • (removed) compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp (-42)
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 7d3d7f8f03c491..03dbdc27975b42 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1410,7 +1410,7 @@ void tools::linkSanitizerRuntimeDeps(const ToolChain &TC,
   // libresolv.a, even if exists, is an empty archive to satisfy POSIX -lresolv
   // requirement.
   if (TC.getTriple().isOSLinux() && !TC.getTriple().isAndroid() &&
-      !TC.getTriple().isMusl())
+      !TC.getTriple().isMusl() && TC.getSanitizerArgs(Args).needsMsanRt())
     CmdArgs.push_back("-lresolv");
 }
 
diff --git a/compiler-rt/lib/msan/msan_interceptors.cpp b/compiler-rt/lib/msan/msan_interceptors.cpp
index f05c20618780b7..b2098d8a26d229 100644
--- a/compiler-rt/lib/msan/msan_interceptors.cpp
+++ b/compiler-rt/lib/msan/msan_interceptors.cpp
@@ -1358,6 +1358,79 @@ INTERCEPTOR(int, forkpty, int *aparent, char *name, const void *termp,
 #define MSAN_MAYBE_INTERCEPT_FORKPTY
 #endif
 
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
+            char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, srclength);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res + 1);
+  return res;
+}
+INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, internal_strlen(src) + 1);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_pton)(src, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT___B64_TO \
+    MSAN_INTERCEPT_FUNC(__b64_ntop);    \
+    COMMON_INTERCEPT_FUNCTION(__b64_pton);
+#else
+#  define MSAN_MAYBE_INTERCEPT___B64_TO
+#endif
+
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+#  if __GLIBC_PREREQ(2, 34)
+// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
+#    define DN_COMP_INTERCEPTOR_NAME dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
+#  else
+#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
+#  endif
+INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
+            unsigned char *comp_dn, int length, unsigned char **dnptrs,
+            unsigned char **lastdnptr) {
+  ENSURE_MSAN_INITED();
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
+                                           lastdnptr);
+  if (res >= 0) {
+    __msan_unpoison(comp_dn, res);
+    if (dnptrs && lastdnptr) {
+      unsigned char **p = dnptrs;
+      for (; p != lastdnptr && *p; ++p);
+      if (p != lastdnptr)
+        ++p;
+      __msan_unpoison(dnptrs, (p - dnptrs) * sizeof(*p));
+    }
+  }
+  return res;
+}
+INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
+            unsigned char const *end, unsigned char const *src, char *dest,
+            int space) {
+  ENSURE_MSAN_INITED();
+  // TODO: add read check if __dn_comp intercept added
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
+  if (res >= 0)
+    __msan_unpoison(dest, internal_strlen(dest) + 1);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND      \
+    MSAN_INTERCEPT_FUNC(DN_COMP_INTERCEPTOR_NAME); \
+    MSAN_INTERCEPT_FUNC(DN_EXPAND_INTERCEPTOR_NAME);
+#else
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND
+#endif
+
 struct MSanInterceptorContext {
   bool in_interceptor_scope;
 };
@@ -1916,6 +1989,9 @@ void InitializeInterceptors() {
   MSAN_MAYBE_INTERCEPT_OPENPTY;
   MSAN_MAYBE_INTERCEPT_FORKPTY;
 
+  MSAN_MAYBE_INTERCEPT___B64_TO;
+  MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND;
+
   inited = 1;
 }
 } // namespace __msan
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
index ba3693dbd11f63..9cf99a192dba49 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
@@ -2538,82 +2538,6 @@ INTERCEPTOR(int, glob64, const char *pattern, int flags,
 #define INIT_GLOB64
 #endif  // SANITIZER_INTERCEPT_GLOB64
 
-#if SANITIZER_INTERCEPT___B64_TO
-INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
-            char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_ntop, src, srclength, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, srclength);
-  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res + 1);
-  return res;
-}
-INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_pton, src, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, internal_strlen(src) + 1);
-  int res = REAL(__b64_pton)(src, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res);
-  return res;
-}
-#define INIT___B64_TO                      \
-    COMMON_INTERCEPT_FUNCTION(__b64_ntop); \
-    COMMON_INTERCEPT_FUNCTION(__b64_pton);
-#else  // SANITIZER_INTERCEPT___B64_TO
-#define INIT___B64_TO
-#endif  // SANITIZER_INTERCEPT___B64_TO
-
-#if SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  if __GLIBC_PREREQ(2, 34)
-// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
-#    define DN_COMP_INTERCEPTOR_NAME dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
-#  else
-#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
-#  endif
-INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
-            unsigned char *comp_dn, int length, unsigned char **dnptrs,
-            unsigned char **lastdnptr) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_COMP_INTERCEPTOR_NAME, exp_dn, comp_dn,
-                           length, dnptrs, lastdnptr);
-  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
-                                           lastdnptr);
-  if (res >= 0) {
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, comp_dn, res);
-    if (dnptrs && lastdnptr) {
-      unsigned char **p = dnptrs;
-      for (; p != lastdnptr && *p; ++p)
-        ;
-      if (p != lastdnptr)
-        ++p;
-      COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dnptrs, (p - dnptrs) * sizeof(*p));
-    }
-  }
-  return res;
-}
-INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
-            unsigned char const *end, unsigned char const *src, char *dest,
-            int space) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_EXPAND_INTERCEPTOR_NAME, base, end, src,
-                           dest, space);
-  // TODO: add read check if __dn_comp intercept added
-  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dest, internal_strlen(dest) + 1);
-  return res;
-}
-#  define INIT_DN_COMP_EXPAND                            \
-    COMMON_INTERCEPT_FUNCTION(DN_COMP_INTERCEPTOR_NAME); \
-    COMMON_INTERCEPT_FUNCTION(DN_EXPAND_INTERCEPTOR_NAME);
-#else  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  define INIT_DN_COMP_EXPAND
-#endif  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-
 #if SANITIZER_INTERCEPT_POSIX_SPAWN
 
 template <class RealSpawnPtr>
@@ -10346,8 +10270,6 @@ static void InitializeCommonInterceptors() {
   INIT_TIMESPEC_GET;
   INIT_GLOB;
   INIT_GLOB64;
-  INIT___B64_TO;
-  INIT_DN_COMP_EXPAND;
   INIT_POSIX_SPAWN;
   INIT_WAIT;
   INIT_WAIT4;
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
index 190cad7cf7c3f7..b4940b66da9717 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
@@ -266,8 +266,6 @@ SANITIZER_WEAK_IMPORT void *aligned_alloc(__sanitizer::usize __alignment,
 #define SANITIZER_INTERCEPT_TIMESPEC_GET SI_LINUX
 #define SANITIZER_INTERCEPT_GLOB (SI_GLIBC || SI_SOLARIS)
 #define SANITIZER_INTERCEPT_GLOB64 SI_GLIBC
-#define SANITIZER_INTERCEPT___B64_TO SI_LINUX_NOT_ANDROID
-#define SANITIZER_INTERCEPT_DN_COMP_EXPAND SI_LINUX_NOT_ANDROID
 #define SANITIZER_INTERCEPT_POSIX_SPAWN SI_POSIX
 #define SANITIZER_INTERCEPT_WAIT SI_POSIX
 #define SANITIZER_INTERCEPT_INET SI_POSIX
diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
deleted file mode 100644
index 96fd0138648b34..00000000000000
--- a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
+++ /dev/null
@@ -1,42 +0,0 @@
-// RUN: %clangxx %s -o %t && %run %t %p
-
-#include <assert.h>
-#include <resolv.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/types.h>
-
-int main(int iArgc, char *szArgv[]) {
-  // Check NTOP writing
-  const char *src = "base64 test data";
-  size_t src_len = strlen(src);
-  size_t dst_len = ((src_len + 2) / 3) * 4 + 1;
-  char dst[dst_len];
-  int res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                     dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "YmFzZTY0IHRlc3QgZGF0YQ==") == 0);
-
-  // Check PTON writing
-  unsigned char target[dst_len];
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  // Check NTOP writing for zero length src
-  src = "";
-  src_len = strlen(src);
-  assert(((src_len + 2) / 3) * 4 + 1 < dst_len);
-  res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                 dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "") == 0);
-
-  // Check PTON writing for zero length src
-  dst[0] = '\0';
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  return 0;
-}

@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2024

@llvm/pr-subscribers-clang-driver

Author: Aaron Puchert (aaronpuchert)

Changes

The functions are not relevant for most sanitizers and only required for MSan to see which regions have been written to. This eliminates a link dependency for all other sanitizers and fixes #59007: while -lresolv had been added for the static runtime in 6dce56b, it wasn't added to the shared runtimes.

Instead of just moving the interceptors, we adapt them to MSan conventions:

  • We don't skip intercepting when msan_init_is_running is true, but directly call ENSURE_MSAN_INITED() like most other interceptors. It seems unlikely that these functions are called during initialization.
  • We don't unpoison errno, because none of the functions is specified to use it.

Full diff: https://github.com/llvm/llvm-project/pull/119071.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) compiler-rt/lib/msan/msan_interceptors.cpp (+76)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc (-78)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h (-2)
  • (removed) compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp (-42)
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 7d3d7f8f03c491..03dbdc27975b42 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1410,7 +1410,7 @@ void tools::linkSanitizerRuntimeDeps(const ToolChain &TC,
   // libresolv.a, even if exists, is an empty archive to satisfy POSIX -lresolv
   // requirement.
   if (TC.getTriple().isOSLinux() && !TC.getTriple().isAndroid() &&
-      !TC.getTriple().isMusl())
+      !TC.getTriple().isMusl() && TC.getSanitizerArgs(Args).needsMsanRt())
     CmdArgs.push_back("-lresolv");
 }
 
diff --git a/compiler-rt/lib/msan/msan_interceptors.cpp b/compiler-rt/lib/msan/msan_interceptors.cpp
index f05c20618780b7..b2098d8a26d229 100644
--- a/compiler-rt/lib/msan/msan_interceptors.cpp
+++ b/compiler-rt/lib/msan/msan_interceptors.cpp
@@ -1358,6 +1358,79 @@ INTERCEPTOR(int, forkpty, int *aparent, char *name, const void *termp,
 #define MSAN_MAYBE_INTERCEPT_FORKPTY
 #endif
 
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
+            char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, srclength);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res + 1);
+  return res;
+}
+INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
+  ENSURE_MSAN_INITED();
+  CHECK_UNPOISONED(src, internal_strlen(src) + 1);
+  InterceptorScope interceptor_scope;
+  int res = REAL(__b64_pton)(src, target, targsize);
+  if (res >= 0)
+    __msan_unpoison(target, res);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT___B64_TO \
+    MSAN_INTERCEPT_FUNC(__b64_ntop);    \
+    COMMON_INTERCEPT_FUNCTION(__b64_pton);
+#else
+#  define MSAN_MAYBE_INTERCEPT___B64_TO
+#endif
+
+#if SANITIZER_LINUX && !SANITIZER_ANDROID
+#  if __GLIBC_PREREQ(2, 34)
+// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
+#    define DN_COMP_INTERCEPTOR_NAME dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
+#  else
+#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
+#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
+#  endif
+INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
+            unsigned char *comp_dn, int length, unsigned char **dnptrs,
+            unsigned char **lastdnptr) {
+  ENSURE_MSAN_INITED();
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
+                                           lastdnptr);
+  if (res >= 0) {
+    __msan_unpoison(comp_dn, res);
+    if (dnptrs && lastdnptr) {
+      unsigned char **p = dnptrs;
+      for (; p != lastdnptr && *p; ++p);
+      if (p != lastdnptr)
+        ++p;
+      __msan_unpoison(dnptrs, (p - dnptrs) * sizeof(*p));
+    }
+  }
+  return res;
+}
+INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
+            unsigned char const *end, unsigned char const *src, char *dest,
+            int space) {
+  ENSURE_MSAN_INITED();
+  // TODO: add read check if __dn_comp intercept added
+  InterceptorScope interceptor_scope;
+  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
+  if (res >= 0)
+    __msan_unpoison(dest, internal_strlen(dest) + 1);
+  return res;
+}
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND      \
+    MSAN_INTERCEPT_FUNC(DN_COMP_INTERCEPTOR_NAME); \
+    MSAN_INTERCEPT_FUNC(DN_EXPAND_INTERCEPTOR_NAME);
+#else
+#  define MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND
+#endif
+
 struct MSanInterceptorContext {
   bool in_interceptor_scope;
 };
@@ -1916,6 +1989,9 @@ void InitializeInterceptors() {
   MSAN_MAYBE_INTERCEPT_OPENPTY;
   MSAN_MAYBE_INTERCEPT_FORKPTY;
 
+  MSAN_MAYBE_INTERCEPT___B64_TO;
+  MSAN_MAYBE_INTERCEPT_DN_COMP_EXPAND;
+
   inited = 1;
 }
 } // namespace __msan
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
index ba3693dbd11f63..9cf99a192dba49 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc
@@ -2538,82 +2538,6 @@ INTERCEPTOR(int, glob64, const char *pattern, int flags,
 #define INIT_GLOB64
 #endif  // SANITIZER_INTERCEPT_GLOB64
 
-#if SANITIZER_INTERCEPT___B64_TO
-INTERCEPTOR(int, __b64_ntop, unsigned char const *src, SIZE_T srclength,
-            char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_ntop, src, srclength, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, srclength);
-  int res = REAL(__b64_ntop)(src, srclength, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res + 1);
-  return res;
-}
-INTERCEPTOR(int, __b64_pton, char const *src, char *target, SIZE_T targsize) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, __b64_pton, src, target, targsize);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, src, internal_strlen(src) + 1);
-  int res = REAL(__b64_pton)(src, target, targsize);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, target, res);
-  return res;
-}
-#define INIT___B64_TO                      \
-    COMMON_INTERCEPT_FUNCTION(__b64_ntop); \
-    COMMON_INTERCEPT_FUNCTION(__b64_pton);
-#else  // SANITIZER_INTERCEPT___B64_TO
-#define INIT___B64_TO
-#endif  // SANITIZER_INTERCEPT___B64_TO
-
-#if SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  if __GLIBC_PREREQ(2, 34)
-// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
-#    define DN_COMP_INTERCEPTOR_NAME dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
-#  else
-#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
-#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
-#  endif
-INTERCEPTOR(int, DN_COMP_INTERCEPTOR_NAME, unsigned char *exp_dn,
-            unsigned char *comp_dn, int length, unsigned char **dnptrs,
-            unsigned char **lastdnptr) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_COMP_INTERCEPTOR_NAME, exp_dn, comp_dn,
-                           length, dnptrs, lastdnptr);
-  int res = REAL(DN_COMP_INTERCEPTOR_NAME)(exp_dn, comp_dn, length, dnptrs,
-                                           lastdnptr);
-  if (res >= 0) {
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, comp_dn, res);
-    if (dnptrs && lastdnptr) {
-      unsigned char **p = dnptrs;
-      for (; p != lastdnptr && *p; ++p)
-        ;
-      if (p != lastdnptr)
-        ++p;
-      COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dnptrs, (p - dnptrs) * sizeof(*p));
-    }
-  }
-  return res;
-}
-INTERCEPTOR(int, DN_EXPAND_INTERCEPTOR_NAME, unsigned char const *base,
-            unsigned char const *end, unsigned char const *src, char *dest,
-            int space) {
-  void *ctx;
-  COMMON_INTERCEPTOR_ENTER(ctx, DN_EXPAND_INTERCEPTOR_NAME, base, end, src,
-                           dest, space);
-  // TODO: add read check if __dn_comp intercept added
-  int res = REAL(DN_EXPAND_INTERCEPTOR_NAME)(base, end, src, dest, space);
-  if (res >= 0)
-    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dest, internal_strlen(dest) + 1);
-  return res;
-}
-#  define INIT_DN_COMP_EXPAND                            \
-    COMMON_INTERCEPT_FUNCTION(DN_COMP_INTERCEPTOR_NAME); \
-    COMMON_INTERCEPT_FUNCTION(DN_EXPAND_INTERCEPTOR_NAME);
-#else  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-#  define INIT_DN_COMP_EXPAND
-#endif  // SANITIZER_INTERCEPT_DN_COMP_EXPAND
-
 #if SANITIZER_INTERCEPT_POSIX_SPAWN
 
 template <class RealSpawnPtr>
@@ -10346,8 +10270,6 @@ static void InitializeCommonInterceptors() {
   INIT_TIMESPEC_GET;
   INIT_GLOB;
   INIT_GLOB64;
-  INIT___B64_TO;
-  INIT_DN_COMP_EXPAND;
   INIT_POSIX_SPAWN;
   INIT_WAIT;
   INIT_WAIT4;
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
index 190cad7cf7c3f7..b4940b66da9717 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_platform_interceptors.h
@@ -266,8 +266,6 @@ SANITIZER_WEAK_IMPORT void *aligned_alloc(__sanitizer::usize __alignment,
 #define SANITIZER_INTERCEPT_TIMESPEC_GET SI_LINUX
 #define SANITIZER_INTERCEPT_GLOB (SI_GLIBC || SI_SOLARIS)
 #define SANITIZER_INTERCEPT_GLOB64 SI_GLIBC
-#define SANITIZER_INTERCEPT___B64_TO SI_LINUX_NOT_ANDROID
-#define SANITIZER_INTERCEPT_DN_COMP_EXPAND SI_LINUX_NOT_ANDROID
 #define SANITIZER_INTERCEPT_POSIX_SPAWN SI_POSIX
 #define SANITIZER_INTERCEPT_WAIT SI_POSIX
 #define SANITIZER_INTERCEPT_INET SI_POSIX
diff --git a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp b/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
deleted file mode 100644
index 96fd0138648b34..00000000000000
--- a/compiler-rt/test/sanitizer_common/TestCases/Linux/b64.cpp
+++ /dev/null
@@ -1,42 +0,0 @@
-// RUN: %clangxx %s -o %t && %run %t %p
-
-#include <assert.h>
-#include <resolv.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/types.h>
-
-int main(int iArgc, char *szArgv[]) {
-  // Check NTOP writing
-  const char *src = "base64 test data";
-  size_t src_len = strlen(src);
-  size_t dst_len = ((src_len + 2) / 3) * 4 + 1;
-  char dst[dst_len];
-  int res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                     dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "YmFzZTY0IHRlc3QgZGF0YQ==") == 0);
-
-  // Check PTON writing
-  unsigned char target[dst_len];
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  // Check NTOP writing for zero length src
-  src = "";
-  src_len = strlen(src);
-  assert(((src_len + 2) / 3) * 4 + 1 < dst_len);
-  res = b64_ntop(reinterpret_cast<const unsigned char *>(src), src_len, dst,
-                 dst_len);
-  assert(res >= 0);
-  assert(strcmp(dst, "") == 0);
-
-  // Check PTON writing for zero length src
-  dst[0] = '\0';
-  res = b64_pton(dst, target, dst_len);
-  assert(res >= 0);
-  assert(strncmp(reinterpret_cast<const char *>(target), src, res) == 0);
-
-  return 0;
-}

Copy link
Member Author

@aaronpuchert aaronpuchert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the removed test: there is still the MSan-specific test compiler-rt/test/msan/Linux/b64.cpp.

The functions are not relevant for most sanitizers and only required for
MSan to see which regions have been written to. This eliminates a link
dependency for all other sanitizers and fixes llvm#59007: while `-lresolv`
had been added for the static runtime in 6dce56b, it wasn't added
to the shared runtimes.

Instead of just moving the interceptors, we adapt them to MSan
conventions:
* We don't skip intercepting when `msan_init_is_running` is true, but
  directly call ENSURE_MSAN_INITED() like most other interceptors. It
  seems unlikely that these functions are called during initialization.
* We don't unpoison `errno`, because none of the functions is specified
  to use it.
@aaronpuchert aaronpuchert force-pushed the dn_comp_expand-msan-only branch from fdc60e8 to 2b0071c Compare December 7, 2024 16:08
@aaronpuchert aaronpuchert merged commit f5f9650 into llvm:main Dec 9, 2024
8 checks passed
@aaronpuchert aaronpuchert deleted the dn_comp_expand-msan-only branch December 10, 2024 13:09
@nico
Copy link
Contributor

nico commented Dec 15, 2024

This broke Linux/dn_expand.cpp on our bots, see https://issues.chromium.org/issues/384188036

nico added a commit that referenced this pull request Dec 15, 2024
This reverts commit bae383b.
Prerequisite to reverting #119071.
nico added a commit that referenced this pull request Dec 15, 2024
@nico
Copy link
Contributor

nico commented Dec 15, 2024

I reverted this in 1464b8e for now.

@aaronpuchert
Copy link
Member Author

aaronpuchert commented Dec 15, 2024

@nico, could you provide some more information? This didn't break on the lab.llvm.org infrastructure, and I'm not sure what's different in your build. Locally I also used a Release build with assertions on x86_64 on Linux.

For what it's worth, the change simply moves the interceptors, and it should still unpoison for the length of the string plus one. So I'm not sure why the last byte of the string is poisoned for you.

@aaronpuchert
Copy link
Member Author

Specifically, this is the output:

 Uninitialized bytes in strcmp at offset 0 inside [0x7fffffffe1e0, 12)
 ==1131638==WARNING: MemorySanitizer: use-of-uninitialized-value
     #0 0x55555565c159 in testWrite() /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/msan/Linux/dn_expand.cpp:20:3

The code:

  assert(res == 12);
  assert(strcmp(output, "google\\.com") == 0); // line 20
  __msan_check_mem_is_initialized(output, strlen(output) + 1);

The expected string has 11 characters, and in the old (and new) interceptor we have __msan_unpoison(dest, internal_strlen(dest) + 1);. Do you get the same result for the function? Does internal_strlen not return 11? Or does __msan_unpoison not work?

@nico
Copy link
Contributor

nico commented Dec 16, 2024

I'll try to find a Linux box tomorrow to get more details. https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8728518597999528929/+/u/package_clang/stdout?format=raw is the full build output. If you ctrl-f for "cmake -G", you should find all cmake vars that that bot sets. (It first builds a bootstrap compiler, then an instrumented compiler, then a final compiler, then compiler-rt with that.)

@nico
Copy link
Contributor

nico commented Dec 17, 2024

@alanzhao1 has been trying to reproduce the problem, I think. I'm not sure what the current status is.

@alanzhao1
Copy link
Contributor

alanzhao1 commented Dec 18, 2024

Managed to reproduce it. This seems to break if the sysroot is set to the one Chrome uses. Our sysroot can be downloaded at https://commondatastorage.googleapis.com/chrome-linux-sysroot/dec7a3a0fc5b83b909cba1b6d119077e0429a138eadef6bf5a0f2e03b1904631

Command I used:

$ cmake -DLLVM_ENABLE_LLD=ON -DCMAKE_USE_LINKER=/usr/bin/lld  -DCMAKE_C_COMPILER=/usr/bin/clang -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DLLVM_ENABLE_ZSTD=OFF -DLLVM_TARGETS_TO_BUILD='X86' -DLLVM_ENABLE_RUNTIMES='compiler-rt' -DCOMPILER_RT_BUILD_SANITIZERS=ON -DCOMPILER_RT_SANITIZERS_TO_BUILD='asan;dfsan;msan;hwasan;tsan;cfi' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang' -DLLVM_ENABLE_PIC=OFF -DCMAKE_SYSROOT='/path/to/expanded/tarball' -DRUNTIMES_i386-unknown-linux-gnu_CMAKE_SYSROOT=/usr/local/google/home/ayzhao/src/chromium/src/third_party/llvm-build-tools/debian_bullseye_i386_sysroot -DBUILTINS_i386-unknown-linux-gnu_CMAKE_SYSROOT=/usr/local/google/home/ayzhao/src/chromium/src/third_party/llvm-build-tools/debian_bullseye_i386_sysroot -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON -G Ninja ../llvm

$ ninja check-runtimes

@aaronpuchert
Copy link
Member Author

Where do I find these?

-DRUNTIMES_i386-unknown-linux-gnu_CMAKE_SYSROOT=/usr/local/google/home/ayzhao/src/chromium/src/third_party/llvm-build-tools/debian_bullseye_i386_sysroot \
-DBUILTINS_i386-unknown-linux-gnu_CMAKE_SYSROOT=/usr/local/google/home/ayzhao/src/chromium/src/third_party/llvm-build-tools/debian_bullseye_i386_sysroot

Or are they not relevant, because we're actually interested in 64-bit x86?

@aaronpuchert
Copy link
Member Author

Without these two options, the build fails for me:

FAILED: bin/llvm-jitlink
: && /usr/bin/clang++ --sysroot=/home/aaron/chrome-sysroot -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fno-pie -fuse-ld=lld -Wl,--color-diagnostics   -Wl,--export-dynamic  -Wl,--gc-sections tools/llvm-jitlink/CMakeFiles/llvm-jitlink.dir/llvm-jitlink.cpp.o tools/llvm-jitlink/CMakeFiles/llvm-jitlink.dir/llvm-jitlink-coff.cpp.o tools/llvm-jitlink/CMakeFiles/llvm-jitlink.dir/llvm-jitlink-elf.cpp.o tools/llvm-jitlink/CMakeFiles/llvm-jitlink.dir/llvm-jitlink-macho.cpp.o tools/llvm-jitlink/CMakeFiles/llvm-jitlink.dir/llvm-jitlink-statistics.cpp.o -o bin/llvm-jitlink  -Wl,-rpath,"\$ORIGIN/../lib:"  [...]  -ldl  -lpthread  -lm  /home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu/libz.so  lib/libLLVMDemangle.a && :
ld.lld: error: undefined symbol: shm_open
>>> referenced by MemoryMapper.cpp
>>>               MemoryMapper.cpp.o:(llvm::orc::SharedMemoryMapper::reserve(unsigned long, llvm::unique_function<void (llvm::Expected<llvm::orc::ExecutorAddrRange>)>)::$_0::operator()(llvm::Error, llvm::Expected<std::pair<llvm::orc::ExecutorAddr, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>)) in archive lib/libLLVMOrcJIT.a
>>> did you mean: sem_open
>>> defined in: /home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu/libpthread.so

ld.lld: error: undefined symbol: shm_unlink
>>> referenced by MemoryMapper.cpp
>>>               MemoryMapper.cpp.o:(llvm::orc::SharedMemoryMapper::reserve(unsigned long, llvm::unique_function<void (llvm::Expected<llvm::orc::ExecutorAddrRange>)>)::$_0::operator()(llvm::Error, llvm::Expected<std::pair<llvm::orc::ExecutorAddr, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>)) in archive lib/libLLVMOrcJIT.a
>>> did you mean: sem_unlink
>>> defined in: /home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu/libpthread.so
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

Apparently we're missing -lrt, which has the functions:

> readelf --dyn-syms --wide  /home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu/librt.so | grep shm
    55: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __shm_directory@GLIBC_PRIVATE (12)
    73: 0000000000004640   353 FUNC    GLOBAL DEFAULT   14 shm_open@@GLIBC_2.2.5
    91: 00000000000047b0   303 FUNC    GLOBAL DEFAULT   14 shm_unlink@@GLIBC_2.2.5

Why are we not linking with that? It's correctly detected:

-- Looking for shm_open in rt
-- Looking for shm_open in rt - found

Have to investigate. Without sysroot but otherwise the same command line I can't reproduce it.

@aaronpuchert
Copy link
Member Author

That configuration output is from compiler-rt, but we need it in llvm. In llvm/lib/ExecutionEngine/Orc/CMakeLists.txt:

if( CMAKE_HOST_UNIX AND HAVE_LIBRT )
  set(rt_lib rt)
endif()

We have CMAKE_HOST_UNIX = 1, but HAVE_LIBRT is empty. It comes from check_library_exists(rt clock_gettime "" HAVE_LIBRT) in llvm/cmake/config-ix.cmake. The corresponding output:

-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - not found

This looks correct to me:

> readelf --dyn-syms --wide /home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu/librt.so | grep clock_gettime
    38: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __clock_gettime@GLIBC_PRIVATE (7)

Instead it's in libc.so.6:

> readelf --dyn-syms --wide /home/aaron/chrome-sysroot/lib/x86_64-linux-gnu/libc.so.6 | grep clock_gettime
   840: 00000000000c3420   111 FUNC    GLOBAL DEFAULT   13 __clock_gettime@@GLIBC_PRIVATE
  1748: 00000000000c3420   111 FUNC    GLOBAL DEFAULT   13 clock_gettime@GLIBC_2.2.5
  1751: 00000000000c3420   111 FUNC    GLOBAL DEFAULT   13 clock_gettime@@GLIBC_2.17

On my system that's not an issue because libc.so has both clock_gettime and shm_open, so librt.so is never needed. Do you have any additional patches or why does this work for you?

@aaronpuchert
Copy link
Member Author

Anyway, I managed to continue the build by setting HAVE_LIBRT manually to 1. The problem is that we skip the interceptor and directly go into the actual implementation:

#0  0x00007ffff794d6e0 in dn_expand@@GLIBC_2.34 () from /lib64/libc.so.6
#1  0x000055555562073c in testWrite () at [...]/compiler-rt/test/msan/Linux/dn_expand.cpp:16
#2  0x000055555562187f in main (iArgc=2, szArgv=0x7fffffffe1f8) at [...]/compiler-rt/test/msan/Linux/dn_expand.cpp:86

Because the interceptor has a different name:

> readelf --syms --wide msan_interceptors.cpp.o | grep dn_expand
   253: 0000000000000000     0 SECTION LOCAL  DEFAULT  450 .text.___interceptor___dn_expand
  3520: 00000000000003e3     5 FUNC    WEAK   DEFAULT    2 __dn_expand
  3521: 00000000000003e3     5 FUNC    GLOBAL DEFAULT    2 __interceptor_trampoline___dn_expand
  3522: 0000000000000000   226 FUNC    WEAK   DEFAULT  450 __interceptor___dn_expand
  5383: 0000000000000000   226 FUNC    GLOBAL DEFAULT  450 ___interceptor___dn_expand
  5384: 0000000000000000     8 OBJECT  GLOBAL HIDDEN  2624 _ZN14__interception16real___dn_expandE

We choose the interceptor name based on the libc version:

#  if __GLIBC_PREREQ(2, 34)
// Changed with https://sourceware.org/git/?p=glibc.git;h=640bbdf
#    define DN_COMP_INTERCEPTOR_NAME dn_comp
#    define DN_EXPAND_INTERCEPTOR_NAME dn_expand
#  else
#    define DN_COMP_INTERCEPTOR_NAME __dn_comp
#    define DN_EXPAND_INTERCEPTOR_NAME __dn_expand
#  endif

Indeed the symbol is dn_expand@@GLIBC_2.34. However:

> /home/aaron/chrome-sysroot/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 /home/aaron/chrome-sysroot/lib/x86_64-linux-gnu/libc.so.6 -v
GNU C Library (Debian GLIBC 2.31-13+deb11u5) stable release version 2.31.
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 10.2.1 20210110.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

Strangely, /usr/include/features.h has:

/* Major and minor version number of the GNU C library package.  Use
   these macros to test for features in specific releases.  */
#define __GLIBC__       2
#define __GLIBC_MINOR__ 26 //   31

#define __GLIBC_PREREQ(maj, min) \
        ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min))

So the problem is that we build against libc 2.26/2.31 and run with 2.34 or newer. At least that's what happens here. If I run the build and run command lines in the sysroot, the test succeeds:

> [...]/bin/clang --driver-mode=g++ -fsanitize=memory -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -gline-tables-only [...]/compiler-rt/test/msan/Linux/dn_expand.cpp --sysroot=/home/aaron/chrome-sysroot/ -o temp
> LD_LIBRARY_PATH=/home/aaron/chrome-sysroot/lib/x86_64-linux-gnu:/home/aaron/chrome-sysroot/usr/lib/x86_64-linux-gnu /home/aaron/chrome-sysroot/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ./temp
10
> echo $?
0

Now I'll try without my change, because I simply moved that #if from one place to another, where it should do the same thing.

However, independent of this change I suppose we should interpose both spellings, because an application can be compiled against a different libc version than compiler-rt.

@aaronpuchert
Copy link
Member Author

With my change reverted, the test lives in compiler-rt/test/sanitizer_common, where we get an additional --sysroot= argument. This comes from compiler-rt/test/sanitizer_common/lit.site.cfg.py.in:

# Tool-specific config options.
config.name_suffix = "@CONFIG_NAME@"
config.tool_name = "@SANITIZER_COMMON_LIT_TEST_MODE@"
config.target_cflags = "@SANITIZER_COMMON_TEST_TARGET_CFLAGS@"
config.target_arch = "@SANITIZER_COMMON_TEST_TARGET_ARCH@"

In this case configured to compiler-rt/test/sanitizer_common/msan-x86_64-Linux/lit.site.cfg.py:

# Tool-specific config options.
config.name_suffix = "msan-x86_64-Linux"
config.tool_name = "msan"
config.target_cflags = "-funwind-tables --sysroot=/home/aaron/chrome-sysroot"
config.target_arch = "x86_64"

This comes from compiler-rt/test/sanitizer_common/CMakeLists.txt:

      if(CMAKE_SYSROOT)
        list(APPEND SANITIZER_COMMON_TEST_TARGET_CFLAGS "--sysroot=${CMAKE_SYSROOT}")
      endif()

However, in compiler-rt/test/msan/lit.site.cfg.py.in:

# Tool-specific config options.
config.name_suffix = "-@CONFIG_NAME@"
config.target_cflags = "@MSAN_TEST_TARGET_CFLAGS@"
config.target_arch = "@MSAN_TEST_TARGET_ARCH@"
config.use_lld = @MSAN_TEST_USE_LLD@
config.use_thinlto = @MSAN_TEST_USE_THINLTO@

Configured to e.g. compiler-rt/test/msan/X86_64/lit.site.cfg.py:

# Tool-specific config options.
config.name_suffix = "-X86_64"
config.target_cflags = ""
config.target_arch = "x86_64"
config.use_lld = False
config.use_thinlto = False

This comes from compiler-rt/test/msan/CMakeLists.txt:

get_test_cc_for_arch(${arch} MSAN_TEST_TARGET_CC MSAN_TEST_TARGET_CFLAGS)

which in turn goes to compiler-rt/cmake/config-ix.cmake:

# Returns a compiler and CFLAGS that should be used to run tests for the
# specific architecture.  When cross-compiling, this is controled via
# COMPILER_RT_TEST_COMPILER and COMPILER_RT_TEST_COMPILER_CFLAGS.
macro(get_test_cc_for_arch arch cc_out cflags_out)
  if (NOT ${ARGC} EQUAL 3)
    message(FATAL_ERROR "got too many args. expected 3, got ${ARGC} (namely: ${ARGV})")
  endif()
  if(ANDROID OR (NOT APPLE AND ${arch} MATCHES "arm|aarch64|riscv32|riscv64"))
    # This is only true if we are cross-compiling.
    # Build all tests with host compiler and use host tools.
    set(${cc_out} ${COMPILER_RT_TEST_COMPILER})
    set(${cflags_out} ${COMPILER_RT_TEST_COMPILER_CFLAGS})
  else()
    get_target_flags_for_arch(${arch} ${cflags_out})
    if(APPLE)
      list(APPEND ${cflags_out} ${DARWIN_osx_CFLAGS})
    endif()
    string(REPLACE ";" " " ${cflags_out} "${${cflags_out}}")
  endif()
endmacro()

@vitalybuka, should we move the sysroot option into this macro? And always interpose both spellings?

aaronpuchert added a commit to aaronpuchert/llvm-project that referenced this pull request Dec 20, 2024
Sanitizer-specific tests don't use the sanitizer_common flags, but the
issues they address probably also apply to the individual sanitizers.

This was observed in llvm#119071: moving a test from sanitizer_common to
msan broke it in builds with CMAKE_SYSROOT set, because the --sysroot
argument was no longer applied to the test.
aaronpuchert added a commit to aaronpuchert/llvm-project that referenced this pull request May 7, 2025
Sanitizer-specific tests don't use the sanitizer_common flags, but the
issues they address probably also apply to the individual sanitizers.

This was observed in llvm#119071: moving a test from sanitizer_common to
msan broke it in builds with CMAKE_SYSROOT set, because the --sysroot
argument was no longer applied to the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category compiler-rt:msan Memory sanitizer compiler-rt:sanitizer compiler-rt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AddressSanitizer:DEADLYSIGNAL in getaddrinfo() when compiled with clang-15 with "-nodefaultlibs" without "-lresolv"
5 participants