Skip to content

[ubsan] Display correct runtime messages for negative _BitInt #93612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

earnol
Copy link
Contributor

@earnol earnol commented May 28, 2024

Without this patch compiler-rt ubsan library has a bug displaying incorrect values for variables of the _BitInt (previously called _ExtInt) type. This patch affects affects both: generation of metadata inside code generator and runtime part. The runtime part provided only for i386 and x86_64 runtimes. Other runtimes should be updated to take full benefit of this patch.
The patch is constructed the way to be backward compatible and int and float type runtime diagnostics should be unaffected for not yet updated runtimes.

This patch fixes issue #64100.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt clang:codegen IR generation bugs: mangling, exceptions, etc. compiler-rt:ubsan Undefined behavior sanitizer compiler-rt:sanitizer labels May 28, 2024
@llvmbot
Copy link
Member

llvmbot commented May 28, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

@llvm/pr-subscribers-clang

Author: None (earnol)

Changes

Without this patch compiler-rt ubsan library has a bug displaying incorrect values for variables of the _BitInt (previously called _ExtInt) type. This patch affects affects both: generation of metadata inside code generator and runtime part. The runtime part provided only for i386 and x86_64 runtimes. Other runtimes should be updated to take full benefit of this patch.
The patch is constructed the way to be backward compatible and int and float type runtime diagnostics should be unaffected for not yet updated runtimes.

This patch fixes issue #64100.


Full diff: https://github.com/llvm/llvm-project/pull/93612.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+51-5)
  • (modified) compiler-rt/lib/ubsan/ubsan_value.cpp (+10-7)
  • (modified) compiler-rt/lib/ubsan/ubsan_value.h (+34-1)
  • (added) compiler-rt/test/ubsan/TestCases/Integer/bit-int.c (+188)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d6478cc6835d8..e80bdfd51ceec 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -41,6 +41,7 @@
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/Passes/OptimizationLevel.h"
 #include "llvm/Support/ConvertUTF.h"
+#include "llvm/Support/Endian.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/SaveAndRestore.h"
@@ -64,6 +65,20 @@ static llvm::cl::opt<bool> ClSanitizeGuardChecks(
     "ubsan-guard-checks", llvm::cl::Optional,
     llvm::cl::desc("Guard UBSAN checks with `llvm.allow.ubsan.check()`."));
 
+//===--------------------------------------------------------------------===//
+//                        Defines for metadata
+//===--------------------------------------------------------------------===//
+enum VariableTypeDescriptorKind : uint16_t {
+  /// An integer type.
+  TK_Integer = 0x0000,
+  /// A floating-point type.
+  TK_Float = 0x0001,
+  /// An _BitInt(N) type.
+  TK_BitInt = 0x0002,
+  /// Any other type. The value representation is unspecified.
+  TK_Unknown = 0xffff
+};
+
 //===--------------------------------------------------------------------===//
 //                        Miscellaneous Helper Methods
 //===--------------------------------------------------------------------===//
@@ -3292,22 +3307,39 @@ LValue CodeGenFunction::EmitPredefinedLValue(const PredefinedExpr *E) {
 ///   { i16 TypeKind, i16 TypeInfo }
 /// \endcode
 ///
-/// followed by an array of i8 containing the type name. TypeKind is 0 for an
-/// integer, 1 for a floating point value, and -1 for anything else.
+/// followed by an array of i8 containing the type name with extra information
+/// for BitInt. TypeKind is 0 for an integer, 1 for a floating point value, 2
+/// for BitInt and -1 for anything else.
 llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) {
   // Only emit each type's descriptor once.
   if (llvm::Constant *C = CGM.getTypeDescriptorFromMap(T))
     return C;
 
-  uint16_t TypeKind = -1;
+  uint16_t TypeKind = TK_Unknown;
   uint16_t TypeInfo = 0;
+  bool IsBitInt = false;
 
   if (T->isIntegerType()) {
-    TypeKind = 0;
+    TypeKind = TK_Integer;
     TypeInfo = (llvm::Log2_32(getContext().getTypeSize(T)) << 1) |
                (T->isSignedIntegerType() ? 1 : 0);
+    // Follow suggestion from https://github.com/llvm/llvm-project/issues/64100
+    // So we can write the exact amount of bits in TypeName after '\0'
+    // making it <diagnostic-like type name>.'\0'.<32-bit width>.
+    if (T->isSignedIntegerType() && T->getAs<BitIntType>()) {
+      // Do a sanity checks as we are using 32-bit type to store bit length.
+      assert((getContext().getTypeSize(T) > 0) &&
+             " non positive amount of bits in __BitInt type");
+      assert((getContext().getTypeSize(T) <= 0xFFFFFFFF) &&
+             " too many bits in __BitInt type");
+
+      // Redefine TypeKind with the actual __BitInt type if we have signed
+      // BitInt.
+      TypeKind = TK_BitInt;
+      IsBitInt = true;
+    }
   } else if (T->isFloatingType()) {
-    TypeKind = 1;
+    TypeKind = TK_Float;
     TypeInfo = getContext().getTypeSize(T);
   }
 
@@ -3318,6 +3350,20 @@ llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) {
       DiagnosticsEngine::ak_qualtype, (intptr_t)T.getAsOpaquePtr(), StringRef(),
       StringRef(), std::nullopt, Buffer, std::nullopt);
 
+  if (IsBitInt) {
+    // The Structure is: 0 to end the string, 32 bit insigned integer in target
+    // endianness, zero.
+    char s[6] = {'\0', '\0', '\0', '\0', '\0', '\0'};
+    const auto *EIT = T->getAs<BitIntType>();
+    uint32_t Bits = EIT->getNumBits();
+    llvm::support::endian::write32(s + 1, Bits,
+                                   getTarget().isBigEndian()
+                                       ? llvm::endianness::big
+                                       : llvm::endianness::little);
+    StringRef str = StringRef(s, 6);
+    Buffer.append(str);
+  }
+
   llvm::Constant *Components[] = {
     Builder.getInt16(TypeKind), Builder.getInt16(TypeInfo),
     llvm::ConstantDataArray::getString(getLLVMContext(), Buffer)
diff --git a/compiler-rt/lib/ubsan/ubsan_value.cpp b/compiler-rt/lib/ubsan/ubsan_value.cpp
index dc61e5b939d95..6e88ebaf34d4b 100644
--- a/compiler-rt/lib/ubsan/ubsan_value.cpp
+++ b/compiler-rt/lib/ubsan/ubsan_value.cpp
@@ -67,18 +67,21 @@ const char *__ubsan::getObjCClassName(ValueHandle Pointer) {
 
 SIntMax Value::getSIntValue() const {
   CHECK(getType().isSignedIntegerTy());
+  // Val was zero-extended to ValueHandle. Sign-extend from original width
+  // to SIntMax.
+  const unsigned ExtraBits =
+      sizeof(SIntMax) * 8 - getType().getIntegerBitCount();
   if (isInlineInt()) {
-    // Val was zero-extended to ValueHandle. Sign-extend from original width
-    // to SIntMax.
-    const unsigned ExtraBits =
-      sizeof(SIntMax) * 8 - getType().getIntegerBitWidth();
     return SIntMax(UIntMax(Val) << ExtraBits) >> ExtraBits;
   }
-  if (getType().getIntegerBitWidth() == 64)
-    return *reinterpret_cast<s64*>(Val);
+  if (getType().getIntegerBitWidth() == 64) {
+    return SIntMax(UIntMax(*reinterpret_cast<s64 *>(Val)) << ExtraBits) >>
+           ExtraBits;
+  }
 #if HAVE_INT128_T
   if (getType().getIntegerBitWidth() == 128)
-    return *reinterpret_cast<s128*>(Val);
+    return SIntMax(UIntMax(*reinterpret_cast<s128 *>(Val)) << ExtraBits) >>
+           ExtraBits;
 #else
   if (getType().getIntegerBitWidth() == 128)
     UNREACHABLE("libclang_rt.ubsan was built without __int128 support");
diff --git a/compiler-rt/lib/ubsan/ubsan_value.h b/compiler-rt/lib/ubsan/ubsan_value.h
index e0957276dd241..acef362718204 100644
--- a/compiler-rt/lib/ubsan/ubsan_value.h
+++ b/compiler-rt/lib/ubsan/ubsan_value.h
@@ -103,6 +103,14 @@ class TypeDescriptor {
     /// representation is that of bitcasting the floating-point value to an
     /// integer type.
     TK_Float = 0x0001,
+    /// An _BitInt(N) type. Lowest bit is 1 for a signed value, 0 for an
+    /// unsigned
+    /// value. Remaining bits are log_2(bit_width). The value representation is
+    /// the integer itself if it fits into a ValueHandle, and a pointer to the
+    /// integer otherwise. TypeName contains the true width of the type for the
+    /// signed _BitInt(N) type stored after zero bit after TypeName as 32-bit
+    //  unsigned integer.
+    TK_BitInt = 0x0002,
     /// Any other type. The value representation is unspecified.
     TK_Unknown = 0xffff
   };
@@ -113,10 +121,15 @@ class TypeDescriptor {
     return static_cast<Kind>(TypeKind);
   }
 
-  bool isIntegerTy() const { return getKind() == TK_Integer; }
+  bool isIntegerTy() const {
+    return getKind() == TK_Integer || getKind() == TK_BitInt;
+  }
+  bool isBitIntTy() const { return getKind() == TK_BitInt; }
+
   bool isSignedIntegerTy() const {
     return isIntegerTy() && (TypeInfo & 1);
   }
+  bool isSignedBitIntTy() const { return isBitIntTy() && (TypeInfo & 1); }
   bool isUnsignedIntegerTy() const {
     return isIntegerTy() && !(TypeInfo & 1);
   }
@@ -125,6 +138,26 @@ class TypeDescriptor {
     return 1 << (TypeInfo >> 1);
   }
 
+  const char *getBitIntBitCountPointer() const {
+    CHECK(isBitIntTy());
+    CHECK(isSignedBitIntTy());
+    // Scan Name for zero and return the next address
+    const char *p = getTypeName();
+    while (*p != '\0') {
+      p++;
+    }
+    // Return the next address
+    return p + 1;
+  }
+
+  unsigned getIntegerBitCount() const {
+    CHECK(isIntegerTy());
+    if (isSignedBitIntTy())
+      return *reinterpret_cast<const u32 *>(getBitIntBitCountPointer());
+    else
+      return getIntegerBitWidth();
+  }
+
   bool isFloatTy() const { return getKind() == TK_Float; }
   unsigned getFloatBitWidth() const {
     CHECK(isFloatTy());
diff --git a/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c b/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c
new file mode 100644
index 0000000000000..9ec096a0cf20d
--- /dev/null
+++ b/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c
@@ -0,0 +1,188 @@
+// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -O0 -fsanitize=alignment,array-bounds,bool,float-cast-overflow,implicit-integer-sign-change,implicit-signed-integer-truncation,implicit-unsigned-integer-truncation,integer-divide-by-zero,nonnull-attribute,null,nullability-arg,nullability-assign,nullability-return,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,unsigned-integer-overflow,unsigned-shift-base,vla-bound %s -o %t1 && %run %t1 2>&1 | FileCheck %s --check-prefix=CHECK-R
+// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -fsanitize=array-bounds,enum,float-cast-overflow,integer-divide-by-zero,implicit-unsigned-integer-truncation,implicit-signed-integer-truncation,implicit-integer-sign-change,unsigned-integer-overflow,signed-integer-overflow,shift-base,shift-exponent -O0 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-IR
+
+#include <stdint.h>
+#include <stdio.h>
+
+uint32_t float_divide_by_zero() {
+  float f = 1.0f / 0.0f;
+  // CHECK-IR: constant { i16, i16, [8 x i8] } { i16 1, i16 32, [8 x i8] c"'float'\00" }
+  _BitInt(37) r = (_BitInt(37))f;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: inf is outside the range of representable values of type
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(37)'\00%\00\00\00\00\00" }
+  return r;
+}
+
+uint32_t integer_divide_by_zero() __attribute__((no_sanitize("memory"))) {
+  _BitInt(37) x = 1 / 0;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:21: runtime error: division by zero
+  // CHECK-IR: constant { i16, i16, [32 x i8] } { i16 0, i16 10, [32 x i8] c"'uint32_t' (aka 'unsigned int')\00" }
+  return x;
+}
+
+uint32_t implicit_unsigned_integer_truncation() {
+  unsigned _BitInt(37) x = 0U;
+  x += float_divide_by_zero();
+  x += integer_divide_by_zero();
+  x += ~0ULL;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:5: runtime error: unsigned integer overflow:
+  // CHECK-IR: constant { i16, i16, [23 x i8] } { i16 0, i16 12, [23 x i8] c"'unsigned _BitInt(37)'\00" }
+  uint32_t r = x;
+  return r;
+}
+
+uint32_t pointer_overflow() __attribute__((no_sanitize("address"))) {
+  _BitInt(37) *x = (_BitInt(37) *)1;
+  _BitInt(37) *y = x - 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:22: runtime error: pointer index expression with base
+  uint32_t r = *(_BitInt(37) *)&y;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:16: runtime error: implicit conversion from type
+  return r;
+}
+
+uint32_t vla_bound(_BitInt(37) x) {
+  _BitInt(37) a[x - 1];
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:17: runtime error: variable length array bound evaluates to non-positive value
+  return 0;
+}
+
+uint32_t nullability_arg(_BitInt(37) *_Nonnull x)
+    __attribute__((no_sanitize("address"))) {
+  _BitInt(37) y = *(_BitInt(37) *)&x;
+  return y;
+}
+
+uint32_t unsigned_shift_base() {
+  unsigned _BitInt(37) x = ~0U << 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:32: runtime error: left shift of 4294967295 by 1 places cannot be represented in type
+  return x;
+}
+
+uint32_t array_bounds() {
+  _BitInt(37) x[4];
+  _BitInt(37) y = x[10];
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: index 10 out of bounds for type
+  // CHECK-IR: constant { i16, i16, [17 x i8] } { i16 -1, i16 0, [17 x i8] c"'_BitInt(37)[4]'\00" }
+  return (uint32_t)y;
+}
+
+uint32_t float_cast_overflow() {
+  float a = 100000000.0f;
+  _BitInt(7) b = (_BitInt(7))a;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:18: runtime error: 1e+08 is outside the range of representable values of type
+  // CHECK-IR: constant { i16, i16, [19 x i8] } { i16 2, i16 7, [19 x i8] c"'_BitInt(7)'\00\07\00\00\00\00\00" }
+  return b;
+}
+
+uint32_t implicit_integer_sign_change(unsigned _BitInt(37) x) {
+  _BitInt(37) r = x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: implicit conversion from type '{{[^']+}}' of value
+  return r & 0xFFFFFFFF;
+}
+
+_BitInt(13) implicit_signed_integer_truncation() {
+  _BitInt(73) x = (_BitInt(73)) ~((~0UL) >> 1);
+  return x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:10: runtime error: implicit conversion from type
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(73)'\00I\00\00\00\00\00" }
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 9, [20 x i8] c"'_BitInt(13)'\00\0D\00\00\00\00\00" }
+}
+
+_BitInt(37) nonnull_attribute(__attribute__((nonnull)) _BitInt(37) * x)
+    __attribute__((no_sanitize("address"))) {
+  return *(_BitInt(37) *)&x;
+}
+
+uint32_t nullability_assign(_BitInt(37) * x)
+    __attribute__((no_sanitize("address"))) {
+  _BitInt(37) *_Nonnull y = x;
+  uint32_t r = *(_BitInt(37) *)&y;
+  return r;
+}
+
+_BitInt(37) shift_exponent() {
+  _BitInt(37) x = 1 << (-1);
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:21: runtime error: shift exponent -1 is negative
+  return x;
+}
+
+_BitInt(37) shift_base() {
+  _BitInt(37) x = (-1) << 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:24: runtime error: left shift of negative value -1
+  return x;
+}
+
+uint32_t negative_shift1(unsigned _BitInt(37) x) {
+  _BitInt(9) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [19 x i8] } { i16 2, i16 9, [19 x i8] c"'_BitInt(9)'\00\09\00\00\00\00\00" }
+}
+
+uint32_t negative_shift2(unsigned _BitInt(37) x) {
+  _BitInt(17) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 11, [20 x i8] c"'_BitInt(17)'\00\11\00\00\00\00\00" }
+}
+
+uint32_t negative_shift3(unsigned _BitInt(37) x) {
+  _BitInt(34) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(34)'\00\22\00\00\00\00\00" }
+}
+
+uint32_t negative_shift4(unsigned _BitInt(37) x) {
+  int64_t c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+}
+
+uint32_t negative_shift5(unsigned _BitInt(37) x) {
+  _BitInt(68) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(68)'\00D\00\00\00\00\00" }
+}
+
+uint32_t unsigned_integer_overflow() {
+  unsigned _BitInt(37) x = ~0U;
+  x++;
+  return x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:10: runtime error: implicit conversion from type
+}
+
+uint32_t signed_integer_overflow() {
+  _BitInt(37) x = (_BitInt(37)) ~((~0U) >> 1);
+  x--;
+  return x;
+}
+
+int main(int argc, char **argv) {
+  // clang-format off
+  uint64_t result =
+      1ULL +
+      implicit_unsigned_integer_truncation() +
+      pointer_overflow() +
+      vla_bound(argc) +
+      nullability_arg((_BitInt(37) *)argc) +
+      unsigned_shift_base() +
+      (uint32_t)array_bounds() +
+      float_cast_overflow() +
+      implicit_integer_sign_change((unsigned _BitInt(37))(argc - 2)) +
+      (uint64_t)implicit_signed_integer_truncation() +
+      ((uint64_t)nonnull_attribute((_BitInt(37) *)argc) & 0xFFFFFFFF) +
+      nullability_assign((_BitInt(37) *)argc) +
+      shift_exponent() +
+      (uint32_t)shift_base() +
+      negative_shift1(5) +
+      negative_shift2(5) +
+      negative_shift3(5) +
+      negative_shift4(5) +
+      negative_shift5(5) +
+      unsigned_integer_overflow() +
+      signed_integer_overflow();
+  // clang-format on
+  printf("%u\n", (uint32_t)(result & 0xFFFFFFFF));
+}

@llvmbot
Copy link
Member

llvmbot commented May 28, 2024

@llvm/pr-subscribers-clang-codegen

Author: None (earnol)

Changes

Without this patch compiler-rt ubsan library has a bug displaying incorrect values for variables of the _BitInt (previously called _ExtInt) type. This patch affects affects both: generation of metadata inside code generator and runtime part. The runtime part provided only for i386 and x86_64 runtimes. Other runtimes should be updated to take full benefit of this patch.
The patch is constructed the way to be backward compatible and int and float type runtime diagnostics should be unaffected for not yet updated runtimes.

This patch fixes issue #64100.


Full diff: https://github.com/llvm/llvm-project/pull/93612.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+51-5)
  • (modified) compiler-rt/lib/ubsan/ubsan_value.cpp (+10-7)
  • (modified) compiler-rt/lib/ubsan/ubsan_value.h (+34-1)
  • (added) compiler-rt/test/ubsan/TestCases/Integer/bit-int.c (+188)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d6478cc6835d8..e80bdfd51ceec 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -41,6 +41,7 @@
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/Passes/OptimizationLevel.h"
 #include "llvm/Support/ConvertUTF.h"
+#include "llvm/Support/Endian.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/SaveAndRestore.h"
@@ -64,6 +65,20 @@ static llvm::cl::opt<bool> ClSanitizeGuardChecks(
     "ubsan-guard-checks", llvm::cl::Optional,
     llvm::cl::desc("Guard UBSAN checks with `llvm.allow.ubsan.check()`."));
 
+//===--------------------------------------------------------------------===//
+//                        Defines for metadata
+//===--------------------------------------------------------------------===//
+enum VariableTypeDescriptorKind : uint16_t {
+  /// An integer type.
+  TK_Integer = 0x0000,
+  /// A floating-point type.
+  TK_Float = 0x0001,
+  /// An _BitInt(N) type.
+  TK_BitInt = 0x0002,
+  /// Any other type. The value representation is unspecified.
+  TK_Unknown = 0xffff
+};
+
 //===--------------------------------------------------------------------===//
 //                        Miscellaneous Helper Methods
 //===--------------------------------------------------------------------===//
@@ -3292,22 +3307,39 @@ LValue CodeGenFunction::EmitPredefinedLValue(const PredefinedExpr *E) {
 ///   { i16 TypeKind, i16 TypeInfo }
 /// \endcode
 ///
-/// followed by an array of i8 containing the type name. TypeKind is 0 for an
-/// integer, 1 for a floating point value, and -1 for anything else.
+/// followed by an array of i8 containing the type name with extra information
+/// for BitInt. TypeKind is 0 for an integer, 1 for a floating point value, 2
+/// for BitInt and -1 for anything else.
 llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) {
   // Only emit each type's descriptor once.
   if (llvm::Constant *C = CGM.getTypeDescriptorFromMap(T))
     return C;
 
-  uint16_t TypeKind = -1;
+  uint16_t TypeKind = TK_Unknown;
   uint16_t TypeInfo = 0;
+  bool IsBitInt = false;
 
   if (T->isIntegerType()) {
-    TypeKind = 0;
+    TypeKind = TK_Integer;
     TypeInfo = (llvm::Log2_32(getContext().getTypeSize(T)) << 1) |
                (T->isSignedIntegerType() ? 1 : 0);
+    // Follow suggestion from https://github.com/llvm/llvm-project/issues/64100
+    // So we can write the exact amount of bits in TypeName after '\0'
+    // making it <diagnostic-like type name>.'\0'.<32-bit width>.
+    if (T->isSignedIntegerType() && T->getAs<BitIntType>()) {
+      // Do a sanity checks as we are using 32-bit type to store bit length.
+      assert((getContext().getTypeSize(T) > 0) &&
+             " non positive amount of bits in __BitInt type");
+      assert((getContext().getTypeSize(T) <= 0xFFFFFFFF) &&
+             " too many bits in __BitInt type");
+
+      // Redefine TypeKind with the actual __BitInt type if we have signed
+      // BitInt.
+      TypeKind = TK_BitInt;
+      IsBitInt = true;
+    }
   } else if (T->isFloatingType()) {
-    TypeKind = 1;
+    TypeKind = TK_Float;
     TypeInfo = getContext().getTypeSize(T);
   }
 
@@ -3318,6 +3350,20 @@ llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) {
       DiagnosticsEngine::ak_qualtype, (intptr_t)T.getAsOpaquePtr(), StringRef(),
       StringRef(), std::nullopt, Buffer, std::nullopt);
 
+  if (IsBitInt) {
+    // The Structure is: 0 to end the string, 32 bit insigned integer in target
+    // endianness, zero.
+    char s[6] = {'\0', '\0', '\0', '\0', '\0', '\0'};
+    const auto *EIT = T->getAs<BitIntType>();
+    uint32_t Bits = EIT->getNumBits();
+    llvm::support::endian::write32(s + 1, Bits,
+                                   getTarget().isBigEndian()
+                                       ? llvm::endianness::big
+                                       : llvm::endianness::little);
+    StringRef str = StringRef(s, 6);
+    Buffer.append(str);
+  }
+
   llvm::Constant *Components[] = {
     Builder.getInt16(TypeKind), Builder.getInt16(TypeInfo),
     llvm::ConstantDataArray::getString(getLLVMContext(), Buffer)
diff --git a/compiler-rt/lib/ubsan/ubsan_value.cpp b/compiler-rt/lib/ubsan/ubsan_value.cpp
index dc61e5b939d95..6e88ebaf34d4b 100644
--- a/compiler-rt/lib/ubsan/ubsan_value.cpp
+++ b/compiler-rt/lib/ubsan/ubsan_value.cpp
@@ -67,18 +67,21 @@ const char *__ubsan::getObjCClassName(ValueHandle Pointer) {
 
 SIntMax Value::getSIntValue() const {
   CHECK(getType().isSignedIntegerTy());
+  // Val was zero-extended to ValueHandle. Sign-extend from original width
+  // to SIntMax.
+  const unsigned ExtraBits =
+      sizeof(SIntMax) * 8 - getType().getIntegerBitCount();
   if (isInlineInt()) {
-    // Val was zero-extended to ValueHandle. Sign-extend from original width
-    // to SIntMax.
-    const unsigned ExtraBits =
-      sizeof(SIntMax) * 8 - getType().getIntegerBitWidth();
     return SIntMax(UIntMax(Val) << ExtraBits) >> ExtraBits;
   }
-  if (getType().getIntegerBitWidth() == 64)
-    return *reinterpret_cast<s64*>(Val);
+  if (getType().getIntegerBitWidth() == 64) {
+    return SIntMax(UIntMax(*reinterpret_cast<s64 *>(Val)) << ExtraBits) >>
+           ExtraBits;
+  }
 #if HAVE_INT128_T
   if (getType().getIntegerBitWidth() == 128)
-    return *reinterpret_cast<s128*>(Val);
+    return SIntMax(UIntMax(*reinterpret_cast<s128 *>(Val)) << ExtraBits) >>
+           ExtraBits;
 #else
   if (getType().getIntegerBitWidth() == 128)
     UNREACHABLE("libclang_rt.ubsan was built without __int128 support");
diff --git a/compiler-rt/lib/ubsan/ubsan_value.h b/compiler-rt/lib/ubsan/ubsan_value.h
index e0957276dd241..acef362718204 100644
--- a/compiler-rt/lib/ubsan/ubsan_value.h
+++ b/compiler-rt/lib/ubsan/ubsan_value.h
@@ -103,6 +103,14 @@ class TypeDescriptor {
     /// representation is that of bitcasting the floating-point value to an
     /// integer type.
     TK_Float = 0x0001,
+    /// An _BitInt(N) type. Lowest bit is 1 for a signed value, 0 for an
+    /// unsigned
+    /// value. Remaining bits are log_2(bit_width). The value representation is
+    /// the integer itself if it fits into a ValueHandle, and a pointer to the
+    /// integer otherwise. TypeName contains the true width of the type for the
+    /// signed _BitInt(N) type stored after zero bit after TypeName as 32-bit
+    //  unsigned integer.
+    TK_BitInt = 0x0002,
     /// Any other type. The value representation is unspecified.
     TK_Unknown = 0xffff
   };
@@ -113,10 +121,15 @@ class TypeDescriptor {
     return static_cast<Kind>(TypeKind);
   }
 
-  bool isIntegerTy() const { return getKind() == TK_Integer; }
+  bool isIntegerTy() const {
+    return getKind() == TK_Integer || getKind() == TK_BitInt;
+  }
+  bool isBitIntTy() const { return getKind() == TK_BitInt; }
+
   bool isSignedIntegerTy() const {
     return isIntegerTy() && (TypeInfo & 1);
   }
+  bool isSignedBitIntTy() const { return isBitIntTy() && (TypeInfo & 1); }
   bool isUnsignedIntegerTy() const {
     return isIntegerTy() && !(TypeInfo & 1);
   }
@@ -125,6 +138,26 @@ class TypeDescriptor {
     return 1 << (TypeInfo >> 1);
   }
 
+  const char *getBitIntBitCountPointer() const {
+    CHECK(isBitIntTy());
+    CHECK(isSignedBitIntTy());
+    // Scan Name for zero and return the next address
+    const char *p = getTypeName();
+    while (*p != '\0') {
+      p++;
+    }
+    // Return the next address
+    return p + 1;
+  }
+
+  unsigned getIntegerBitCount() const {
+    CHECK(isIntegerTy());
+    if (isSignedBitIntTy())
+      return *reinterpret_cast<const u32 *>(getBitIntBitCountPointer());
+    else
+      return getIntegerBitWidth();
+  }
+
   bool isFloatTy() const { return getKind() == TK_Float; }
   unsigned getFloatBitWidth() const {
     CHECK(isFloatTy());
diff --git a/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c b/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c
new file mode 100644
index 0000000000000..9ec096a0cf20d
--- /dev/null
+++ b/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c
@@ -0,0 +1,188 @@
+// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -O0 -fsanitize=alignment,array-bounds,bool,float-cast-overflow,implicit-integer-sign-change,implicit-signed-integer-truncation,implicit-unsigned-integer-truncation,integer-divide-by-zero,nonnull-attribute,null,nullability-arg,nullability-assign,nullability-return,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,unsigned-integer-overflow,unsigned-shift-base,vla-bound %s -o %t1 && %run %t1 2>&1 | FileCheck %s --check-prefix=CHECK-R
+// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -fsanitize=array-bounds,enum,float-cast-overflow,integer-divide-by-zero,implicit-unsigned-integer-truncation,implicit-signed-integer-truncation,implicit-integer-sign-change,unsigned-integer-overflow,signed-integer-overflow,shift-base,shift-exponent -O0 -S -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-IR
+
+#include <stdint.h>
+#include <stdio.h>
+
+uint32_t float_divide_by_zero() {
+  float f = 1.0f / 0.0f;
+  // CHECK-IR: constant { i16, i16, [8 x i8] } { i16 1, i16 32, [8 x i8] c"'float'\00" }
+  _BitInt(37) r = (_BitInt(37))f;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: inf is outside the range of representable values of type
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(37)'\00%\00\00\00\00\00" }
+  return r;
+}
+
+uint32_t integer_divide_by_zero() __attribute__((no_sanitize("memory"))) {
+  _BitInt(37) x = 1 / 0;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:21: runtime error: division by zero
+  // CHECK-IR: constant { i16, i16, [32 x i8] } { i16 0, i16 10, [32 x i8] c"'uint32_t' (aka 'unsigned int')\00" }
+  return x;
+}
+
+uint32_t implicit_unsigned_integer_truncation() {
+  unsigned _BitInt(37) x = 0U;
+  x += float_divide_by_zero();
+  x += integer_divide_by_zero();
+  x += ~0ULL;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:5: runtime error: unsigned integer overflow:
+  // CHECK-IR: constant { i16, i16, [23 x i8] } { i16 0, i16 12, [23 x i8] c"'unsigned _BitInt(37)'\00" }
+  uint32_t r = x;
+  return r;
+}
+
+uint32_t pointer_overflow() __attribute__((no_sanitize("address"))) {
+  _BitInt(37) *x = (_BitInt(37) *)1;
+  _BitInt(37) *y = x - 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:22: runtime error: pointer index expression with base
+  uint32_t r = *(_BitInt(37) *)&y;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:16: runtime error: implicit conversion from type
+  return r;
+}
+
+uint32_t vla_bound(_BitInt(37) x) {
+  _BitInt(37) a[x - 1];
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:17: runtime error: variable length array bound evaluates to non-positive value
+  return 0;
+}
+
+uint32_t nullability_arg(_BitInt(37) *_Nonnull x)
+    __attribute__((no_sanitize("address"))) {
+  _BitInt(37) y = *(_BitInt(37) *)&x;
+  return y;
+}
+
+uint32_t unsigned_shift_base() {
+  unsigned _BitInt(37) x = ~0U << 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:32: runtime error: left shift of 4294967295 by 1 places cannot be represented in type
+  return x;
+}
+
+uint32_t array_bounds() {
+  _BitInt(37) x[4];
+  _BitInt(37) y = x[10];
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: index 10 out of bounds for type
+  // CHECK-IR: constant { i16, i16, [17 x i8] } { i16 -1, i16 0, [17 x i8] c"'_BitInt(37)[4]'\00" }
+  return (uint32_t)y;
+}
+
+uint32_t float_cast_overflow() {
+  float a = 100000000.0f;
+  _BitInt(7) b = (_BitInt(7))a;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:18: runtime error: 1e+08 is outside the range of representable values of type
+  // CHECK-IR: constant { i16, i16, [19 x i8] } { i16 2, i16 7, [19 x i8] c"'_BitInt(7)'\00\07\00\00\00\00\00" }
+  return b;
+}
+
+uint32_t implicit_integer_sign_change(unsigned _BitInt(37) x) {
+  _BitInt(37) r = x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:19: runtime error: implicit conversion from type '{{[^']+}}' of value
+  return r & 0xFFFFFFFF;
+}
+
+_BitInt(13) implicit_signed_integer_truncation() {
+  _BitInt(73) x = (_BitInt(73)) ~((~0UL) >> 1);
+  return x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:10: runtime error: implicit conversion from type
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(73)'\00I\00\00\00\00\00" }
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 9, [20 x i8] c"'_BitInt(13)'\00\0D\00\00\00\00\00" }
+}
+
+_BitInt(37) nonnull_attribute(__attribute__((nonnull)) _BitInt(37) * x)
+    __attribute__((no_sanitize("address"))) {
+  return *(_BitInt(37) *)&x;
+}
+
+uint32_t nullability_assign(_BitInt(37) * x)
+    __attribute__((no_sanitize("address"))) {
+  _BitInt(37) *_Nonnull y = x;
+  uint32_t r = *(_BitInt(37) *)&y;
+  return r;
+}
+
+_BitInt(37) shift_exponent() {
+  _BitInt(37) x = 1 << (-1);
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:21: runtime error: shift exponent -1 is negative
+  return x;
+}
+
+_BitInt(37) shift_base() {
+  _BitInt(37) x = (-1) << 1;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:24: runtime error: left shift of negative value -1
+  return x;
+}
+
+uint32_t negative_shift1(unsigned _BitInt(37) x) {
+  _BitInt(9) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [19 x i8] } { i16 2, i16 9, [19 x i8] c"'_BitInt(9)'\00\09\00\00\00\00\00" }
+}
+
+uint32_t negative_shift2(unsigned _BitInt(37) x) {
+  _BitInt(17) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 11, [20 x i8] c"'_BitInt(17)'\00\11\00\00\00\00\00" }
+}
+
+uint32_t negative_shift3(unsigned _BitInt(37) x) {
+  _BitInt(34) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(34)'\00\22\00\00\00\00\00" }
+}
+
+uint32_t negative_shift4(unsigned _BitInt(37) x) {
+  int64_t c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+}
+
+uint32_t negative_shift5(unsigned _BitInt(37) x) {
+  _BitInt(68) c = -2;
+  return x >> c;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:12: runtime error: shift exponent -2 is negative
+  // CHECK-IR: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(68)'\00D\00\00\00\00\00" }
+}
+
+uint32_t unsigned_integer_overflow() {
+  unsigned _BitInt(37) x = ~0U;
+  x++;
+  return x;
+  // CHECK-R: {{.*}}bit-int.c:[[@LINE-1]]:10: runtime error: implicit conversion from type
+}
+
+uint32_t signed_integer_overflow() {
+  _BitInt(37) x = (_BitInt(37)) ~((~0U) >> 1);
+  x--;
+  return x;
+}
+
+int main(int argc, char **argv) {
+  // clang-format off
+  uint64_t result =
+      1ULL +
+      implicit_unsigned_integer_truncation() +
+      pointer_overflow() +
+      vla_bound(argc) +
+      nullability_arg((_BitInt(37) *)argc) +
+      unsigned_shift_base() +
+      (uint32_t)array_bounds() +
+      float_cast_overflow() +
+      implicit_integer_sign_change((unsigned _BitInt(37))(argc - 2)) +
+      (uint64_t)implicit_signed_integer_truncation() +
+      ((uint64_t)nonnull_attribute((_BitInt(37) *)argc) & 0xFFFFFFFF) +
+      nullability_assign((_BitInt(37) *)argc) +
+      shift_exponent() +
+      (uint32_t)shift_base() +
+      negative_shift1(5) +
+      negative_shift2(5) +
+      negative_shift3(5) +
+      negative_shift4(5) +
+      negative_shift5(5) +
+      unsigned_integer_overflow() +
+      signed_integer_overflow();
+  // clang-format on
+  printf("%u\n", (uint32_t)(result & 0xFFFFFFFF));
+}

@earnol earnol linked an issue May 28, 2024 that may be closed by this pull request
@earnol earnol marked this pull request as draft May 29, 2024 15:13
@earnol earnol force-pushed the main branch 6 times, most recently from a128cd4 to 27a7138 Compare May 31, 2024 02:07
@earnol earnol marked this pull request as ready for review May 31, 2024 12:47
@earnol earnol linked an issue May 31, 2024 that may be closed by this pull request
Copy link
Contributor

@ilovepi ilovepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo some nits, this seems fine to me. I'll leave approval to one of the code owners, though, as I've had little to do w/ big integer code or UBSAN.

Comment on lines 49 to 54
uint32_t nullability_arg(_BitInt(37) *_Nonnull x)
__attribute__((no_sanitize("address"))) {
_BitInt(37) y = *(_BitInt(37) *)&x;
return y;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some of these functions w/o checks, can you add a comment as to what they're for?

I'm also pretty confused here as to what's going on in this particular function. You're taking the address of the parameter (which seems to be argc as pointer), casting it to a _BitInt*, dereferencing it to store into a local variable and return that... I don't see what's being exercised here, nor why we'd need such contortions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short explanation:
Ubsan should not be triggered by assignment even if data is invalid.
Longer explanation:
This code is clearly invalid, yet i expect no diagnostics here as ubsan sanitizer should not be be triggered on a simple assignment according to the current implementation.
As you can see, when asan is enabled this function throw a diagnostics (as it should). this is the reason why it is disabled.
I hope this explanation is clear.

As for comments addition: accepted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, does no CHECK: line imply that there isn't a diagnostic? I think it will just silently pass... maybe if these are in a different file where there are no diagnostics expected it would work as intended?

I always struggle with checks for output that shouldn't happen, especially since CHECK-NOT often keeps passing when it isn't intended to. I know that in lld we can use --fatal-warnings for tests that should not produce a diagnostic. I'm not sure how that's normally done for the sanitizers though. Maybe just a second test file that checks that there are no runtime error: lines? That could be brittle though if the diagnostic gets spelled differently, or if there's a typo in the check. Maybe its easy if UBSAN errors change the return code, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, does no CHECK: line imply that there isn't a diagnostic? I think it will just silently pass... maybe if these are in a different file where there are no diagnostics expected it would work as intended?

Yes. It will silently pass, yet unexpected diagnostics in the middle of the check script can throw the FileCheck off and it will barf. My idea is having at least some check (which can provide false negative error detection: detects not error when there is an error) is better compared to no check at all completely.

CHECK-NOT will not work here as it will require the line which should not be encountered and line is not known in this case.
CHECK-EMPTY also does not look like a right choice.

That could be brittle though if the diagnostic gets spelled differently, or if there's a typo in the check. Maybe its easy if UBSAN errors change the return code, though?

You have given me a great idea. These examples can be moved to different test file which will be compiled with -fsanitize-trap option. It this case it would be easy to detect the ubsan activation did not happened. On the other hand it will require extra file and i wanted to pack all ubsan _BitInt tests into single file. But probably it is not that bad, if it is the price for check precision.
Alternatively FileCheck can be used with with "--implicit-check-not error" option to verify the fact runtime error was not thrown.
What approach do you think will be the best?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that there was a question at the end. I guess using -fsanitize-trap seems fine. I'm hardly an expert on compiler-rt conventions though, so take that w/ a grain of salt. The other reviewers probably have a more cogent idea about what's appropriate, but I think having the second test where they're expected to pass makes a lot more sense than trying to exclude them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated commit using CHECK-NOT with "runtime error". It looks like a reasonable check for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... Just seen your response. Do you want me to change it into "-fsanitize-trap" or the current implementation is fine?
For me whatever diagnostics is displayed is having "runtime error" so while line is not known in advance we are pretty sure about part of the line and it can be entered into CHECK-NOT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, but maybe check the other UBSAN tests, and follow their conventions for tests that should run w/o an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically what i've done when i looked into ./compiler-rt/test/ubsan/TestCases/Misc/deduplication.cpp it has multiple CHECK-NOTs. But on the second though it looks like also unnecessary.

@earnol earnol force-pushed the main branch 2 times, most recently from 3713f9a to 5e2f1af Compare June 1, 2024 00:12
@earnol earnol requested review from rjmccall and ilovepi June 1, 2024 04:27
@earnol earnol force-pushed the main branch 2 times, most recently from 4698818 to 9607d2f Compare June 10, 2024 20:42
@earnol
Copy link
Contributor Author

earnol commented Jun 17, 2024

Ping!
Any additional suggestions for the improvements?
Selfishly want to move it forward :)

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes seem pretty reasonable to me (aside from a few small nits), but I'd appreciate someone with more compiler-rt knowledge to also sign off on the changes.

Without this patch compiler-rt ubsan library has a bug displaying
incorrect values for variables of the _BitInt (previousely called
_ExtInt) type. This patch affects both: generation of metadata
inside code generator and runtime part. The runtime part provided only
for i386 and x86_64 runtimes. Other runtimes should be updated to take
full benefit of this patch.
The patch is constructed the way to be backward compatible and
int and float type runtime diagnostics should be unaffected for not
yet updated runtimes.

Co-authored-by: Aaron Ballman <[email protected]>
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@earnol earnol merged commit 49001d5 into llvm:main Jun 20, 2024
7 checks passed
Copy link

@earnol Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@fmayer
Copy link
Contributor

fmayer commented Jun 20, 2024

Breakage looks related: https://lab.llvm.org/buildbot/#/builders/72/builds/265

FAIL: UBSan-MemorySanitizer-powerpc64le :: TestCases/Integer/bit-int.c (4716 of 4745)
******************** TEST 'UBSan-MemorySanitizer-powerpc64le :: TestCases/Integer/bit-int.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/./bin/clang  -fsanitize=memory  -m64 -fno-function-sections  -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -O0 -fsanitize=alignment,array-bounds,bool,float-cast-overflow,implicit-integer-sign-change,implicit-signed-integer-truncation,implicit-unsigned-integer-truncation,integer-divide-by-zero,nonnull-attribute,null,nullability-arg,nullability-assign,nullability-return,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,unsigned-integer-overflow,unsigned-shift-base,vla-bound /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_g
 cc/runtimes/runtimes-bins/compiler-rt/test/ubsan/MemorySanitizer-powerpc64le/TestCases/Integer/Output/bit-int.c.tmp1 &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/runtimes/runtimes-bins/compiler-rt/test/ubsan/MemorySanitizer-powerpc64le/TestCases/Integer/Output/bit-int.c.tmp1 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c --check-prefix=RUNTIME
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/./bin/clang -fsanitize=memory -m64 -fno-function-sections -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -O0 -fsanitize=alignment,array-bounds,bool,float-cast-overflow,implicit-integer-sign-change,implicit-signed-integer-truncation,implicit-unsigned-integer-truncation,integer-divide-by-zero,nonnull-attribute,null,nullability-arg,nullability-assign,nullability-return,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,unsigned-integer-overflow,unsigned-shift-base,vla-bound /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/runtimes/runti
 mes-bins/compiler-rt/test/ubsan/MemorySanitizer-powerpc64le/TestCases/Integer/Output/bit-int.c.tmp1
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/runtimes/runtimes-bins/compiler-rt/test/ubsan/MemorySanitizer-powerpc64le/TestCases/Integer/Output/bit-int.c.tmp1
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c --check-prefix=RUNTIME
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:93:14: error: RUNTIME: expected string not found in input
 // RUNTIME: {{.*}}bit-int.c:[[@LINE-1]]:24: runtime error: left shift of negative value -1
             ^
<stdin>:25:206: note: scanning from here
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:86:21: runtime error: shift exponent -1 is negative
                                                                                                                                                                                                             ^
<stdin>:25:206: note: with "@LINE-1" equal to "92"
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:86:21: runtime error: shift exponent -1 is negative
                                                                                                                                                                                                             ^
<stdin>:33:252: note: possible intended match here
SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_gcc/runtimes/runtimes-bins/compiler-rt/test/ubsan/MemorySanitizer-powerpc64le/TestCases/Integer/Output/bit-int.c.tmp1+0xe6d64) in shift_exponent
                                                                                                                                                                                                                                                           ^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           20: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:65:18 
           21: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:72:19: runtime error: implicit conversion from type 'unsigned _BitInt(37)' of value 137438953471 (64-bit, unsigned) to type '_BitInt(37)' changed the value to -1 (64-bit, signed)
           22: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:72:19 
           23: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:79:10: runtime error: implicit conversion from type '_BitInt(73)' of value 0x00000000000000008000000000000000 (128-bit, signed) to type '_BitInt(13)' changed the value to 0 (16-bit, signed)
           24: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:79:10 
           25: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/ubsan/TestCases/Integer/bit-int.c:86:21: runtime error: shift exponent -1 is negative
Step 9 (test compiler-rt gcc) failure: test compiler-rt gcc (failure)

@earnol
Copy link
Contributor Author

earnol commented Jun 20, 2024

Yes. It is indeed the cause. Already created a code to address the the issue.

#96240

Revert of this commit is:
#96239

earnol added a commit that referenced this pull request Jun 20, 2024
earnol added a commit that referenced this pull request Jun 21, 2024
@mstorsjo
Copy link
Member

In addition to the issue noted on buildbots, this also caused failed tests on i386:
https://github.com/mstorsjo/llvm-mingw/actions/runs/9606458336/job/26504129718#step:8:1501

There seem to be a couple of different errors there:

bit-int.c:72:19: runtime error: implicit conversion from type 'unsigned _BitInt(37)' of value [21707795506135039](tel:21707795506135039) (64-bit, unsigned) to type '_BitInt(37)' changed the value to -1 (64-bit, signed)

AddressSanitizer: CHECK failed: ubsan_value.cpp:87 "((0 && "libclang_rt.ubsan was built without __int128 support")) != (0)" (0x0, 0x0) (tid=2296)

@earnol
Copy link
Contributor Author

earnol commented Jun 25, 2024

@mstorsjo It seems your compiler build does not have int128_t enabled. Could you please test #96240 in your environment to verify it has no such problem?

@mstorsjo
Copy link
Member

@mstorsjo It seems your compiler build does not have int128_t enabled.

Indeed, Clang doesn't provide int128_t for 32 bit targets, AFAIK.

Could you please test #96240 in your environment to verify it has no such problem?

That PR does seem to work fine, in such an environment, in a quick adhoc test setup - thanks!

@earnol
Copy link
Contributor Author

earnol commented Jun 26, 2024

Indeed, Clang doesn't provide int128_t for 32 bit targets, AFAIK.

It looks like a little bit more complex.
I checked with Standalone-i386 and AddressSanitizer-i386 targets. They do have int128_t for 32 bit targets as soon as clang itself built as 64-bit binary.

Could you please test #96240 in your environment to verify it has no such problem?

That PR does seem to work fine, in such an environment, in a quick adhoc test setup - thanks!

Excellent. Waiting for sign-off from maintainers in this case.

@mstorsjo
Copy link
Member

Indeed, Clang doesn't provide int128_t for 32 bit targets, AFAIK.

It looks like a little bit more complex. I checked with Standalone-i386 and AddressSanitizer-i386 targets. They do have int128_t for 32 bit targets as soon as clang itself built as 64-bit binary.

That doesn't quite reflect my experience - the bitness of the host compiler binary should not affect what features are available in the target environment:

$ file -L bin/clang
bin/clang: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[xxHash]=724b58c4c8ae1673, not stripped
$ bin/clang -target i386-linux-gnu -E -dM - < /dev/null | grep INT128
$ bin/clang -target x86_64-linux-gnu -E -dM - < /dev/null | grep INT128
#define __SIZEOF_INT128__ 16
$ bin/clang -target i386-windows -E -dM - < /dev/null | grep INT128
$ bin/clang -target x86_64-windows -E -dM - < /dev/null | grep INT128
#define __SIZEOF_INT128__ 16
$ cat int128.c 
__int128_t a = 42;
$ bin/clang -target x86_64-linux-gnu -c int128.c
$ bin/clang -target i386-linux-gnu -c int128.c
int128.c:1:1: error: unknown type name '__int128_t'
    1 | __int128_t a = 42;
      | ^
1 error generated.

I'm not quite sure what kind of test setup you have - it sounds like it has detected feature based on the compiler's default target, regardless of what the target really supports?

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…3612)

Without this patch compiler-rt ubsan library has a bug displaying
incorrect values for variables of the _BitInt (previously called
_ExtInt) type. This patch affects affects both: generation of metadata
inside code generator and runtime part. The runtime part provided only
for i386 and x86_64 runtimes. Other runtimes should be updated to take
full benefit of this patch.
The patch is constructed the way to be backward compatible and int and
float type runtime diagnostics should be unaffected for not yet updated
runtimes.

This patch fixes issue:
llvm#64100.

Co-authored-by: Vladislav Aranov <[email protected]>
Co-authored-by: Aaron Ballman <[email protected]>
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt:ubsan Undefined behavior sanitizer compiler-rt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UBSan output is incorrect for non power-of-2 sized _BitInt types ubsan vs. _BitInt
7 participants