Skip to content

Reland "[clang] Add nuw attribute to GEPs (#105496)" #107257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 5, 2024

Conversation

hazzlim
Copy link
Contributor

@hazzlim hazzlim commented Sep 4, 2024

Add nuw attribute to inbounds GEPs where the expression used to form the GEP is an addition of unsigned indices.

Relands #105496, which was reverted because it exposed a miscompilation arising from #98608. This is now fixed by #106512.

Add nuw attribute to inbounds GEPs where the expression used to form the
GEP is an addition of unsigned indices.

Relands llvm#105496, which was reverted because it exposed a miscompilation
arising from llvm#98608. This is now fixed by llvm#106512.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:PowerPC clang:codegen IR generation bugs: mangling, exceptions, etc. HLSL HLSL Language Support clang:openmp OpenMP related changes to Clang labels Sep 4, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-hlsl
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Hari Limaye (hazzlim)

Changes

Add nuw attribute to inbounds GEPs where the expression used to form the GEP is an addition of unsigned indices.

Relands #105496, which was reverted because it exposed a miscompilation arising from #98608. This is now fixed by #106512.


Patch is 582.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107257.diff

91 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuilder.h (+4-2)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+14-3)
  • (modified) clang/test/CodeGen/2005-01-02-ConstantInits.c (+2-2)
  • (modified) clang/test/CodeGen/PowerPC/ppc-emmintrin.c (+6-6)
  • (modified) clang/test/CodeGen/PowerPC/ppc-xmmintrin.c (+8-8)
  • (modified) clang/test/CodeGen/attr-counted-by.c (+8-8)
  • (modified) clang/test/CodeGen/catch-nullptr-and-nonzero-offset-when-nullptr-is-defined.c (+1-1)
  • (modified) clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c (+32-32)
  • (modified) clang/test/CodeGen/catch-pointer-overflow-volatile.c (+1-1)
  • (modified) clang/test/CodeGen/catch-pointer-overflow.c (+3-3)
  • (modified) clang/test/CodeGen/ext-int.c (+1-1)
  • (modified) clang/test/CodeGen/hexagon-brev-ld-ptr-incdec.c (+3-3)
  • (modified) clang/test/CodeGen/integer-overflow.c (+3-3)
  • (modified) clang/test/CodeGen/ms-intrinsics.c (+6-6)
  • (modified) clang/test/CodeGen/ubsan-pointer-overflow.m (+1-1)
  • (modified) clang/test/CodeGen/vla.c (+1-1)
  • (modified) clang/test/CodeGenCXX/attr-likelihood-iteration-stmt.cpp (+18-18)
  • (modified) clang/test/CodeGenCXX/for-range.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/pr45964-decomp-transform.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/vla.cpp (+2-2)
  • (modified) clang/test/CodeGenHLSL/buffer-array-operator.hlsl (+2-2)
  • (modified) clang/test/CodeGenSYCL/address-space-deduction.cpp (+24-24)
  • (modified) clang/test/Headers/__clang_hip_math.hip (+35-35)
  • (modified) clang/test/OpenMP/bug60602.cpp (+4-4)
  • (modified) clang/test/OpenMP/declare_mapper_codegen.cpp (+2-2)
  • (modified) clang/test/OpenMP/distribute_codegen.cpp (+16-16)
  • (modified) clang/test/OpenMP/distribute_parallel_for_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/distribute_simd_codegen.cpp (+48-48)
  • (modified) clang/test/OpenMP/for_linear_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/for_reduction_codegen.cpp (+32-32)
  • (modified) clang/test/OpenMP/for_reduction_codegen_UDR.cpp (+16-16)
  • (modified) clang/test/OpenMP/for_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/for_scan_codegen.cpp (+20-20)
  • (modified) clang/test/OpenMP/for_simd_scan_codegen.cpp (+20-20)
  • (modified) clang/test/OpenMP/irbuilder_for_iterator.cpp (+3-3)
  • (modified) clang/test/OpenMP/irbuilder_for_rangefor.cpp (+3-3)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned.c (+4-4)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_auto.c (+4-4)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_down.c (+1-1)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_dynamic.c (+4-4)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_dynamic_chunked.c (+4-4)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_runtime.c (+4-4)
  • (modified) clang/test/OpenMP/irbuilder_for_unsigned_static_chunked.c (+4-4)
  • (modified) clang/test/OpenMP/map_struct_ordering.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/master_taskloop_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/master_taskloop_simd_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/master_taskloop_simd_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/ordered_codegen.cpp (+40-40)
  • (modified) clang/test/OpenMP/parallel_for_codegen.cpp (+72-72)
  • (modified) clang/test/OpenMP/parallel_for_linear_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/parallel_for_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/parallel_for_scan_codegen.cpp (+22-22)
  • (modified) clang/test/OpenMP/parallel_for_simd_scan_codegen.cpp (+20-20)
  • (modified) clang/test/OpenMP/parallel_master_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/parallel_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/parallel_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/parallel_sections_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/reduction_implicit_map.cpp (+25-25)
  • (modified) clang/test/OpenMP/sections_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/target_data_use_device_addr_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/target_data_use_device_ptr_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/target_has_device_addr_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/target_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/target_is_device_ptr_codegen.cpp (+104-104)
  • (modified) clang/test/OpenMP/target_map_both_pointer_pointee_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/target_map_codegen_01.cpp (+1-1)
  • (modified) clang/test/OpenMP/target_map_codegen_21.cpp (+3-3)
  • (modified) clang/test/OpenMP/target_map_codegen_27.cpp (+1-1)
  • (modified) clang/test/OpenMP/target_map_codegen_28.cpp (+3-3)
  • (modified) clang/test/OpenMP/target_map_codegen_29.cpp (+2-2)
  • (modified) clang/test/OpenMP/target_map_deref_array_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/target_map_member_expr_array_section_codegen.cpp (+4-4)
  • (modified) clang/test/OpenMP/target_map_member_expr_codegen.cpp (+3-3)
  • (modified) clang/test/OpenMP/target_map_nest_defalut_mapper_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/target_parallel_for_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/target_parallel_reduction_task_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/target_task_affinity_codegen.cpp (+12-12)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_reduction_task_codegen.cpp (+22-22)
  • (modified) clang/test/OpenMP/target_update_codegen.cpp (+17-17)
  • (modified) clang/test/OpenMP/task_codegen.c (+1-1)
  • (modified) clang/test/OpenMP/task_codegen.cpp (+124-124)
  • (modified) clang/test/OpenMP/task_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/taskgroup_task_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/taskloop_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/taskloop_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/taskloop_simd_in_reduction_codegen.cpp (+5-5)
  • (modified) clang/test/OpenMP/taskloop_simd_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_reduction_task_codegen.cpp (+22-22)
diff --git a/clang/lib/CodeGen/CGBuilder.h b/clang/lib/CodeGen/CGBuilder.h
index 08730a6a6672a1..b8036cf6e6a306 100644
--- a/clang/lib/CodeGen/CGBuilder.h
+++ b/clang/lib/CodeGen/CGBuilder.h
@@ -14,6 +14,7 @@
 #include "CodeGenTypeCache.h"
 #include "llvm/Analysis/Utils/Local.h"
 #include "llvm/IR/DataLayout.h"
+#include "llvm/IR/GEPNoWrapFlags.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Type.h"
 
@@ -334,9 +335,10 @@ class CGBuilderTy : public CGBuilderBaseTy {
 
   Address CreateGEP(Address Addr, ArrayRef<llvm::Value *> IdxList,
                     llvm::Type *ElementType, CharUnits Align,
-                    const Twine &Name = "") {
+                    const Twine &Name = "",
+                    llvm::GEPNoWrapFlags NW = llvm::GEPNoWrapFlags::none()) {
     llvm::Value *Ptr = emitRawPointerFromAddress(Addr);
-    return RawAddress(CreateGEP(Addr.getElementType(), Ptr, IdxList, Name),
+    return RawAddress(CreateGEP(Addr.getElementType(), Ptr, IdxList, Name, NW),
                       ElementType, Align);
   }
 
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 7aa2d3d89c2936..88fbbe6c4c965f 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -36,6 +36,7 @@
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/FixedPointBuilder.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/GEPNoWrapFlags.h"
 #include "llvm/IR/GetElementPtrTypeIterator.h"
 #include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/Intrinsics.h"
@@ -5759,7 +5760,12 @@ CodeGenFunction::EmitCheckedInBoundsGEP(llvm::Type *ElemTy, Value *Ptr,
                                         bool SignedIndices, bool IsSubtraction,
                                         SourceLocation Loc, const Twine &Name) {
   llvm::Type *PtrTy = Ptr->getType();
-  Value *GEPVal = Builder.CreateInBoundsGEP(ElemTy, Ptr, IdxList, Name);
+
+  llvm::GEPNoWrapFlags NWFlags = llvm::GEPNoWrapFlags::inBounds();
+  if (!SignedIndices && !IsSubtraction)
+    NWFlags |= llvm::GEPNoWrapFlags::noUnsignedWrap();
+
+  Value *GEPVal = Builder.CreateGEP(ElemTy, Ptr, IdxList, Name, NWFlags);
 
   // If the pointer overflow sanitizer isn't enabled, do nothing.
   if (!SanOpts.has(SanitizerKind::PointerOverflow))
@@ -5874,8 +5880,13 @@ Address CodeGenFunction::EmitCheckedInBoundsGEP(
     Address Addr, ArrayRef<Value *> IdxList, llvm::Type *elementType,
     bool SignedIndices, bool IsSubtraction, SourceLocation Loc, CharUnits Align,
     const Twine &Name) {
-  if (!SanOpts.has(SanitizerKind::PointerOverflow))
-    return Builder.CreateInBoundsGEP(Addr, IdxList, elementType, Align, Name);
+  if (!SanOpts.has(SanitizerKind::PointerOverflow)) {
+    llvm::GEPNoWrapFlags NWFlags = llvm::GEPNoWrapFlags::inBounds();
+    if (!SignedIndices && !IsSubtraction)
+      NWFlags |= llvm::GEPNoWrapFlags::noUnsignedWrap();
+
+    return Builder.CreateGEP(Addr, IdxList, elementType, Align, Name, NWFlags);
+  }
 
   return RawAddress(
       EmitCheckedInBoundsGEP(Addr.getElementType(), Addr.emitRawPointer(*this),
diff --git a/clang/test/CodeGen/2005-01-02-ConstantInits.c b/clang/test/CodeGen/2005-01-02-ConstantInits.c
index 7772a64331ffb7..d90c2ea42da613 100644
--- a/clang/test/CodeGen/2005-01-02-ConstantInits.c
+++ b/clang/test/CodeGen/2005-01-02-ConstantInits.c
@@ -1,4 +1,4 @@
-// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --global-value-regex "@.+"
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-globals --global-value-regex "[A-Za-z].*"
 // RUN: %clang_cc1 -triple=x86_64-unknown-linux %s -emit-llvm -o - | FileCheck %s
 
 // This tests all kinds of hard cases with initializers and
@@ -51,7 +51,7 @@ int foo(int i) { return bar(&Arr[49])+bar(&Arr[i]); }
 // CHECK-NEXT:    store i32 [[I]], ptr [[I_ADDR]], align 4
 // CHECK-NEXT:    store ptr @Arr, ptr [[P]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P]], align 8
-// CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 1
+// CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP0]], i32 1
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[P]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[I_ADDR]], align 4
 // CHECK-NEXT:    [[IDX_EXT:%.*]] = sext i32 [[TMP1]] to i64
diff --git a/clang/test/CodeGen/PowerPC/ppc-emmintrin.c b/clang/test/CodeGen/PowerPC/ppc-emmintrin.c
index a3650beec625f2..4c4d0dfce05eaf 100644
--- a/clang/test/CodeGen/PowerPC/ppc-emmintrin.c
+++ b/clang/test/CodeGen/PowerPC/ppc-emmintrin.c
@@ -1012,14 +1012,14 @@ test_shuffle() {
 // CHECK: %[[SHR:[0-9a-zA-Z_.]+]] = ashr i32 %{{[0-9a-zA-Z_.]+}}, 6
 // CHECK: %[[AND4:[0-9a-zA-Z_.]+]] = and i32 %[[SHR]], 3
 // CHECK: sext i32 %[[AND4]] to i64
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}, i32 0
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}, i32 1
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK: %[[ADD:[0-9a-zA-Z_.]+]] = add i32 %{{[0-9a-zA-Z_.]+}}, 269488144
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %[[ADD]], i32 2
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_epi32.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK: add i32 %{{[0-9a-zA-Z_.]+}}, 269488144
 // CHECK: call <4 x i32> @vec_perm(int vector[4], int vector[4], unsigned char vector[16])
 
@@ -1050,7 +1050,7 @@ test_shuffle() {
 // CHECK: sext i32 %[[AND4]] to i64
 // CHECK-LE: store <2 x i64> <i64 1663540288323457296, i64 0>, ptr %{{[0-9a-zA-Z_.]+}}, align 16
 // CHECK-BE: store <2 x i64> <i64 1157726452361532951, i64 0>, ptr %{{[0-9a-zA-Z_.]+}}, align 16
-// CHECK-COUNT-4: getelementptr inbounds [4 x i16], ptr @_mm_shufflehi_epi16.__permute_selectors, i64 0, i64 {{[0-9a-zA-Z_%.]+}}
+// CHECK-COUNT-4: getelementptr inbounds nuw [4 x i16], ptr @_mm_shufflehi_epi16.__permute_selectors, i64 0, i64 {{[0-9a-zA-Z_%.]+}}
 // CHECK: call <2 x i64> @vec_perm(unsigned long long vector[2], unsigned long long vector[2], unsigned char vector[16])
 
 // CHECK-LABEL: define available_externally <2 x i64> @_mm_shufflelo_epi16
@@ -1067,7 +1067,7 @@ test_shuffle() {
 // CHECK: sext i32 %[[AND4]] to i64
 // CHECK-LE: store <2 x i64> <i64 0, i64 2242261671028070680>, ptr %{{[0-9a-zA-Z_.]+}}, align 16
 // CHECK-BE: store <2 x i64> <i64 0, i64 1736447835066146335>, ptr %{{[0-9a-zA-Z_.]+}}, align 16
-// CHECK-COUNT-4: getelementptr inbounds [4 x i16], ptr @_mm_shufflelo_epi16.__permute_selectors, i64 0, i64 {{[0-9a-zA-Z_%.]+}}
+// CHECK-COUNT-4: getelementptr inbounds nuw [4 x i16], ptr @_mm_shufflelo_epi16.__permute_selectors, i64 0, i64 {{[0-9a-zA-Z_%.]+}}
 // CHECK: call <2 x i64> @vec_perm(unsigned long long vector[2], unsigned long long vector[2], unsigned char vector[16])
 
 void __attribute__((noinline))
diff --git a/clang/test/CodeGen/PowerPC/ppc-xmmintrin.c b/clang/test/CodeGen/PowerPC/ppc-xmmintrin.c
index 95dfd1202f1575..4a15fa9f76ceea 100644
--- a/clang/test/CodeGen/PowerPC/ppc-xmmintrin.c
+++ b/clang/test/CodeGen/PowerPC/ppc-xmmintrin.c
@@ -894,16 +894,16 @@ test_shuffle() {
 // CHECK: %[[SHR3:[0-9a-zA-Z_.]+]] = ashr i32 %{{[0-9a-zA-Z_.]+}}, 6
 // CHECK: %[[AND4:[0-9a-zA-Z_.]+]] = and i32 %[[SHR3]], 3
 // CHECK: sext i32 %[[AND4]] to i64
-// CHECK: getelementptr inbounds [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK-LE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 0
 // CHECK-BE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 3
-// CHECK: getelementptr inbounds [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK-LE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 1
 // CHECK-BE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 2
-// CHECK: getelementptr inbounds [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK-LE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 2
 // CHECK-BE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 1
-// CHECK: getelementptr inbounds [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
+// CHECK: getelementptr inbounds nuw [4 x i16], ptr @_mm_shuffle_pi16.__permute_selectors, i64 0, i64 %{{[0-9a-zA-Z_.]+}}
 // CHECK-LE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 3
 // CHECK-BE: getelementptr inbounds [4 x i16], ptr %{{[0-9a-zA-Z_.]+}}, i64 0, i64 0
 // CHECK: call <2 x i64> @vec_splats(unsigned long long)
@@ -923,14 +923,14 @@ test_shuffle() {
 // CHECK: %[[SHR3:[0-9a-zA-Z_.]+]] = ashr i32 %{{[0-9a-zA-Z_.]+}}, 6
 // CHECK: %[[AND4:[0-9a-zA-Z_.]+]] = and i32 %[[SHR3]], 3
 // CHECK: sext i32 %[[AND4]] to i64
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}, i32 0
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %{{[0-9a-zA-Z_.]+}}, i32 1
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
 // CHECK: %[[ADD:[0-9a-zA-Z_.]+]] = add i32 %{{[0-9a-zA-Z_.]+}}, 269488144
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %[[ADD]], i32 2
-// CHECK: getelementptr inbounds [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
+// CHECK: getelementptr inbounds nuw [4 x i32], ptr @_mm_shuffle_ps.__permute_selectors, i64 0, i64
 // CHECK: %[[ADD2:[0-9a-zA-Z_.]+]] = add i32 %{{[0-9a-zA-Z_.]+}}, 269488144
 // CHECK: insertelement <4 x i32> %{{[0-9a-zA-Z_.]+}}, i32 %[[ADD2]], i32 3
 // CHECK: call <4 x float> @vec_perm(float vector[4], float vector[4], unsigned char vector[16])
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index ab36b6e7720ba3..a06e815737f4e6 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -118,7 +118,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = shl i32 [[DOT_COUNTED_BY_LOAD]], 2
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTINV:%.*]] = icmp slt i32 [[DOT_COUNTED_BY_LOAD]], 0
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = select i1 [[DOTINV]], i32 0, i32 [[TMP2]]
@@ -134,7 +134,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[DOTINV:%.*]] = icmp slt i32 [[DOT_COUNTED_BY_LOAD]], 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = select i1 [[DOTINV]], i32 0, i32 [[TMP0]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -142,7 +142,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -150,7 +150,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -207,7 +207,7 @@ size_t test2_bdos(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = sext i32 [[DOT_COUNTED_BY_LOAD]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP3:%.*]] = shl nsw i64 [[TMP2]], 2
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP4:%.*]] = tail call i64 @llvm.smax.i64(i64 [[TMP3]], i64 4)
@@ -231,7 +231,7 @@ size_t test2_bdos(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[DOTINV:%.*]] = icmp slt i32 [[DOT_COUNTED_BY_LOAD]], 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = select i1 [[DOTINV]], i32 0, i32 [[TMP4]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -239,7 +239,7 @@ size_t test2_bdos(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -247,7 +247,7 @@ size_t test2_bdos(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
diff --git a/clang/test/CodeGen/catch-nullptr-and-nonzero-offset-when-nullptr-is-defined.c b/clang/test/CodeGen/catch-nullptr-and-nonzero-offset-when-nullptr-is-defined.c
index 39ede01d6e3b83..8a560a47ad1e10 100644
--- a/clang/test/CodeGen/catch-nullptr-and-nonzero-offset-when-nullptr-is-defined.c
+++ b/clang/test/CodeGen/catch-nullptr-and-nonzero-offset-when-nullptr-is-defined.c
@@ -33,7 +33,7 @@ char *add_unsigned(char *base, unsigned long offset) {
   // CHECK-NEXT:                        store i64 %[[OFFSET]], ptr %[[OFFSET_ADDR]], align 8
   // CHECK-NEXT:                        %[[BASE_RELOADED:.*]] = load ptr, ptr %[[BASE_ADDR]], align 8
   // CHECK-NEXT:                        %[[OFFSET_RELOADED:.*]] = load i64, ptr %[[OFFSET_ADDR]], align 8
-  // CHECK-NEXT:                        %[[ADD_PTR:.*]] = getelementptr inbounds i8, ptr %[[BASE_RELOADED]], i64 %[[OFFSET_RELOADED]]
+  // CHECK-NEXT:                        %[[ADD_PTR:.*]] = getelementptr inbounds nuw i8, ptr %[[BASE_RELOADED]], i64 %[[OFFSET_RELOADED]]
   // CHECK-SANITIZE-NEXT:               %[[COMPUTED_OFFSET_AGGREGATE:.*]] = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 1, i64 %[[OFFSET_RELOADED]]), !nosanitize
   // CHECK-SANITIZE-NEXT:               %[[COMPUTED_OFFSET_OVERFLOWED:.*]] = extractvalue { i64, i1 } %[[COMPUTED_OFFSET_AGGREGATE]], 1, !nosanitize
   // CHECK-SANITIZE-NEXT:               %[[OR_OV:.+]] = or i1 %[[COMPUTED_OFFSET_OVERFLOWED]], false, !nosanitize
diff --git a/clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c b/clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c
index e93dbcb9f647bf..d884993ffb2b30 100644
--- a/clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c
+++ b/clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c
@@ -50,7 +50,7 @@ char *var_var(char *base, unsigned long offset) {
   // CHECK-NEXT:                        store i64 %[[OFFSET]], ptr %[[OFFSET_ADDR]], align 8
   // CHECK-NEXT:                        %[[BASE_RELOADED:.*]] = load ptr, ptr %[[BASE_ADDR]], align 8
   // CHECK-NEXT:                        %[[OFFSET_RELOADED:.*]] = load i64, ptr %[[OFFSET_ADDR]], align 8
-  // CHECK-NEXT:                        %[[ADD_PTR:.*]] = getelementptr inbounds i8, ptr %[[BASE_RELOADED]], i64 %[[OFFSET_RELOADED]]
+  // CHECK-NEXT:                        %[[ADD_PTR:.*]] = getelementptr inbounds nuw i8, ptr %[[BASE_RELOADED]], i64 %[[OFFSET_RELOADED]]
   // CHECK-SANITIZE-NEXT:               %[[COMPUTED_OFFSET_AGGREGATE:.*]] = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 1, i64 %[[OFFSET_RELOADED]]), !nosanitize
   // CHECK-SANITIZE-NEXT:               %[[COMPUTED_OFFSET_OVERFLOWED:.*]] = extractvalue { i64, i1 } %[[COMPUTED_OFFSET_AGGREGATE]], 1, !nosanitize
   // CHECK-SANITIZE-NEXT:               %[[OR_OV:.+]] = or i1 %[[COMPUTED_OFFSET_OVERFLOWED]], false, !nosanitiz...
[truncated]

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG if CI is happy.

@hazzlim
Copy link
Contributor Author

hazzlim commented Sep 5, 2024

Nice - CI seems happy, so I plan to land this later today if no objections.

@hazzlim hazzlim merged commit 7eca38c into llvm:main Sep 5, 2024
14 checks passed
@zeux
Copy link
Contributor

zeux commented Sep 12, 2024

This change leads to a miscompilation; I've extracted a somewhat minimal repro case from the referenced issue above, in the comment: zeux/pugixml#629 (comment)

The code, when compiled with -O2 -g by latest clang on Linux/X64, generates the following assembly after the call to printf in get_valid_length (which needs to be inlined for the issue to happen):

->  0x555555561270 <+192>: movzx  eax, byte ptr [-0x1]

... which, as you can imagine, results in a sigsegv :) This seems to be architecture independent as it initially reproduced in Wasm and also confirmed on Arm.

I believe that all accesses in this code are in-bounds and no overflow/underflow happens. The code is silent with -fsanitize=integer,undefined when using clang 18.

This can be worked around like so:

		const char* end = data + length;

		for (int i = 1; i <= 4; ++i)
		{
			uint8_t ch = static_cast<uint8_t>(end[-i]);

... however, I do not think this workaround should be necessary because I believe the original code is valid as is.

@zeux
Copy link
Contributor

zeux commented Sep 12, 2024

Here's a slightly smaller reproducer that just reads one element from the array. Similarly, this crashes because the compiler generates a load from an absolute -1 address. The crash is after the first printf call in this case during the computation of the argument to the second printf call. (the code is fairly non-sensical in this revision, but maybe it helps to analyze this!).

#include <stdio.h>
#include <stdint.h>
#include <string.h>

class xml_buffered_writer
{
public:
	xml_buffered_writer(): bufsize(0)
	{
	}

	__attribute__((noinline))
	void write_string(const char* data)
	{
		// write the part of the string that fits in the buffer
		size_t offset = bufsize;

		while (*data && offset < bufcapacity)
			buffer[offset++] = *data++;

		// write the rest
		if (offset < bufcapacity)
		{
			bufsize = offset;
		}
		else
		{
			// backtrack a bit if we have split the codepoint
			size_t length = offset - bufsize;
			const char* data_back = data - length;
			printf("length %d\n", int(length));
			printf("last char %c\n", data_back[length-1]);

			bufsize = offset;
		}
	}

	enum
	{
		bufcapacity = 16
	};

	char buffer[bufcapacity];
	size_t bufsize;
};

int main()
{
	xml_buffered_writer writer;
	writer.write_string("abcdefghijklmnopqrstuvwxyz");
	printf("\n");
}

@efriedma-quic
Copy link
Collaborator

Reproduced: https://godbolt.org/z/G4jrhdjef . Looks like an instcombine bug.

@nikic
Copy link
Contributor

nikic commented Sep 12, 2024

Reduced: https://llvm.godbolt.org/z/TPdd9qore

I believe the problem is this line:

GEP.setIsInBounds(isMergedGEPInBounds(*Src, *cast<GEPOperator>(&GEP)));
This does an in-place modification of the GEP, but only updates the inbounds flag.

@nikic
Copy link
Contributor

nikic commented Sep 13, 2024

Should be fixed by 940f892.

@zeux
Copy link
Contributor

zeux commented Sep 13, 2024

Confirming this fixes the original problem on the larger codebase, thanks!

@efriedma-quic
Copy link
Collaborator

Should be fixed by 940f892.

I quickly glanced through the other uses of setIsInBounds in LLVM; the one in CodeGenPrepare seems suspicious.

@sbc100
Copy link
Collaborator

sbc100 commented Nov 18, 2024

I bisecting this emscripten (wasm32-unknown-emscripten) miscompile to this PR: emscripten-core/emscripten#22794.

It looks like llvm is generating a load with a negative office (which is not supported under wasm). There is a reproducer in that but and I used the following commend to detect the bad load:

$ clang++ --target=wasm32-unknown-emscripten -std=c++20 -O2 -c 3.1.71_temps/GltfImporter.ii && wasm-objdump -d GltfImporter.o | grep i64.load.*4294967280
 01c4a3: 29 02 f0 ff ff ff 0f       |               i64.load 2 4294967280

Here you can see an i64.load instruction with an offset of 4294967280 (which is -16 interpreted as an unsigned int).

@nikic
Copy link
Contributor

nikic commented Nov 18, 2024

@sbc100 That wasm is not necessarily a miscompile, it may be UB in the source program, if it uses a negative number in an unsigned variable to index an array. Is the affected code clean under ubsan? (See #108770 for a similar report that was due to UB.)

@sbc100
Copy link
Collaborator

sbc100 commented Nov 18, 2024

Perhaps yes. Do you know of any way to reduce the input program into something we can try to inspect. The preprocessed C++ file is over 100,000 lines :-/

@sbc100
Copy link
Collaborator

sbc100 commented Nov 18, 2024

I used wasm-reduce on the .ll file and it looks like you were correct. The result was:

target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"                                         
                                                                                    
define void @_ZN6Magnum5Trade12GltfImporter6doMeshEjj(ptr %agg.tmp4.i.i.i.i.i) personality ptr null {
entry:                                                                           
  %0 = load i64, ptr inttoptr (i32 -16 to ptr), align 4                          
  store i64 %0, ptr %agg.tmp4.i.i.i.i.i, align 8                                 
  ret void                                                                       
}                                                                                
                                                                                 
; uselistorder directives                                                        
uselistorder ptr null, { 1, 2, 0 }  

It looks like there is a -16 to ptr cast in there. I've ask the reported to try running with -fsanitize=undefined and -fsanitize=address. Do you think those will catch this kind of thing?

@efriedma-quic
Copy link
Collaborator

ubsan pointer overflow check should catch issues in the source code, I think. If that doesn't work, maybe try opt-bisect-limit to see if it turns up anything useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:PowerPC clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category HLSL HLSL Language Support
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants