[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

skachkov-sc · 2023-10-05T15:27:02Z

Motivation: https://godbolt.org/z/sfx6fPc3n

In this example the second load is redundant and can be optimized. The problem is a cloberring store between it and first load, but here it can be safely skipped because one-byte store with alignment of 1 can only be MustAlias or NoAlias (no partial overlapping possible), and in case of MustAlias it will rewrite exactly the same value.

Note: this case can be optimized by GCC and there were previous attempts to fix this issue, e.g. https://reviews.llvm.org/D155843

llvmbot · 2023-10-05T15:28:15Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Changes

Motivation: https://godbolt.org/z/sfx6fPc3n

In this example the second load is redundant and can be optimized. The problem is a cloberring store between it and first load, but here it can be safely skipped because one-byte store with alignment of 1 can only be MustAlias or NoAlias (no partial overlapping possible), and in case of MustAlias it will rewrite exactly the same value.

Note: this case can be optimized by GCC and there were previous attempts to fix this issue, e.g. https://reviews.llvm.org/D155843

Full diff: https://github.com/llvm/llvm-project/pull/68322.diff

2 Files Affected:

(modified) llvm/lib/Analysis/MemoryDependenceAnalysis.cpp (+39-5)
(added) llvm/test/Transforms/GVN/rle-clobbering-store.ll (+91)

diff --git a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
index 071ecdba8a54ace..eb503e4a781c94e 100644
--- a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
   return MemDepResult::getNonLocal();
 }
 
+// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely
+// skipped. This is possible in case if SI can only must alias or no alias with
+// MemLoc (no partial overlapping possible) and it writes the same value that
+// MemLoc contains now (it was loaded before this store and was not modified in
+// between).
+static bool canSkipClobberingStore(const StoreInst *SI,
+                                   const MemoryLocation &MemLoc,
+                                   Align MemLocAlign, BatchAAResults &BatchAA) {
+  if (!MemLoc.Size.hasValue())
+    return false;
+  if (MemoryLocation::get(SI).Size != MemLoc.Size)
+    return false;
+  if (MemLocAlign.value() < MemLoc.Size.getValue())
+    return false;
+  if (SI->getAlign() < MemLocAlign)
+    return false;
+
+  auto *LI = dyn_cast<LoadInst>(SI->getValueOperand());
+  if (!LI || LI->getParent() != SI->getParent())
+    return false;
+  if (BatchAA.alias(MemoryLocation::get(LI), MemLoc) != AliasResult::MustAlias)
+    return false;
+  for (const auto *I = LI->getNextNode(); I != SI; I = I->getNextNode())
+    if (isModSet(BatchAA.getModRefInfo(I, MemLoc)))
+      return false;
+
+  return true;
+}
+
 MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
     const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
     BasicBlock *BB, Instruction *QueryInst, unsigned *Limit,
     BatchAAResults &BatchAA) {
   bool isInvariantLoad = false;
+  Align MemLocAlign =
+      MemLoc.Ptr->getPointerAlignment(BB->getModule()->getDataLayout());
 
   unsigned DefaultLimit = getDefaultBlockScanLimit();
   if (!Limit)
@@ -402,11 +433,12 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
   // do want to respect mustalias results since defs are useful for value
   // forwarding, but any mayalias write can be assumed to be noalias.
   // Arguably, this logic should be pushed inside AliasAnalysis itself.
-  if (isLoad && QueryInst) {
-    LoadInst *LI = dyn_cast<LoadInst>(QueryInst);
-    if (LI && LI->hasMetadata(LLVMContext::MD_invariant_load))
-      isInvariantLoad = true;
-  }
+  if (isLoad && QueryInst)
+    if (LoadInst *LI = dyn_cast<LoadInst>(QueryInst)) {
+      if (LI->hasMetadata(LLVMContext::MD_invariant_load))
+        isInvariantLoad = true;
+      MemLocAlign = LI->getAlign();
+    }
 
   // True for volatile instruction.
   // For Load/Store return true if atomic ordering is stronger than AO,
@@ -577,6 +609,8 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
         return MemDepResult::getDef(Inst);
       if (isInvariantLoad)
         continue;
+      if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA))
+        continue;
       return MemDepResult::getClobber(Inst);
     }
 
diff --git a/llvm/test/Transforms/GVN/rle-clobbering-store.ll b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
new file mode 100644
index 000000000000000..d6b8949752af4ac
--- /dev/null
+++ b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes=gvn -S | FileCheck %s
+
+define void @test_i8(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i8(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[C]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i8, ptr %a, align 1
+  store i8 %0, ptr %b, align 1
+  %1 = load i8, ptr %a, align 1
+  store i8 %1, ptr %c, align 1
+  ret void
+}
+
+define void @test_i32(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i32(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  store i32 %0, ptr %b, align 4
+  %1 = load i32, ptr %a, align 4
+  store i32 %1, ptr %c, align 4
+  ret void
+}
+
+define void @test_float(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_float(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[A]], align 4
+; CHECK-NEXT:    store float [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT:    store float [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load float, ptr %a, align 4
+  store float %0, ptr %b, align 4
+  %1 = load float, ptr %a, align 4
+  store float %1, ptr %c, align 4
+  ret void
+}
+
+define void @test_unaligned(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_unaligned(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[B]], align 2
+; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP1]], ptr [[C]], align 2
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  store i32 %0, ptr %b, align 2
+  %1 = load i32, ptr %a, align 4
+  store i32 %1, ptr %c, align 2
+  ret void
+}
+
+define void @test_modify_between(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_modify_between(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 42, ptr [[C]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 [[TMP1]], ptr [[C]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i8, ptr %a, align 1
+  store i8 42, ptr %c, align 1
+  store i8 %0, ptr %b, align 1
+  %1 = load i8, ptr %a, align 1
+  store i8 %1, ptr %c, align 1
+  ret void
+}

skachkov-sc · 2023-10-05T15:32:02Z

Results on test-suite with SPEC2006 (number of removed redundant loads):

Metric: gvn.NumGVNLoad

Program                                                                                   gvn.NumGVNLoad                
                                                                                          before         after   diff   
                                  test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test    0.00           1.00    inf%
        test-suite :: SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.test   72.00        1352.00 1777.8%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test    2.00           4.00  100.0%
                            test-suite :: MultiSource/Benchmarks/Olden/health/health.test    1.00           2.00  100.0%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 1388.00        2371.00   70.8%
                               test-suite :: MultiSource/Benchmarks/McCat/09-vor/vor.test   12.00          20.00   66.7%
                           test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test    3.00           4.00   33.3%
                                test-suite :: External/SPEC/CINT2006/429.mcf/429.mcf.test    6.00           8.00   33.3%
            test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   85.00          96.00   12.9%
                  test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   88.00          99.00   12.5%
                  test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  106.00         118.00   11.3%
                               test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   62.00          69.00   11.3%
                                      test-suite :: MultiSource/Applications/hbd/hbd.test   36.00          40.00   11.1%
                test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test   12.00          13.00    8.3%
                                  test-suite :: MultiSource/Applications/SPASS/SPASS.test  365.00         395.00    8.2%
                                      test-suite :: MultiSource/Applications/lua/lua.test   98.00         106.00    8.2%
                test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   49.00          53.00    8.2%
                           test-suite :: External/SPEC/CFP2006/450.soplex/450.soplex.test  282.00         305.00    8.2%
                            test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test   38.00          41.00    7.9%
                              test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  307.00         319.00    3.9%
                            test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test  208.00         215.00    3.4%
                        test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 2400.00        2480.00    3.3%
                             test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  655.00         676.00    3.2%
                             test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2492.00        2559.00    2.7%
                   test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test   40.00          41.00    2.5%
                          test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  726.00         743.00    2.3%
                  test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  829.00         847.00    2.2%
                                        test-suite :: MultiSource/Benchmarks/sim/sim.test   49.00          50.00    2.0%
                    test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  106.00         108.00    1.9%
                         test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test  108.00         110.00    1.9%
                           test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  660.00         672.00    1.8%
                            test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test   58.00          59.00    1.7%
                               test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  183.00         186.00    1.6%
                           test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1050.00        1067.00    1.6%
                                  test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test   63.00          64.00    1.6%
                                test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 2412.00        2447.00    1.5%
          test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test   70.00          71.00    1.4%
                              test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  152.00         154.00    1.3%
                    test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  789.00         799.00    1.3%
                                 test-suite :: MultiSource/Applications/kimwitu++/kc.test  172.00         174.00    1.2%
                                test-suite :: MultiSource/Applications/oggenc/oggenc.test  445.00         449.00    0.9%
              test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test  124.00         125.00    0.8%
                            test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test  513.00         517.00    0.8%
                                  test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 1692.00        1704.00    0.7%
                        test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test  466.00         469.00    0.6%
                    test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 1943.00        1953.00    0.5%
            test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  416.00         418.00    0.5%
                 test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test  275.00         276.00    0.4%
                       test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test  277.00         278.00    0.4%
                       test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  295.00         296.00    0.3%
                 test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test  298.00         299.00    0.3%
                 test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test  299.00         300.00    0.3%
                       test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test  312.00         313.00    0.3%
                              test-suite :: MultiSource/Applications/ClamAV/clamscan.test  737.00         738.00    0.1%
                            test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1466.00        1467.00    0.1%

nikic

Looks reasonable to me.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=ff5e2babcb46e7eb3887ee265decb2948da2792c&to=8c6f93c8606faeb2a750cf7c31e0effd21bb88e6&stat=instructions:u
Code-size: http://llvm-compile-time-tracker.com/compare.php?from=ff5e2babcb46e7eb3887ee265decb2948da2792c&to=8c6f93c8606faeb2a750cf7c31e0effd21bb88e6&stat=size-text

nikic · 2023-10-06T09:06:43Z

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp

@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
  return MemDepResult::getNonLocal();
 }

+// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely


Don't repeat function name in comment.

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp

llvm/test/Transforms/GVN/rle-clobbering-store.ll

nikic · 2023-10-06T09:29:31Z

llvm/test/Transforms/GVN/rle-clobbering-store.ll

+  %1 = load float, ptr %a, align 4
+  store float %1, ptr %c, align 4
+  ret void
+}


Please also test non-byte-sized value (e.g. i1). I'm not entirely sure the optimization is correct in that case. (Because the store will write bits outside the value itself.)

Added test_i1, I think that we shouldn't have any problems with it, because the only relevant information for this optimization is the location size (not loaded/stored type), and for i1 the location size will be DataLayout::getTypeStoreSize(i1), which is something like 1 byte in most cases, so this test should be optimized very similarly to test_i8.

The problematic case I had in mind here is this one:

define i8 @test_i1(ptr %a, ptr %b, ptr %c) { entry: %0 = load i1, ptr %a, align 1 store i1 %0, ptr %b, align 1 %1 = load i8, ptr %a, align 1 ret i8 %1 }

store i1 %0 is going to write the same value to the low bit, but it will also write the 7 high bits (typically to zero, though this is unspecified), so it's not actually a no-op.

Though given that LangRef says

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

this would fall under undefined behavior, so I guess it's fine.

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp

nikic · 2023-10-09T12:50:44Z

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp

@@ -577,6 +610,9 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
        return MemDepResult::getDef(Inst);
      if (isInvariantLoad)
        continue;
+      if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA,
+                                 getDefaultBlockScanLimit()))


Hm, do you think it would make sense to pass Limit here instead? That way, if we already walked ~100 instructions in the main clobber walk, we're not going to walk another ~100 trying to skip.

Agreed, I've checked statistics on llvm-test-suite (gvn.NumGVNLoad, gvn.NumPRELoad) and seems that it will not cause any changes with this more conservative limit

nikic · 2023-10-09T12:56:51Z

llvm/test/Transforms/GVN/rle-clobbering-store.ll

+  %1 = load float, ptr %a, align 4
+  store float %1, ptr %c, align 4
+  ret void
+}


The problematic case I had in mind here is this one:

define i8 @test_i1(ptr %a, ptr %b, ptr %c) { entry: %0 = load i1, ptr %a, align 1 store i1 %0, ptr %b, align 1 %1 = load i8, ptr %a, align 1 ret i8 %1 }

store i1 %0 is going to write the same value to the low bit, but it will also write the 7 high bits (typically to zero, though this is unspecified), so it's not actually a no-op.

Though given that LangRef says

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

this would fall under undefined behavior, so I guess it's fine.

nikic

LGTM

In some cases clobbering store can be safely skipped if it can only must or no alias with memory location and it writes the same value. This patch supports simple case when the value from memory location was loaded in the same basic block before the store and there are no modifications between them.

skachkov-sc requested a review from nikic October 5, 2023 15:27

llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 5, 2023

nikic reviewed Oct 6, 2023

View reviewed changes

skachkov-sc force-pushed the gvn-cloberring-store branch from 07a2e89 to 6df3c7b Compare October 6, 2023 12:10

nikic reviewed Oct 9, 2023

View reviewed changes

nikic approved these changes Oct 10, 2023

View reviewed changes

skachkov-sc force-pushed the gvn-cloberring-store branch from 40161cd to eca1041 Compare October 10, 2023 16:10

skachkov-sc merged commit 9f12f65 into llvm:main Oct 10, 2023

skachkov-sc deleted the gvn-cloberring-store branch October 10, 2023 16:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

Uh oh!

skachkov-sc commented Oct 5, 2023

Uh oh!

llvmbot commented Oct 5, 2023 •

edited

Loading

Uh oh!

skachkov-sc commented Oct 5, 2023

Uh oh!

nikic left a comment

Uh oh!

nikic Oct 6, 2023

Uh oh!

skachkov-sc Oct 6, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nikic Oct 6, 2023

Uh oh!

skachkov-sc Oct 6, 2023

Uh oh!

nikic Oct 9, 2023

Uh oh!

Uh oh!

nikic Oct 9, 2023

Uh oh!

skachkov-sc Oct 10, 2023

Uh oh!

nikic Oct 9, 2023

Uh oh!

nikic left a comment

Uh oh!

Uh oh!

[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

Uh oh!

Conversation

skachkov-sc commented Oct 5, 2023

Uh oh!

llvmbot commented Oct 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

skachkov-sc commented Oct 5, 2023

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvmbot commented Oct 5, 2023 •

edited

Loading