Skip to content

[GVN] Drop Clobber dependency if store may overwrite only the same value #68322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 10, 2023

Conversation

skachkov-sc
Copy link
Contributor

Motivation: https://godbolt.org/z/sfx6fPc3n

In this example the second load is redundant and can be optimized. The problem is a cloberring store between it and first load, but here it can be safely skipped because one-byte store with alignment of 1 can only be MustAlias or NoAlias (no partial overlapping possible), and in case of MustAlias it will rewrite exactly the same value.

Note: this case can be optimized by GCC and there were previous attempts to fix this issue, e.g. https://reviews.llvm.org/D155843

@skachkov-sc skachkov-sc requested a review from nikic October 5, 2023 15:27
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 5, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 5, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Changes

Motivation: https://godbolt.org/z/sfx6fPc3n

In this example the second load is redundant and can be optimized. The problem is a cloberring store between it and first load, but here it can be safely skipped because one-byte store with alignment of 1 can only be MustAlias or NoAlias (no partial overlapping possible), and in case of MustAlias it will rewrite exactly the same value.

Note: this case can be optimized by GCC and there were previous attempts to fix this issue, e.g. https://reviews.llvm.org/D155843


Full diff: https://github.com/llvm/llvm-project/pull/68322.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/MemoryDependenceAnalysis.cpp (+39-5)
  • (added) llvm/test/Transforms/GVN/rle-clobbering-store.ll (+91)
diff --git a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
index 071ecdba8a54ace..eb503e4a781c94e 100644
--- a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
   return MemDepResult::getNonLocal();
 }
 
+// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely
+// skipped. This is possible in case if SI can only must alias or no alias with
+// MemLoc (no partial overlapping possible) and it writes the same value that
+// MemLoc contains now (it was loaded before this store and was not modified in
+// between).
+static bool canSkipClobberingStore(const StoreInst *SI,
+                                   const MemoryLocation &MemLoc,
+                                   Align MemLocAlign, BatchAAResults &BatchAA) {
+  if (!MemLoc.Size.hasValue())
+    return false;
+  if (MemoryLocation::get(SI).Size != MemLoc.Size)
+    return false;
+  if (MemLocAlign.value() < MemLoc.Size.getValue())
+    return false;
+  if (SI->getAlign() < MemLocAlign)
+    return false;
+
+  auto *LI = dyn_cast<LoadInst>(SI->getValueOperand());
+  if (!LI || LI->getParent() != SI->getParent())
+    return false;
+  if (BatchAA.alias(MemoryLocation::get(LI), MemLoc) != AliasResult::MustAlias)
+    return false;
+  for (const auto *I = LI->getNextNode(); I != SI; I = I->getNextNode())
+    if (isModSet(BatchAA.getModRefInfo(I, MemLoc)))
+      return false;
+
+  return true;
+}
+
 MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
     const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
     BasicBlock *BB, Instruction *QueryInst, unsigned *Limit,
     BatchAAResults &BatchAA) {
   bool isInvariantLoad = false;
+  Align MemLocAlign =
+      MemLoc.Ptr->getPointerAlignment(BB->getModule()->getDataLayout());
 
   unsigned DefaultLimit = getDefaultBlockScanLimit();
   if (!Limit)
@@ -402,11 +433,12 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
   // do want to respect mustalias results since defs are useful for value
   // forwarding, but any mayalias write can be assumed to be noalias.
   // Arguably, this logic should be pushed inside AliasAnalysis itself.
-  if (isLoad && QueryInst) {
-    LoadInst *LI = dyn_cast<LoadInst>(QueryInst);
-    if (LI && LI->hasMetadata(LLVMContext::MD_invariant_load))
-      isInvariantLoad = true;
-  }
+  if (isLoad && QueryInst)
+    if (LoadInst *LI = dyn_cast<LoadInst>(QueryInst)) {
+      if (LI->hasMetadata(LLVMContext::MD_invariant_load))
+        isInvariantLoad = true;
+      MemLocAlign = LI->getAlign();
+    }
 
   // True for volatile instruction.
   // For Load/Store return true if atomic ordering is stronger than AO,
@@ -577,6 +609,8 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
         return MemDepResult::getDef(Inst);
       if (isInvariantLoad)
         continue;
+      if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA))
+        continue;
       return MemDepResult::getClobber(Inst);
     }
 
diff --git a/llvm/test/Transforms/GVN/rle-clobbering-store.ll b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
new file mode 100644
index 000000000000000..d6b8949752af4ac
--- /dev/null
+++ b/llvm/test/Transforms/GVN/rle-clobbering-store.ll
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes=gvn -S | FileCheck %s
+
+define void @test_i8(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i8(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[C]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i8, ptr %a, align 1
+  store i8 %0, ptr %b, align 1
+  %1 = load i8, ptr %a, align 1
+  store i8 %1, ptr %c, align 1
+  ret void
+}
+
+define void @test_i32(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_i32(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  store i32 %0, ptr %b, align 4
+  %1 = load i32, ptr %a, align 4
+  store i32 %1, ptr %c, align 4
+  ret void
+}
+
+define void @test_float(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_float(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[A]], align 4
+; CHECK-NEXT:    store float [[TMP0]], ptr [[B]], align 4
+; CHECK-NEXT:    store float [[TMP0]], ptr [[C]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load float, ptr %a, align 4
+  store float %0, ptr %b, align 4
+  %1 = load float, ptr %a, align 4
+  store float %1, ptr %c, align 4
+  ret void
+}
+
+define void @test_unaligned(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_unaligned(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP0]], ptr [[B]], align 2
+; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:    store i32 [[TMP1]], ptr [[C]], align 2
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  store i32 %0, ptr %b, align 2
+  %1 = load i32, ptr %a, align 4
+  store i32 %1, ptr %c, align 2
+  ret void
+}
+
+define void @test_modify_between(ptr %a, ptr %b, ptr %c) {
+; CHECK-LABEL: define void @test_modify_between(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], ptr [[C:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 42, ptr [[C]], align 1
+; CHECK-NEXT:    store i8 [[TMP0]], ptr [[B]], align 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[A]], align 1
+; CHECK-NEXT:    store i8 [[TMP1]], ptr [[C]], align 1
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load i8, ptr %a, align 1
+  store i8 42, ptr %c, align 1
+  store i8 %0, ptr %b, align 1
+  %1 = load i8, ptr %a, align 1
+  store i8 %1, ptr %c, align 1
+  ret void
+}

@skachkov-sc
Copy link
Contributor Author

Results on test-suite with SPEC2006 (number of removed redundant loads):

Metric: gvn.NumGVNLoad

Program                                                                                   gvn.NumGVNLoad                
                                                                                          before         after   diff   
                                  test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test    0.00           1.00    inf%
        test-suite :: SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect.test   72.00        1352.00 1777.8%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test    2.00           4.00  100.0%
                            test-suite :: MultiSource/Benchmarks/Olden/health/health.test    1.00           2.00  100.0%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 1388.00        2371.00   70.8%
                               test-suite :: MultiSource/Benchmarks/McCat/09-vor/vor.test   12.00          20.00   66.7%
                           test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test    3.00           4.00   33.3%
                                test-suite :: External/SPEC/CINT2006/429.mcf/429.mcf.test    6.00           8.00   33.3%
            test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   85.00          96.00   12.9%
                  test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   88.00          99.00   12.5%
                  test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  106.00         118.00   11.3%
                               test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   62.00          69.00   11.3%
                                      test-suite :: MultiSource/Applications/hbd/hbd.test   36.00          40.00   11.1%
                test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test   12.00          13.00    8.3%
                                  test-suite :: MultiSource/Applications/SPASS/SPASS.test  365.00         395.00    8.2%
                                      test-suite :: MultiSource/Applications/lua/lua.test   98.00         106.00    8.2%
                test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   49.00          53.00    8.2%
                           test-suite :: External/SPEC/CFP2006/450.soplex/450.soplex.test  282.00         305.00    8.2%
                            test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test   38.00          41.00    7.9%
                              test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  307.00         319.00    3.9%
                            test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test  208.00         215.00    3.4%
                        test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 2400.00        2480.00    3.3%
                             test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  655.00         676.00    3.2%
                             test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2492.00        2559.00    2.7%
                   test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test   40.00          41.00    2.5%
                          test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  726.00         743.00    2.3%
                  test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test  829.00         847.00    2.2%
                                        test-suite :: MultiSource/Benchmarks/sim/sim.test   49.00          50.00    2.0%
                    test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  106.00         108.00    1.9%
                         test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test  108.00         110.00    1.9%
                           test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test  660.00         672.00    1.8%
                            test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test   58.00          59.00    1.7%
                               test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test  183.00         186.00    1.6%
                           test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1050.00        1067.00    1.6%
                                  test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test   63.00          64.00    1.6%
                                test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 2412.00        2447.00    1.5%
          test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test   70.00          71.00    1.4%
                              test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  152.00         154.00    1.3%
                    test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  789.00         799.00    1.3%
                                 test-suite :: MultiSource/Applications/kimwitu++/kc.test  172.00         174.00    1.2%
                                test-suite :: MultiSource/Applications/oggenc/oggenc.test  445.00         449.00    0.9%
              test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test  124.00         125.00    0.8%
                            test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test  513.00         517.00    0.8%
                                  test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 1692.00        1704.00    0.7%
                        test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test  466.00         469.00    0.6%
                    test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 1943.00        1953.00    0.5%
            test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  416.00         418.00    0.5%
                 test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test  275.00         276.00    0.4%
                       test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test  277.00         278.00    0.4%
                       test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test  295.00         296.00    0.3%
                 test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test  298.00         299.00    0.3%
                 test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test  299.00         300.00    0.3%
                       test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test  312.00         313.00    0.3%
                              test-suite :: MultiSource/Applications/ClamAV/clamscan.test  737.00         738.00    0.1%
                            test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1466.00        1467.00    0.1%

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -360,11 +360,42 @@ MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
return MemDepResult::getNonLocal();
}

// canSkipClobberingStore - check if SI that may alias with MemLoc can be safely
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't repeat function name in comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

%1 = load float, ptr %a, align 4
store float %1, ptr %c, align 4
ret void
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test non-byte-sized value (e.g. i1). I'm not entirely sure the optimization is correct in that case. (Because the store will write bits outside the value itself.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_i1, I think that we shouldn't have any problems with it, because the only relevant information for this optimization is the location size (not loaded/stored type), and for i1 the location size will be DataLayout::getTypeStoreSize(i1), which is something like 1 byte in most cases, so this test should be optimized very similarly to test_i8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problematic case I had in mind here is this one:

define i8 @test_i1(ptr %a, ptr %b, ptr %c) {
entry:
  %0 = load i1, ptr %a, align 1
  store i1 %0, ptr %b, align 1
  %1 = load i8, ptr %a, align 1
  ret i8 %1
}

store i1 %0 is going to write the same value to the low bit, but it will also write the 7 high bits (typically to zero, though this is unspecified), so it's not actually a no-op.

Though given that LangRef says

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

this would fall under undefined behavior, so I guess it's fine.

@skachkov-sc skachkov-sc force-pushed the gvn-cloberring-store branch from 07a2e89 to 6df3c7b Compare October 6, 2023 12:10
@@ -577,6 +610,9 @@ MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
return MemDepResult::getDef(Inst);
if (isInvariantLoad)
continue;
if (canSkipClobberingStore(SI, MemLoc, MemLocAlign, BatchAA,
getDefaultBlockScanLimit()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do you think it would make sense to pass Limit here instead? That way, if we already walked ~100 instructions in the main clobber walk, we're not going to walk another ~100 trying to skip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I've checked statistics on llvm-test-suite (gvn.NumGVNLoad, gvn.NumPRELoad) and seems that it will not cause any changes with this more conservative limit

%1 = load float, ptr %a, align 4
store float %1, ptr %c, align 4
ret void
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problematic case I had in mind here is this one:

define i8 @test_i1(ptr %a, ptr %b, ptr %c) {
entry:
  %0 = load i1, ptr %a, align 1
  store i1 %0, ptr %b, align 1
  %1 = load i8, ptr %a, align 1
  ret i8 %1
}

store i1 %0 is going to write the same value to the low bit, but it will also write the 7 high bits (typically to zero, though this is unspecified), so it's not actually a no-op.

Though given that LangRef says

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

this would fall under undefined behavior, so I guess it's fine.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

In some cases clobbering store can be safely skipped if it can only must
or no alias with memory location and it writes the same value. This
patch supports simple case when the value from memory location was
loaded in the same basic block before the store and there are no
modifications between them.
@skachkov-sc skachkov-sc merged commit 9f12f65 into llvm:main Oct 10, 2023
@skachkov-sc skachkov-sc deleted the gvn-cloberring-store branch October 10, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants