[LoopInterchange] Add an option to prioritize vectorization #131988


Merged
merged 4 commits into llvm:main on Mar 21, 2025

Conversation

kasuga-fj
Contributor

The LoopInterchange cost model consists of several decision rules. They are called one by one, and if some rule can determine the profitability, the subsequent rules aren't called. In the current implementation, the rule for CacheCostAnalysis is called first, and if it fails to determine the profitability, the rule for vectorization is called. However, there are cases where interchanging loops for vectorization makes the code faster even if the exchange is detrimental to the cache. For example, exchanging the inner two loops in the following code runs about 3x faster on my machine (compiled with -O3 -mcpu=neoverse-v2 -mllvm -cache-line-size=64), even though the interchange is rejected by the cache-cost rule. (NOTE: LoopInterchange currently cannot exchange these loops due to legality checks. This should also be improved.)

```c
__attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256],
                                   dd[256][256],ee[256][256],ff[256][256];

// Alternative of TSVC s231 with more array accesses than the original.
void s231_alternative() {
  for (int nl = 0; nl < 100*(100000/256); nl++) {
    for (int i = 0; i < 256; ++i) {
      for (int j = 1; j < 256; j++) {
        aa[j][i] = aa[j-1][i] + bb[j][i] + cc[i][j]
                 + dd[i][j] + ee[i][j] + ff[i][j];
      }
    }
  }
}
```

This patch introduces a new option to prioritize the vectorization rule over the cache cost rule.

Related issue: #131130

@llvmbot
Member

llvmbot commented Mar 19, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/131988.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+49-13)
  • (added) llvm/test/Transforms/LoopInterchange/profitability-vectorization.ll (+81)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 61b86d2aecef0..364acb0ebc344 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -84,6 +84,11 @@ static cl::opt<unsigned int> MaxLoopNestDepth(
     "loop-interchange-max-loop-nest-depth", cl::init(10), cl::Hidden,
     cl::desc("Maximum depth of loop nest considered for the transform"));
 
+static cl::opt<bool> PrioritizeVectorization(
+    "loop-interchange-prioritize-vectorization", cl::init(false), cl::Hidden,
+    cl::desc("Prioritize increasing vectorization opportunity over cache cost "
+             "when determining profitability"));
+
 #ifndef NDEBUG
 static void printDepMatrix(CharMatrix &DepMatrix) {
   for (auto &Row : DepMatrix) {
@@ -1193,22 +1198,53 @@ bool LoopInterchangeProfitability::isProfitable(
     unsigned OuterLoopId, CharMatrix &DepMatrix,
     const DenseMap<const Loop *, unsigned> &CostMap,
     std::unique_ptr<CacheCost> &CC) {
-  // isProfitable() is structured to avoid endless loop interchange.
-  // If loop cache analysis could decide the profitability then,
-  // profitability check will stop and return the analysis result.
-  // If cache analysis failed to analyze the loopnest (e.g.,
-  // due to delinearization issues) then only check whether it is
-  // profitable for InstrOrderCost. Likewise, if InstrOrderCost failed to
-  // analysis the profitability then only, isProfitableForVectorization
-  // will decide.
-  std::optional<bool> shouldInterchange =
-      isProfitablePerLoopCacheAnalysis(CostMap, CC);
-  if (!shouldInterchange.has_value()) {
-    shouldInterchange = isProfitablePerInstrOrderCost();
-    if (!shouldInterchange.has_value())
+  // isProfitable() is structured to avoid endless loop interchange. If the
+  // highest priority rule (isProfitablePerLoopCacheAnalysis by default) could
+  // decide the profitability then, profitability check will stop and return the
+  // analysis result. If it failed to determine it (e.g., cache analysis failed
+  // to analyze the loopnest due to delinearization issues) then go ahead the
+  // second highest priority rule (isProfitablePerInstrOrderCost by default).
+  // Likewise, if it failed to analysis the profitability then only, the last
+  // rule (isProfitableForVectorization by default) will decide.
+  enum class RuleTy {
+    PerLoopCacheAnalysis,
+    PerInstrOrderCost,
+    ForVectorization,
+  };
+
+  // We prefer cache cost to vectorization by default.
+  RuleTy RuleOrder[3] = {RuleTy::PerLoopCacheAnalysis,
+                         RuleTy::PerInstrOrderCost, RuleTy::ForVectorization};
+
+  // If we prefer vectorization to cache cost, change the order of application
+  // of each rule.
+  if (PrioritizeVectorization) {
+    RuleOrder[0] = RuleTy::ForVectorization;
+    RuleOrder[1] = RuleTy::PerLoopCacheAnalysis;
+    RuleOrder[2] = RuleTy::PerInstrOrderCost;
+  }
+
+  std::optional<bool> shouldInterchange;
+  for (RuleTy RT : RuleOrder) {
+    switch (RT) {
+    case RuleTy::PerLoopCacheAnalysis:
+      shouldInterchange = isProfitablePerLoopCacheAnalysis(CostMap, CC);
+      break;
+    case RuleTy::PerInstrOrderCost:
+      shouldInterchange = isProfitablePerInstrOrderCost();
+      break;
+    case RuleTy::ForVectorization:
       shouldInterchange =
           isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
+      break;
+    }
+
+    // If this rule could determine the profitability, don't call subsequent
+    // rules.
+    if (shouldInterchange.has_value())
+      break;
   }
+
   if (!shouldInterchange.has_value()) {
     ORE->emit([&]() {
       return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",
diff --git a/llvm/test/Transforms/LoopInterchange/profitability-vectorization.ll b/llvm/test/Transforms/LoopInterchange/profitability-vectorization.ll
new file mode 100644
index 0000000000000..0018aa0308f28
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/profitability-vectorization.ll
@@ -0,0 +1,81 @@
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 \
+; RUN:     -pass-remarks-output=%t -disable-output
+; RUN: FileCheck -input-file %t --check-prefix=PROFIT-CACHE %s
+
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 \
+; RUN:     -pass-remarks-output=%t -disable-output -loop-interchange-prioritize-vectorization=1
+; RUN: FileCheck -input-file %t --check-prefix=PROFIT-VEC %s
+
+@A = dso_local global [256 x [256 x float]] zeroinitializer
+@B = dso_local global [256 x [256 x float]] zeroinitializer
+@C = dso_local global [256 x [256 x float]] zeroinitializer
+@D = dso_local global [256 x [256 x float]] zeroinitializer
+@E = dso_local global [256 x [256 x float]] zeroinitializer
+@F = dso_local global [256 x [256 x float]] zeroinitializer
+
+; Check the behavior of the LoopInterchange cost-model. In the below code,
+; exchanging the loops is not profitable in terms of cache, but it is necessary
+; to vectorize the innermost loop.
+;
+; for (int i = 0; i < 256; i++)
+;   for (int j = 1; j < 256; j++)
+;     A[j][i] = A[j-1][i] + B[j][i] + C[i][j] + D[i][j] + E[i][j] + F[i][j];
+;
+
+; PROFIT-CACHE:      --- !Missed
+; PROFIT-CACHE-NEXT: Pass:            loop-interchange
+; PROFIT-CACHE-NEXT: Name:            InterchangeNotProfitable
+; PROFIT-CACHE-NEXT: Function:        f
+; PROFIT-CACHE-NEXT: Args:
+; PROFIT-CACHE-NEXT:   - String:          Interchanging loops is not considered to improve cache locality nor vectorization.
+; PROFIT-CACHE-NEXT: ...
+
+; PROFIT-VEC:      --- !Passed
+; PROFIT-VEC-NEXT: Pass:            loop-interchange
+; PROFIT-VEC-NEXT: Name:            Interchanged
+; PROFIT-VEC-NEXT: Function:        f
+; PROFIT-VEC-NEXT: Args:
+; PROFIT-VEC-NEXT:   - String:          Loop interchanged with enclosing loop.
+; PROFIT-VEC-NEXT: ...
+define void @f() {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i64 [ 0, %entry ], [ %i.next, %for.i.inc ]
+  br label %for.j.body
+
+for.j.body:
+  %j = phi i64 [ 1, %for.i.header ], [ %j.next, %for.j.body ]
+  %j.dec = add nsw i64 %j, -1
+  %a.0.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %j.dec, i64 %i
+  %b.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @B, i64 %j, i64 %i
+  %c.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @C, i64 %i, i64 %j
+  %d.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @D, i64 %i, i64 %j
+  %e.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @E, i64 %i, i64 %j
+  %f.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @F, i64 %i, i64 %j
+  %a.0 = load float, ptr %a.0.index, align 4
+  %b = load float, ptr %b.index, align 4
+  %c = load float, ptr %c.index, align 4
+  %d = load float, ptr %d.index, align 4
+  %e = load float, ptr %e.index, align 4
+  %f = load float, ptr %f.index, align 4
+  %add.0 = fadd float %a.0, %b
+  %add.1 = fadd float %add.0, %c
+  %add.2 = fadd float %add.1, %d
+  %add.3 = fadd float %add.2, %e
+  %add.4 = fadd float %add.3, %f
+  %a.1.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %j, i64 %i
+  store float %add.4, ptr %a.1.index, align 4
+  %j.next = add nuw nsw i64 %j, 1
+  %cmp.j = icmp eq i64 %j.next, 256
+  br i1 %cmp.j, label %for.i.inc, label %for.j.body
+
+for.i.inc:
+  %i.next = add nuw nsw i64 %i, 1
+  %cmp.i = icmp eq i64 %i.next, 256
+  br i1 %cmp.i, label %exit, label %for.i.header
+
+exit:
+  ret void
+}

Member

@Meinersbur Meinersbur left a comment

Not required for this review, but have you considered exposing the generality of the implementation to the user? Something like -loop-interchange-priorities=cache,instcost,vector (one could also intentionally remove one of them)?

Member

@Meinersbur Meinersbur left a comment
LGTM. If you would like to address my suggestion, feel free to create another PR.

@kasuga-fj
Contributor Author

Not required for this review, but have you considered exposing the generality of the implementation to the user? Something like -loop-interchange-priorities=cache,instcost,vector (one could also intentionally remove one of them)?

I had thought of passing it as a constructor argument like LoopUnroll does, but had not thought to add such an option. Supporting it sounds good to me, though, especially for the case where I want to disable some of the rules. Thanks for the idea; I'll add it.

@kasuga-fj
Contributor Author

Ah, the timing overlapped. So I'll create another PR later.

}

if (!shouldInterchange.has_value()) {
Contributor

I think you will need to update various other places to avoid build failures?

Contributor Author

@kasuga-fj kasuga-fj Mar 19, 2025

I pressed the wrong button on GitHub...

@sjoerdmeijer
Collaborator

Sorry for picking this up late. I don't have any problems with this patch, and it looks good as a first step, but it is not ideal that it relies on a user option. I need to look again at how this works, but at the risk of saying something silly: why is this not an "or"? That is, if interchange is profitable for cache or vectorisation, do it?

@Meinersbur
Member

Meinersbur commented Mar 19, 2025

@sjoerdmeijer Each cost concern can also return "the order is optimal as-is; if it were in a different order I would suggest changing it". Let's say vectorization prefers to swap, but cache analysis wants to keep the current order. Since we are in a loop that runs until a fixpoint is reached¹, we would alternate: vectorization wants to swap, then cache analysis swaps back, then vectorization wants to swap again, and so on.

Changing the default cost model is difficult, as it must not cause major regressions, or users will rightfully complain that their code got slower. Ideally, the compiler could determine whether, e.g., vectorization or a better cache access pattern will enable the larger speedup. If you know a heuristic for determining that, we would be very interested.

Footnotes

  1. Actually, a bubble sort; so theoretically there is a maximum number of iterations if the compare function is consistent. 'or' would not be a consistent compare function, just as [](auto lhs, auto rhs) { return lhs.first < rhs.first || lhs.second < rhs.second; } would not be a consistent compare function for tuples.

@sjoerdmeijer
Collaborator

Thanks for explaining, @Meinersbur , makes sense.

So maybe we need to review what the better default is, but that is for another day. Same for a heuristic. My instinct says vectorisation would be more beneficial, but maybe I am biased from looking more at compute-bound problems.

@kasuga-fj
Contributor Author

Ideally, the compiler could determine whether, e.g., vectorization or a better cache access pattern will enable the larger speedup. If you know a heuristic for determining that, we would be very interested.

I completely agree with this, and I don't have any idea at this moment.

My instinct says vectorisation would be more beneficial, but maybe I am biased from looking more at compute bound problems.

There are artificial (maybe not realistic) and less interesting cases where cache is more advantageous than vectorization, as follows.

```c
// Swapping the innermost two loops makes it possible to vectorize the
// innermost one. However, it also has a negative impact on the cache.
for (int nl = 0; nl < 100*(100000/256); nl++) {
  for (int i = 0; i < 256; ++i) {
    for (int j = 1; j < 256; j++) {
      aa[j][i] = aa[j-1][i] + bb[j][i] + cc[i][j]
               + dd[i][j] + ee[i][j] + ff[i][j]
               + gg[i][j] + hh[i][j] + kk[i][j] + ...;  // More additions.
    }
  }
}
```

@kasuga-fj kasuga-fj merged commit 17b202f into llvm:main Mar 21, 2025
11 checks passed
@llvm-ci
Collaborator

llvm-ci commented Mar 21, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/14681

Here is the relevant piece of the build log for reference:
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: lang/cpp/stl/TestSTL.py (862 of 2109)
XFAIL: lldb-api :: lang/cpp/thread_local/TestThreadLocal.py (863 of 2109)
PASS: lldb-api :: lang/cpp/typedef-to-outer-fwd/TestTypedefToOuterFwd.py (864 of 2109)
PASS: lldb-api :: lang/cpp/type_lookup_duplicate/TestCppTypeLookupDuplicate.py (865 of 2109)
XFAIL: lldb-api :: lang/cpp/trivial_abi/TestTrivialABI.py (866 of 2109)
PASS: lldb-api :: lang/cpp/type_lookup/TestCppTypeLookup.py (867 of 2109)
PASS: lldb-api :: lang/cpp/typeof/TestTypeOfDeclTypeExpr.py (868 of 2109)
PASS: lldb-api :: lang/cpp/thunk/TestThunk.py (869 of 2109)
PASS: lldb-api :: lang/cpp/typedef/TestCppTypedef.py (870 of 2109)
PASS: lldb-api :: lang/cpp/template/TestTemplateArgs.py (871 of 2109)
FAIL: lldb-api :: lang/cpp/unique-types/TestUniqueTypes.py (872 of 2109)
******************** TEST 'lldb-api :: lang/cpp/unique-types/TestUniqueTypes.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/lang/cpp/unique-types -p TestUniqueTypes.py
--
Exit Code: -11

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 17b202fc17f2b4b1df3e8fc842226597a0ed364e)
  clang revision 17b202fc17f2b4b1df3e8fc842226597a0ed364e
  llvm revision 17b202fc17f2b4b1df3e8fc842226597a0ed364e
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_dsym (TestUniqueTypes.UniqueTypesTestCase) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_dwarf (TestUniqueTypes.UniqueTypesTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_dwo (TestUniqueTypes.UniqueTypesTestCase)
----------------------------------------------------------------------
Ran 3 tests in 0.969s

OK (skipped=1)

--

********************
PASS: lldb-api :: lang/cpp/unique-types2/TestUniqueTypes2.py (873 of 2109)
PASS: lldb-api :: lang/cpp/unsigned_types/TestUnsignedTypes.py (874 of 2109)
PASS: lldb-api :: lang/cpp/unique-types3/TestUniqueTypes3.py (875 of 2109)
PASS: lldb-api :: lang/cpp/unicode-literals/TestUnicodeLiterals.py (876 of 2109)
PASS: lldb-api :: lang/cpp/unique-types4/TestUniqueTypes4.py (877 of 2109)
PASS: lldb-api :: lang/cpp/union-static-data-members/TestCppUnionStaticMembers.py (878 of 2109)
PASS: lldb-api :: lang/cpp/virtual-overload/TestVirtualOverload.py (879 of 2109)
UNSUPPORTED: lldb-api :: lang/objc/bitfield_ivars/TestBitfieldIvars.py (880 of 2109)
UNSUPPORTED: lldb-api :: lang/objc/blocks/TestObjCIvarsInBlocks.py (881 of 2109)
UNSUPPORTED: lldb-api :: lang/objc/charstar_dyntype/TestCharStarDynType.py (882 of 2109)
