[Flang] Hoisting constant-sized allocas at flang codegen. #95310

VijayKandiah · 2024-06-12T20:52:03Z

This change modifies the AllocaOpConversion in flang codegen to insert constant-sized LLVM allocas at the entry block of LLVMFuncOp or OpenACC/OpenMP Op, rather than in-place at the fir.alloca. This effectively hoists constant-sized FIR allocas to the proper block.

When compiling the example subroutine below with flang-new, we get a llvm.stacksave/stackrestore pair around a constant-sized fir.alloca i32.

subroutine test(n)
    block
      integer :: n
      print *, n
    end block
  end subroutine test

Without the proposed change, downstream LLVM compilation cannot hoist this constant-sized alloca out of the stacksave/stackrestore region which may lead to missed downstream optimizations:

*** IR Dump After Safe Stack instrumentation pass (safe-stack) ***
define void @test_(ptr %0) !dbg !3 {
  %2 = call ptr @llvm.stacksave.p0(), !dbg !7
  %3 = alloca i32, i64 1, align 4, !dbg !8
  %4 = call ptr @_FortranAioBeginExternalListOutput(i32 6, ptr @_QQclX62c91d05f046c7a656e7978eb13f2821, i32 4), !dbg !9
  %5 = load i32, ptr %3, align 4, !dbg !10, !tbaa !11
  %6 = call i1 @_FortranAioOutputInteger32(ptr %4, i32 %5), !dbg !10
  %7 = call i32 @_FortranAioEndIoStatement(ptr %4), !dbg !9
  call void @llvm.stackrestore.p0(ptr %2), !dbg !15
  ret void, !dbg !16
}

With this change, the llvm.alloca is already hoisted out of the stacksave/stackrestore region during flang codegen:

// -----// IR Dump After FIRToLLVMLowering (fir-to-llvm-ir) //----- //
  llvm.func @test_(%arg0: !llvm.ptr {fir.bindc_name = "n"}) attributes {fir.internal_name = "_QPtest"} {
    %0 = llvm.mlir.constant(4 : i32) : i32
    %1 = llvm.mlir.constant(1 : i64) : i64
    %2 = llvm.alloca %1 x i32 {bindc_name = "n"} : (i64) -> !llvm.ptr
    %3 = llvm.mlir.constant(6 : i32) : i32
    %4 = llvm.mlir.undef : i1
    %5 = llvm.call @llvm.stacksave.p0() {fastmathFlags = #llvm.fastmath<contract>} : () -> !llvm.ptr
    %6 = llvm.mlir.addressof @_QQclX62c91d05f046c7a656e7978eb13f2821 : !llvm.ptr
    %7 = llvm.call @_FortranAioBeginExternalListOutput(%3, %6, %0) {fastmathFlags = #llvm.fastmath<contract>} : (i32, !llvm.ptr, i32) -> !llvm.ptr
    %8 = llvm.load %2 {tbaa = [#tbaa_tag]} : !llvm.ptr -> i32
    %9 = llvm.call @_FortranAioOutputInteger32(%7, %8) {fastmathFlags = #llvm.fastmath<contract>} : (!llvm.ptr, i32) -> i1
    %10 = llvm.call @_FortranAioEndIoStatement(%7) {fastmathFlags = #llvm.fastmath<contract>} : (!llvm.ptr) -> i32
    llvm.call @llvm.stackrestore.p0(%5) {fastmathFlags = #llvm.fastmath<contract>} : (!llvm.ptr) -> ()
    llvm.return
  }

llvmbot · 2024-06-12T20:52:37Z

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-flang-codegen

@llvm/pr-subscribers-flang-fir-hlfir

Author: Vijay Kandiah (VijayKandiah)

Changes

This change modifies the AllocaOpConversion in flang codegen to insert constant-sized LLVM allocas at the entry block of LLVMFuncOp or OpenACC/OpenMP Op, rather than in-place at the fir.alloca. This effectively hoists constant-sized FIR allocas to the proper block.

When compiling the example subroutine below with flang-new, we get a llvm.stacksave/stackrestore pair around a constant-sized fir.alloca i32.

subroutine test(n)
    block
      integer :: n
      print *, n
    end block
  end subroutine test

Without the proposed change, downstream LLVM compilation cannot hoist this constant-sized alloca out of the stacksave/stackrestore region which may lead to missed downstream optimizations:

*** IR Dump After Safe Stack instrumentation pass (safe-stack) ***
define void @<!-- -->test_(ptr %0) !dbg !3 {
  %2 = call ptr @<!-- -->llvm.stacksave.p0(), !dbg !7
  %3 = alloca i32, i64 1, align 4, !dbg !8
  %4 = call ptr @<!-- -->_FortranAioBeginExternalListOutput(i32 6, ptr @<!-- -->_QQclX62c91d05f046c7a656e7978eb13f2821, i32 4), !dbg !9
  %5 = load i32, ptr %3, align 4, !dbg !10, !tbaa !11
  %6 = call i1 @<!-- -->_FortranAioOutputInteger32(ptr %4, i32 %5), !dbg !10
  %7 = call i32 @<!-- -->_FortranAioEndIoStatement(ptr %4), !dbg !9
  call void @<!-- -->llvm.stackrestore.p0(ptr %2), !dbg !15
  ret void, !dbg !16
}

With this change, the llvm.alloca is already hoisted out of the stacksave/stackrestore region during flang codegen:

// -----// IR Dump After FIRToLLVMLowering (fir-to-llvm-ir) //----- //
  llvm.func @<!-- -->test_(%arg0: !llvm.ptr {fir.bindc_name = "n"}) attributes {fir.internal_name = "_QPtest"} {
    %0 = llvm.mlir.constant(4 : i32) : i32
    %1 = llvm.mlir.constant(1 : i64) : i64
    %2 = llvm.alloca %1 x i32 {bindc_name = "n"} : (i64) -&gt; !llvm.ptr
    %3 = llvm.mlir.constant(6 : i32) : i32
    %4 = llvm.mlir.undef : i1
    %5 = llvm.call @<!-- -->llvm.stacksave.p0() {fastmathFlags = #llvm.fastmath&lt;contract&gt;} : () -&gt; !llvm.ptr
    %6 = llvm.mlir.addressof @<!-- -->_QQclX62c91d05f046c7a656e7978eb13f2821 : !llvm.ptr
    %7 = llvm.call @<!-- -->_FortranAioBeginExternalListOutput(%3, %6, %0) {fastmathFlags = #llvm.fastmath&lt;contract&gt;} : (i32, !llvm.ptr, i32) -&gt; !llvm.ptr
    %8 = llvm.load %2 {tbaa = [#tbaa_tag]} : !llvm.ptr -&gt; i32
    %9 = llvm.call @<!-- -->_FortranAioOutputInteger32(%7, %8) {fastmathFlags = #llvm.fastmath&lt;contract&gt;} : (!llvm.ptr, i32) -&gt; i1
    %10 = llvm.call @<!-- -->_FortranAioEndIoStatement(%7) {fastmathFlags = #llvm.fastmath&lt;contract&gt;} : (!llvm.ptr) -&gt; i32
    llvm.call @<!-- -->llvm.stackrestore.p0(%5) {fastmathFlags = #llvm.fastmath&lt;contract&gt;} : (!llvm.ptr) -&gt; ()
    llvm.return
  }

Full diff: https://github.com/llvm/llvm-project/pull/95310.diff

3 Files Affected:

(modified) flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h (+7-4)
(modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+14-4)
(modified) flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp (+23-20)

diff --git a/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h b/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
index 211acdc8a38e6..6ace73e2d16af 100644
--- a/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
+++ b/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
@@ -51,7 +51,9 @@ class ConvertFIRToLLVMPattern : public mlir::ConvertToLLVMPattern {
   /// appropriate reified structures.
   mlir::Value integerCast(mlir::Location loc,
                           mlir::ConversionPatternRewriter &rewriter,
-                          mlir::Type ty, mlir::Value val) const;
+                          mlir::Type ty, mlir::Value val,
+                          bool fold = false) const;
+  
   struct TypePair {
     mlir::Type fir;
     mlir::Type llvm;
@@ -144,9 +146,10 @@ class ConvertFIRToLLVMPattern : public mlir::ConvertToLLVMPattern {
   // Find the Block in which the alloca should be inserted.
   // The order to recursively find the proper block:
   // 1. An OpenMP Op that will be outlined.
-  // 2. A LLVMFuncOp
-  // 3. The first ancestor that is an OpenMP Op or a LLVMFuncOp
-  mlir::Block *getBlockForAllocaInsert(mlir::Operation *op) const;
+  // 2. An OpenMP or OpenACC Op with one or more regions holding executable code.
+  // 3. A LLVMFuncOp
+  // 4. The first ancestor that is one of the above.
+  mlir::Block *getBlockForAllocaInsert(mlir::Operation *op, mlir::Region *parentRegion) const;
 
   // Generate an alloca of size 1 for an object of type \p llvmObjectTy in the
   // allocation address space provided for the architecture in the DataLayout
diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index 9f21c6b0cf097..d078a000ccd65 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -218,7 +218,7 @@ struct AllocaOpConversion : public fir::FIROpConversion<fir::AllocaOp> {
             chrTy.getContext(), chrTy.getFKind());
         llvmObjectType = convertType(rawCharTy);
         assert(end == 1);
-        size = integerCast(loc, rewriter, ity, lenParams[0]);
+        size = integerCast(loc, rewriter, ity, lenParams[0], /*fold=*/true);
       } else if (auto recTy = mlir::dyn_cast<fir::RecordType>(scalarType)) {
         mlir::LLVM::LLVMFuncOp memSizeFn =
             getDependentTypeMemSizeFn(recTy, alloc, rewriter);
@@ -236,17 +236,27 @@ struct AllocaOpConversion : public fir::FIROpConversion<fir::AllocaOp> {
       }
     }
     if (auto scaleSize = genAllocationScaleSize(alloc, ity, rewriter))
-      size = rewriter.create<mlir::LLVM::MulOp>(loc, ity, size, scaleSize);
+      size = rewriter.createOrFold<mlir::LLVM::MulOp>(loc, ity, size, scaleSize);
     if (alloc.hasShapeOperands()) {
       unsigned end = operands.size();
       for (; i < end; ++i)
-        size = rewriter.create<mlir::LLVM::MulOp>(
-            loc, ity, size, integerCast(loc, rewriter, ity, operands[i]));
+        size = rewriter.createOrFold<mlir::LLVM::MulOp>(
+            loc, ity, size,
+            integerCast(loc, rewriter, ity, operands[i], /*fold=*/true));
     }
 
     unsigned allocaAs = getAllocaAddressSpace(rewriter);
     unsigned programAs = getProgramAddressSpace(rewriter);
 
+    if (mlir::isa<mlir::LLVM::ConstantOp>(size.getDefiningOp())) {
+      // Set the Block in which the llvm alloca should be inserted.
+      mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
+      mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent();
+      mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp, parentRegion);
+      size.getDefiningOp()->moveAfter(insertBlock, insertBlock->begin());
+      rewriter.setInsertionPointAfter(size.getDefiningOp());
+    }
+
     // NOTE: we used to pass alloc->getAttrs() in the builder for non opaque
     // pointers! Only propagate pinned and bindc_name to help debugging, but
     // this should have no functional purpose (and passing the operand segment
diff --git a/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp b/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp
index 72e072db37432..6d86879cd3219 100644
--- a/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp
+++ b/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp
@@ -65,7 +65,7 @@ mlir::LLVM::ConstantOp ConvertFIRToLLVMPattern::genConstantOffset(
 mlir::Value
 ConvertFIRToLLVMPattern::integerCast(mlir::Location loc,
                                      mlir::ConversionPatternRewriter &rewriter,
-                                     mlir::Type ty, mlir::Value val) const {
+                                     mlir::Type ty, mlir::Value val, bool fold) const {
   auto valTy = val.getType();
   // If the value was not yet lowered, lower its type so that it can
   // be used in getPrimitiveTypeSizeInBits.
@@ -73,10 +73,17 @@ ConvertFIRToLLVMPattern::integerCast(mlir::Location loc,
     valTy = convertType(valTy);
   auto toSize = mlir::LLVM::getPrimitiveTypeSizeInBits(ty);
   auto fromSize = mlir::LLVM::getPrimitiveTypeSizeInBits(valTy);
-  if (toSize < fromSize)
-    return rewriter.create<mlir::LLVM::TruncOp>(loc, ty, val);
-  if (toSize > fromSize)
-    return rewriter.create<mlir::LLVM::SExtOp>(loc, ty, val);
+  if (fold) {
+    if (toSize < fromSize)
+      return rewriter.createOrFold<mlir::LLVM::TruncOp>(loc, ty, val);
+    if (toSize > fromSize)
+      return rewriter.createOrFold<mlir::LLVM::SExtOp>(loc, ty, val);
+  } else {
+    if (toSize < fromSize)
+      return rewriter.create<mlir::LLVM::TruncOp>(loc, ty, val);
+    if (toSize > fromSize)
+      return rewriter.create<mlir::LLVM::SExtOp>(loc, ty, val);
+  }
   return val;
 }
 
@@ -274,16 +281,19 @@ mlir::Value ConvertFIRToLLVMPattern::computeBoxSize(
 // Find the Block in which the alloca should be inserted.
 // The order to recursively find the proper block:
 // 1. An OpenMP Op that will be outlined.
-// 2. A LLVMFuncOp
-// 3. The first ancestor that is an OpenMP Op or a LLVMFuncOp
-mlir::Block *
-ConvertFIRToLLVMPattern::getBlockForAllocaInsert(mlir::Operation *op) const {
+// 2. An OpenMP or OpenACC Op with one or more regions holding executable code.
+// 3. A LLVMFuncOp
+// 4. The first ancestor that is one of the above.
+mlir::Block *ConvertFIRToLLVMPattern::getBlockForAllocaInsert(
+    mlir::Operation *op, mlir::Region *parentRegion) const {
   if (auto iface = mlir::dyn_cast<mlir::omp::OutlineableOpenMPOpInterface>(op))
     return iface.getAllocaBlock();
+  if (auto recipeIface = mlir::dyn_cast<mlir::accomp::RecipeInterface>(op))
+    return recipeIface.getAllocaBlock(*parentRegion);
   if (auto llvmFuncOp = mlir::dyn_cast<mlir::LLVM::LLVMFuncOp>(op))
     return &llvmFuncOp.front();
 
-  return getBlockForAllocaInsert(op->getParentOp());
+  return getBlockForAllocaInsert(op->getParentOp(), parentRegion);
 }
 
 // Generate an alloca of size 1 for an object of type \p llvmObjectTy in the
@@ -297,16 +307,9 @@ mlir::Value ConvertFIRToLLVMPattern::genAllocaAndAddrCastWithType(
     mlir::ConversionPatternRewriter &rewriter) const {
   auto thisPt = rewriter.saveInsertionPoint();
   mlir::Operation *parentOp = rewriter.getInsertionBlock()->getParentOp();
-  if (mlir::isa<mlir::omp::DeclareReductionOp>(parentOp) ||
-      mlir::isa<mlir::omp::PrivateClauseOp>(parentOp)) {
-    // DeclareReductionOp & PrivateClauseOp have multiple child regions. We want
-    // to get the first block of whichever of those regions we are currently in
-    mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent();
-    rewriter.setInsertionPointToStart(&parentRegion->front());
-  } else {
-    mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp);
-    rewriter.setInsertionPointToStart(insertBlock);
-  }
+  mlir::Region *parentRegion = rewriter.getInsertionBlock()->getParent();
+  mlir::Block *insertBlock = getBlockForAllocaInsert(parentOp, parentRegion);
+  rewriter.setInsertionPointToStart(insertBlock);
   auto size = genI32Constant(loc, rewriter, 1);
   unsigned allocaAs = getAllocaAddressSpace(rewriter);
   unsigned programAs = getProgramAddressSpace(rewriter);

github-actions · 2024-06-12T20:54:57Z

✅ With the latest revision this PR passed the C/C++ code formatter.

clementval · 2024-06-12T20:56:10Z

Are you also hoisting descriptors? Something like:

%0 = fir.alloca !fir.box<!fir.ptr<f32>> {bindc_name = "p1", uniq_name = "_QFtestEp1"}

VijayKandiah · 2024-06-12T21:28:10Z

Yes, I am hoisting descriptors too. Their size.getDefiningOp() is a mlir::llvm:ConstantOp and this change hoists these allocas too.

vzakhari

LGTM

clementval · 2024-06-13T05:16:07Z

You have couple of lit tests failure on the pre-commit CI.

tblah

LGTM once CI passes

vzakhari · 2024-06-13T18:59:30Z

flang/test/Fir/convert-to-llvm-openmp-and-fir.fir

+ // CHECK:    %[[VAL_2:.*]] = llvm.mlir.constant(1 : i64) : i64
+ // CHECK:    %[[VAL_3:.*]] = llvm.alloca %[[VAL_2]] x !llvm.array<1024 x i32> {bindc_name = "b"} : (i64) -> !llvm.ptr
+ // CHECK:    %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
+ // CHECK:    %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x !llvm.array<1024 x i32> {bindc_name = "a"} : (i64) -> !llvm.ptr
 // CHECK:    %9 = llvm.mlir.constant(1024 : index) : i64


Not your change, but I suggest regenerating these checks from scratch to get rid of the numbered SSA values in the checks.

Got it, I regenerated the checks from scratch for this function.

vzakhari · 2024-06-13T19:37:39Z

Thank you, Vijay!

[Flang] Hoisting constant-sized allocas at flang codegen.

b9d36f3

VijayKandiah requested review from clementval, tblah and vzakhari June 12, 2024 20:52

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels Jun 12, 2024

[flang] code-formatting fixes for hoisting alloca pass.

52e0a94

vzakhari approved these changes Jun 13, 2024

View reviewed changes

tblah approved these changes Jun 13, 2024

View reviewed changes

VijayKandiah and others added 2 commits June 13, 2024 13:48

Merge branch 'llvm:main' into main

3c075d7

[flang] lit test fixes for hoisting alloca change.

457b34f

llvmbot added the flang:openmp label Jun 13, 2024

vzakhari reviewed Jun 13, 2024

View reviewed changes

[flang] fixing lit test convert-to-llvm-openmp-and-fir.fir

b851f58

VijayKandiah merged commit c0cba51 into llvm:main Jun 14, 2024
7 checks passed

jpienaar mentioned this pull request Jun 17, 2024

[mlirc] Add missing extern C #95829

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Flang] Hoisting constant-sized allocas at flang codegen. #95310

[Flang] Hoisting constant-sized allocas at flang codegen. #95310

Uh oh!

VijayKandiah commented Jun 12, 2024

Uh oh!

llvmbot commented Jun 12, 2024 •

edited

Loading

Uh oh!

github-actions bot commented Jun 12, 2024 •

edited

Loading

Uh oh!

clementval commented Jun 12, 2024

Uh oh!

VijayKandiah commented Jun 12, 2024

Uh oh!

vzakhari left a comment

Uh oh!

clementval commented Jun 13, 2024

Uh oh!

tblah left a comment

Uh oh!

vzakhari Jun 13, 2024

Uh oh!

VijayKandiah Jun 13, 2024

Uh oh!

vzakhari commented Jun 13, 2024

Uh oh!

Uh oh!

Uh oh!

[Flang] Hoisting constant-sized allocas at flang codegen. #95310

[Flang] Hoisting constant-sized allocas at flang codegen. #95310

Uh oh!

Conversation

VijayKandiah commented Jun 12, 2024

Uh oh!

llvmbot commented Jun 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jun 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clementval commented Jun 12, 2024

Uh oh!

VijayKandiah commented Jun 12, 2024

Uh oh!

vzakhari left a comment

Choose a reason for hiding this comment

Uh oh!

clementval commented Jun 13, 2024

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

vzakhari Jun 13, 2024

Choose a reason for hiding this comment

Uh oh!

VijayKandiah Jun 13, 2024

Choose a reason for hiding this comment

Uh oh!

vzakhari commented Jun 13, 2024

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Jun 12, 2024 •

edited

Loading

github-actions bot commented Jun 12, 2024 •

edited

Loading