Skip to content

[flang][debug] Avoid redundant debug data generation for derived types. #124473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 27, 2025

Conversation

abidh
Copy link
Contributor

@abidh abidh commented Jan 26, 2025

Since #122770, we have seen that compile time have become extremely slow for cyclic derived types. In #122770, we made the criteria to cache a derived type very strict. As a result, some types which are safe to cache were also being re-generated every type they were required. This increased the compile time and also the size of the debug info.

Please see the description of PR# 122770. We decided that when processing t1, the type generated for t2 and t3 were not safe to cached. But our algorithm also denied caching to t1 which as top level type was safe.

type t1
  type(t2), pointer :: p1
end type
type t2
  type(t3), pointer :: p2
end type
type t3
  type(t1), pointer :: p3
end type

I have tinkered the check a bit so that top level type is always cached. To detect a top level type, we use a depth counter that get incremented before call to convertRecordType and decremented after it returns.

After this change, the following file from Fujitsu get compiled around 40s which is same as it was before #122770.

The smaller testcase present in issue #124049 takes less than half a second. I also added check to make sure that duplicate entries of the DICompositeType are not present in the IR.

Fixes #124049 and #123960.

Since llvm#122770, we have seen
that compile time have become extremely slow for cyclic derived types.
In 122770, we made the criteria to cache a type very strict. As a result,
some types which are safe to cache were also being re-generated every
type they were required. This increased the compile time and also the
size of the debug info.

Please see the description of PR# 122770. We decided that when
processing t1, the type generated for t2 and t3 were not safe to
cached. But our algorithm also denied caching to t1 which as top level
type was safe.

type t1
  type(t2), pointer :: p1
end type
type t2
  type(t3), pointer :: p2
end type
type t3
  type(t1), pointer :: p3
end type

I have tinkered the check a bit so that top level type is always cached.
To detect a top level type, we use a depth counter that get incremented
before call to convertRecordType and decremented after it returns.

After this change, the following file from Fujitsu get compiled under
a minute.
https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0394/0394_0031.f90

The smaller testcase present in issue 124049 takes around half second. I
also added check to make sure that duplicate entries of the
DICompositeType are not present in the IR.

Fixes llvm#124049 and llvm#123960.
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jan 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 26, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Abid Qadeer (abidh)

Changes

Since #122770, we have seen that compile time have become extremely slow for cyclic derived types. In #122770, we made the criteria to cache a derived type very strict. As a result, some types which are safe to cache were also being re-generated every type they were required. This increased the compile time and also the size of the debug info.

Please see the description of PR# 122770. We decided that when processing t1, the type generated for t2 and t3 were not safe to cached. But our algorithm also denied caching to t1 which as top level type was safe.

type t1
  type(t2), pointer :: p1
end type
type t2
  type(t3), pointer :: p2
end type
type t3
  type(t1), pointer :: p3
end type

I have tinkered the check a bit so that top level type is always cached. To detect a top level type, we use a depth counter that get incremented before call to convertRecordType and decremented after it returns.

After this change, the following file from Fujitsu get compiled around 40s which is same as it was before #122770.

The smaller testcase present in issue #124049 takes less than half a second. I also added check to make sure that duplicate entries of the DICompositeType are not present in the IR.

Fixes #124049 and #123960.


Full diff: https://github.com/llvm/llvm-project/pull/124473.diff

3 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp (+27-3)
  • (modified) flang/lib/Optimizer/Transforms/DebugTypeGenerator.h (+1)
  • (modified) flang/test/Integration/debug-cyclic-derived-type-3.f90 (+6-2)
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
index 555f354521c9b7..cdd30dce183dd5 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
@@ -48,7 +48,8 @@ DebugTypeGenerator::DebugTypeGenerator(mlir::ModuleOp m,
                                        mlir::SymbolTable *symbolTable_,
                                        const mlir::DataLayout &dl)
     : module(m), symbolTable(symbolTable_), dataLayout{&dl},
-      kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl) {
+      kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl),
+      derivedTypeDepth(0) {
   LLVM_DEBUG(llvm::dbgs() << "DITypeAttr generator\n");
 
   mlir::MLIRContext *context = module.getContext();
@@ -407,7 +408,10 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
       /*baseType=*/nullptr, mlir::LLVM::DIFlags::Zero, offset * 8,
       /*alignInBits=*/0, elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
       /*allocated=*/nullptr, /*associated=*/nullptr);
-  if (canCacheThisType) {
+
+  // derivedTypeDepth == 1 means that it is a top level type which is safe to
+  // cache.
+  if (canCacheThisType || derivedTypeDepth == 1) {
     typeCache[Ty] = finalAttr;
   } else {
     auto iter = typeCache.find(Ty);
@@ -663,7 +667,27 @@ DebugTypeGenerator::convertType(mlir::Type Ty, mlir::LLVM::DIFileAttr fileAttr,
     return convertCharacterType(charTy, fileAttr, scope, declOp,
                                 /*hasDescriptor=*/false);
   } else if (auto recTy = mlir::dyn_cast_if_present<fir::RecordType>(Ty)) {
-    return convertRecordType(recTy, fileAttr, scope, declOp);
+    // For nested derived types like shown below, the call sequence of the
+    // convertRecordType will look something like as follows:
+    // convertRecordType (t1)
+    //  convertRecordType (t2)
+    //    convertRecordType (t3)
+    // We need to recognize when we are processing the top level type like t1
+    // to make caching decision. The variable `derivedTypeDepth` is used for
+    // this purpose and maintains the current depth of derived type processing.
+    //  type t1
+    //   type(t2), pointer :: p1
+    // end type
+    // type t2
+    //   type(t3), pointer :: p2
+    // end type
+    // type t2
+    //   integer a
+    // end type
+    derivedTypeDepth++;
+    auto result = convertRecordType(recTy, fileAttr, scope, declOp);
+    derivedTypeDepth--;
+    return result;
   } else if (auto tupleTy = mlir::dyn_cast_if_present<mlir::TupleType>(Ty)) {
     return convertTupleType(tupleTy, fileAttr, scope, declOp);
   } else if (auto refTy = mlir::dyn_cast_if_present<fir::ReferenceType>(Ty)) {
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
index 7daa0af166e697..cc4b5428ee1a9e 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
@@ -91,6 +91,7 @@ class DebugTypeGenerator {
   std::uint64_t lenOffset;
   std::uint64_t rankOffset;
   std::uint64_t rankSize;
+  int32_t derivedTypeDepth;
   llvm::DenseMap<mlir::Type, mlir::LLVM::DITypeAttr> typeCache;
 };
 
diff --git a/flang/test/Integration/debug-cyclic-derived-type-3.f90 b/flang/test/Integration/debug-cyclic-derived-type-3.f90
index ef9aed13cc514c..d91c635576e26b 100644
--- a/flang/test/Integration/debug-cyclic-derived-type-3.f90
+++ b/flang/test/Integration/debug-cyclic-derived-type-3.f90
@@ -1,4 +1,4 @@
-! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o -
+! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o - | FileCheck %s
 
 ! mainly test that this program does not cause an assertion failure
 ! testcase for issue 122024
@@ -17,7 +17,7 @@ module m1
 
 program test
   use m1
-  type(t1),pointer :: foo
+  type(t1),pointer :: foo, foo2
   allocate(foo)
   allocate(foo%x1)
   allocate(foo%x1%x2)
@@ -30,3 +30,7 @@ subroutine sub1(bar)
   use m1
   type(t2) :: bar
 end subroutine
+
+! Test that file compiles ok and there is only one DICompositeType for "t1".
+!CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "t1"{{.*}})
+!CHECK-NOT: !DICompositeType(tag: DW_TAG_structure_type, name: "t1"{{.*}})

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, makes sense to me.

@abidh abidh merged commit 2e5a523 into llvm:main Jan 27, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fortran test with many self-referencing types takes "forever" to compile with -g
3 participants