-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][debug] Avoid redundant debug data generation for derived types. #124473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Since llvm#122770, we have seen that compile time have become extremely slow for cyclic derived types. In 122770, we made the criteria to cache a type very strict. As a result, some types which are safe to cache were also being re-generated every type they were required. This increased the compile time and also the size of the debug info. Please see the description of PR# 122770. We decided that when processing t1, the type generated for t2 and t3 were not safe to cached. But our algorithm also denied caching to t1 which as top level type was safe. type t1 type(t2), pointer :: p1 end type type t2 type(t3), pointer :: p2 end type type t3 type(t1), pointer :: p3 end type I have tinkered the check a bit so that top level type is always cached. To detect a top level type, we use a depth counter that get incremented before call to convertRecordType and decremented after it returns. After this change, the following file from Fujitsu get compiled under a minute. https://github.com/fujitsu/compiler-test-suite/blob/main/Fortran/0394/0394_0031.f90 The smaller testcase present in issue 124049 takes around half second. I also added check to make sure that duplicate entries of the DICompositeType are not present in the IR. Fixes llvm#124049 and llvm#123960.
@llvm/pr-subscribers-flang-fir-hlfir Author: Abid Qadeer (abidh) ChangesSince #122770, we have seen that compile time have become extremely slow for cyclic derived types. In #122770, we made the criteria to cache a derived type very strict. As a result, some types which are safe to cache were also being re-generated every type they were required. This increased the compile time and also the size of the debug info. Please see the description of PR# 122770. We decided that when processing
I have tinkered the check a bit so that top level type is always cached. To detect a top level type, we use a depth counter that get incremented before call to After this change, the following file from Fujitsu get compiled around 40s which is same as it was before #122770. The smaller testcase present in issue #124049 takes less than half a second. I also added check to make sure that duplicate entries of the Fixes #124049 and #123960. Full diff: https://github.com/llvm/llvm-project/pull/124473.diff 3 Files Affected:
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
index 555f354521c9b7..cdd30dce183dd5 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.cpp
@@ -48,7 +48,8 @@ DebugTypeGenerator::DebugTypeGenerator(mlir::ModuleOp m,
mlir::SymbolTable *symbolTable_,
const mlir::DataLayout &dl)
: module(m), symbolTable(symbolTable_), dataLayout{&dl},
- kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl) {
+ kindMapping(getKindMapping(m)), llvmTypeConverter(m, false, false, dl),
+ derivedTypeDepth(0) {
LLVM_DEBUG(llvm::dbgs() << "DITypeAttr generator\n");
mlir::MLIRContext *context = module.getContext();
@@ -407,7 +408,10 @@ mlir::LLVM::DITypeAttr DebugTypeGenerator::convertRecordType(
/*baseType=*/nullptr, mlir::LLVM::DIFlags::Zero, offset * 8,
/*alignInBits=*/0, elements, /*dataLocation=*/nullptr, /*rank=*/nullptr,
/*allocated=*/nullptr, /*associated=*/nullptr);
- if (canCacheThisType) {
+
+ // derivedTypeDepth == 1 means that it is a top level type which is safe to
+ // cache.
+ if (canCacheThisType || derivedTypeDepth == 1) {
typeCache[Ty] = finalAttr;
} else {
auto iter = typeCache.find(Ty);
@@ -663,7 +667,27 @@ DebugTypeGenerator::convertType(mlir::Type Ty, mlir::LLVM::DIFileAttr fileAttr,
return convertCharacterType(charTy, fileAttr, scope, declOp,
/*hasDescriptor=*/false);
} else if (auto recTy = mlir::dyn_cast_if_present<fir::RecordType>(Ty)) {
- return convertRecordType(recTy, fileAttr, scope, declOp);
+ // For nested derived types like shown below, the call sequence of the
+ // convertRecordType will look something like as follows:
+ // convertRecordType (t1)
+ // convertRecordType (t2)
+ // convertRecordType (t3)
+ // We need to recognize when we are processing the top level type like t1
+ // to make caching decision. The variable `derivedTypeDepth` is used for
+ // this purpose and maintains the current depth of derived type processing.
+ // type t1
+ // type(t2), pointer :: p1
+ // end type
+ // type t2
+ // type(t3), pointer :: p2
+ // end type
+ // type t2
+ // integer a
+ // end type
+ derivedTypeDepth++;
+ auto result = convertRecordType(recTy, fileAttr, scope, declOp);
+ derivedTypeDepth--;
+ return result;
} else if (auto tupleTy = mlir::dyn_cast_if_present<mlir::TupleType>(Ty)) {
return convertTupleType(tupleTy, fileAttr, scope, declOp);
} else if (auto refTy = mlir::dyn_cast_if_present<fir::ReferenceType>(Ty)) {
diff --git a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
index 7daa0af166e697..cc4b5428ee1a9e 100644
--- a/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
+++ b/flang/lib/Optimizer/Transforms/DebugTypeGenerator.h
@@ -91,6 +91,7 @@ class DebugTypeGenerator {
std::uint64_t lenOffset;
std::uint64_t rankOffset;
std::uint64_t rankSize;
+ int32_t derivedTypeDepth;
llvm::DenseMap<mlir::Type, mlir::LLVM::DITypeAttr> typeCache;
};
diff --git a/flang/test/Integration/debug-cyclic-derived-type-3.f90 b/flang/test/Integration/debug-cyclic-derived-type-3.f90
index ef9aed13cc514c..d91c635576e26b 100644
--- a/flang/test/Integration/debug-cyclic-derived-type-3.f90
+++ b/flang/test/Integration/debug-cyclic-derived-type-3.f90
@@ -1,4 +1,4 @@
-! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o -
+! RUN: %flang_fc1 -emit-llvm -debug-info-kind=standalone %s -o - | FileCheck %s
! mainly test that this program does not cause an assertion failure
! testcase for issue 122024
@@ -17,7 +17,7 @@ module m1
program test
use m1
- type(t1),pointer :: foo
+ type(t1),pointer :: foo, foo2
allocate(foo)
allocate(foo%x1)
allocate(foo%x1%x2)
@@ -30,3 +30,7 @@ subroutine sub1(bar)
use m1
type(t2) :: bar
end subroutine
+
+! Test that file compiles ok and there is only one DICompositeType for "t1".
+!CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "t1"{{.*}})
+!CHECK-NOT: !DICompositeType(tag: DW_TAG_structure_type, name: "t1"{{.*}})
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix, makes sense to me.
Since #122770, we have seen that compile time have become extremely slow for cyclic derived types. In #122770, we made the criteria to cache a derived type very strict. As a result, some types which are safe to cache were also being re-generated every type they were required. This increased the compile time and also the size of the debug info.
Please see the description of PR# 122770. We decided that when processing
t1
, the type generated fort2
andt3
were not safe to cached. But our algorithm also denied caching tot1
which as top level type was safe.I have tinkered the check a bit so that top level type is always cached. To detect a top level type, we use a depth counter that get incremented before call to
convertRecordType
and decremented after it returns.After this change, the following file from Fujitsu get compiled around 40s which is same as it was before #122770.
The smaller testcase present in issue #124049 takes less than half a second. I also added check to make sure that duplicate entries of the
DICompositeType
are not present in the IR.Fixes #124049 and #123960.