-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[DebugInfo] Merge partially matching chains of textual inclusions #125780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes llvm#122846. The cause of this issue was that GetNearestCommonScope within getMergedLocation, which used to find a common scope for a matching locations pair, ignored DILexicalBlockFile while walking up the parent chain of incoming DILocations. Therefore, getMergedLocation could attach DILocation with line number from one file to the scope from another file. Whereas in the reproducer from the issue, both incoming locations have the same line and column numbers, and they can be just attached to one of the DILexicalBlockFile's (which represent the same file and have the same scope but are not distinct). To fix that issue and to be able to merge locations from different included files, 1. inlinedAt chains processing code from 12a7aea was generalized so that the same algorithm can be used for processing inlinedAt chains and nested DILexicalBlocks with changing file attribute chains, as they both can be considered as textual inclusions (as proposed in llvm#122846). 2. getMergedLocations tries to merge chains of inlined locations. Here, for each matching pair of locations from inlinedAt chains, chains of their parent scopes are traversed. Only scopes within the same DISubprogram are considered (as all pairs of locations from inlinedAt chains can only be merged if they belong to the same subprogram). Also, only those parent scopes are considered, that have a different file attribute value from the children scope, as they represent textual inclusions. Such parent scope chains from incoming locations LocA and LocB are processed in the way similar to 12a7aea, and matching pairs from textual inclusion chains are being merged. The valid merge location for the most local to LocA and LocB matching pair from parent scope chains is taken as getMergedLocation result.
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-llvm-ir Author: Vladislav Dzhidzhoev (dzhidzhoev) ChangesFixes #122846. The cause of this issue was that GetNearestCommonScope within getMergedLocation, which used to find a common scope for a matching locations pair, ignored DILexicalBlockFile while walking up the parent chain of incoming DILocations. Therefore, getMergedLocation could attach DILocation with line number from one file to the scope from another file. Whereas in the reproducer from the issue, both incoming locations have the same line and column numbers, and they can be just attached to one of the DILexicalBlockFile's (which represent the same file and have the same scope but are not distinct). To fix that issue and to be able to merge locations from different included files,
Patch is 30.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125780.diff 3 Files Affected:
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index 915cdd301f2c7e..1514455c609160 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -118,30 +118,40 @@ DILocation *DILocation::getMergedLocations(ArrayRef<DILocation *> Locs) {
return Merged;
}
-DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
- if (!LocA || !LocB)
- return nullptr;
-
- if (LocA == LocB)
- return LocA;
-
- LLVMContext &C = LocA->getContext();
-
- using LocVec = SmallVector<const DILocation *>;
+/// Merge two locations that may be nested as textual inclusions,
+/// e.g. via inlinedAt or nested LexicalBlocks.
+///
+/// \param GetLocationKey Function to determine whether locations/scopes are considered matching.
+/// \param ShouldStop Function that determines if include chain of a location has ended.
+/// \param NextLoc Should return next location/object in include chain.
+/// \param MergeLocPair Function to merge two possibly matching locations from
+/// include chain.
+/// \param DefaultLocation Function that returns location of the first matching nested object.
+/// \param LocA First incoming location.
+/// \param LocA Second incoming location.
+template <typename LocationKey, typename GetLocationKeyFn,
+ typename ShouldStopFn, typename NextLocFn, typename MergeLocPairFn,
+ typename DefaultLocationFn>
+static DILocation *
+MergeNestedLocations(GetLocationKeyFn GetLocationKey, ShouldStopFn ShouldStop,
+ NextLocFn NextLoc, MergeLocPairFn MergeLocPair,
+ DefaultLocationFn DefaultLocation, DILocation *LocA,
+ DILocation *LocB) {
+ using LocVec = SmallVector<MDNode *>;
LocVec ALocs;
LocVec BLocs;
- SmallDenseMap<std::pair<const DISubprogram *, const DILocation *>, unsigned,
- 4>
- ALookup;
- // Walk through LocA and its inlined-at locations, populate them in ALocs and
- // save the index for the subprogram and inlined-at pair, which we use to find
+ LLVMContext &C = LocA->getContext();
+
+ SmallDenseMap<LocationKey, unsigned, 4> ALookup;
+ // Walk through LocA, populate them in ALocs and
+ // save the index, which we use to find
// a matching starting location in LocB's chain.
- for (auto [L, I] = std::make_pair(LocA, 0U); L; L = L->getInlinedAt(), I++) {
+ using iter_t = std::pair<MDNode *, unsigned>;
+ for (auto [L, I] = iter_t(LocA, 0U); ShouldStop(L); L = NextLoc(L), I++) {
ALocs.push_back(L);
- auto Res = ALookup.try_emplace(
- {L->getScope()->getSubprogram(), L->getInlinedAt()}, I);
- assert(Res.second && "Multiple <SP, InlinedAt> pairs in a location chain?");
+ auto Res = ALookup.try_emplace(GetLocationKey(L), I);
+ assert(Res.second && "Multiple pairs in a location chain?");
(void)Res;
}
@@ -149,17 +159,17 @@ DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
LocVec::reverse_iterator BRIt = BLocs.rend();
// Populate BLocs and look for a matching starting location, the first
- // location with the same subprogram and inlined-at location as in LocA's
- // chain. Since the two locations have the same inlined-at location we do
+ // location with the same key as in LocA's
+ // chain. Since the two locations have the same key we do
// not need to look at those parts of the chains.
- for (auto [L, I] = std::make_pair(LocB, 0U); L; L = L->getInlinedAt(), I++) {
+ for (auto [L, I] = iter_t(LocB, 0U); ShouldStop(L); L = NextLoc(L), I++) {
BLocs.push_back(L);
if (ARIt != ALocs.rend())
// We have already found a matching starting location.
continue;
- auto IT = ALookup.find({L->getScope()->getSubprogram(), L->getInlinedAt()});
+ auto IT = ALookup.find(GetLocationKey(L));
if (IT == ALookup.end())
continue;
@@ -174,10 +184,131 @@ DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
break;
}
+ DILocation *Result = ARIt != ALocs.rend() ? DefaultLocation(*ARIt) : nullptr;
+
+ // If we have found a common starting location, walk up the chains
+ // and try to produce common locations.
+ for (; ARIt != ALocs.rend() && BRIt != BLocs.rend(); ++ARIt, ++BRIt) {
+ DILocation *Tmp = MergeLocPair(*ARIt, *BRIt, Result);
+
+ if (!Tmp)
+ // We have walked up to a point in te chains where the two locations
+ // are irreconsilable. At this point Result contains the nearest common
+ // location in the location chains of LocA and LocB, so we break here.
+ break;
+
+ Result = Tmp;
+ }
+
+ if (auto *ResultL = dyn_cast_if_present<DILocation>(Result))
+ return ResultL;
+
+ // We ended up with LocA and LocB as irreconsilable locations. Produce a
+ // location at 0:0 with one of the locations' scope.
+ return DILocation::get(C, 0, 0, LocA->getScope(), nullptr);
+}
+
+/// Returns the nearest common scope for two locations inside a subprogram.
+static DIScope *GetNearestCommonScope(DILocation *L1, DILocation *L2) {
+ DIScope *S1 = L1->getScope();
+ DIScope *S2 = L2->getScope();
+
+ // Try matching DILexicalBlockFiles enclosing L1, L2.
+ if (auto *LBF1 = dyn_cast<DILexicalBlockFile>(S1)) {
+ if (auto *LBF2 = dyn_cast<DILexicalBlockFile>(S2)) {
+ if (LBF1->getFile() && LBF1->getFile() == LBF2->getFile() &&
+ LBF1->getDiscriminator() == LBF2->getDiscriminator()) {
+ return LBF1;
+ }
+ }
+ }
+
+ SmallPtrSet<DIScope *, 8> Scopes;
+ for (; S1; S1 = S1->getScope()) {
+ Scopes.insert(S1);
+ if (isa<DISubprogram>(S1))
+ break;
+ }
+
+ for (; S2; S2 = S2->getScope()) {
+ if (Scopes.count(S2))
+ return S2;
+ if (isa<DISubprogram>(S2))
+ break;
+ }
+
+ return nullptr;
+}
+
+/// Walk up the chain of parent local scopes until the file has changed
+/// or DISubprogram reached.
+static DILocalScope *NextScopeWithDifferentFile(MDNode *LocOrScope) {
+ assert((isa<DILocation>(LocOrScope) || isa<DILocalScope>(LocOrScope)) &&
+ "DILocation or DILocalScope expected");
+
+ DIScope *S = nullptr;
+ DIFile *F = nullptr;
+ if (auto *Loc = dyn_cast<DILocation>(LocOrScope)) {
+ S = Loc->getScope();
+ F = Loc->getFile();
+ } else {
+ auto *LS = dyn_cast<DILocalScope>(LocOrScope);
+ S = LS->getScope();
+ F = LS->getFile();
+ }
+
+ while (isa_and_present<DILocalScope>(S) && !isa<DISubprogram>(S)) {
+ if (auto *LB = dyn_cast_if_present<DILexicalBlock>(S)) {
+ if (F != LB->getFile()) {
+ return LB;
+ }
+ }
+ S = S->getScope();
+ }
+
+ if (auto *SP = dyn_cast_or_null<DISubprogram>(S)) {
+ return SP;
+ }
+
+ return nullptr;
+}
+
+/// Get DILocation and it's DIFile from either DILocation, DISubprogram or
+/// DILexicalBlock.
+static std::pair<DILocation *, DIFile *>
+GetLocAndFile(LLVMContext &C, MDNode *N, DILocation *InlinedAt) {
+ DILocation *Loc = nullptr;
+ DIFile *File = nullptr;
+ if (auto *InputLoc = dyn_cast<DILocation>(N)) {
+ Loc = InputLoc;
+ File = InputLoc->getFile();
+ } else if (auto *LB = dyn_cast<DILexicalBlock>(N)) {
+ Loc = DILocation::get(C, LB->getLine(), LB->getColumn(), LB, InlinedAt);
+ File = LB->getFile();
+ } else if (auto *SP = dyn_cast<DISubprogram>(N)) {
+ Loc = DILocation::get(C, SP->getLine(), 0, SP, InlinedAt);
+ File = SP->getFile();
+ }
+
+ return std::make_pair(Loc, File);
+}
+
+DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
+ if (!LocA || !LocB)
+ return nullptr;
+
+ if (LocA == LocB)
+ return LocA;
+
+ LLVMContext &C = LocA->getContext();
+
// Merge the two locations if possible, using the supplied
// inlined-at location for the created location.
- auto MergeLocPair = [&C](const DILocation *L1, const DILocation *L2,
- DILocation *InlinedAt) -> DILocation * {
+ auto MergeInlinedLocations = [&C](MDNode *N1, MDNode *N2,
+ DILocation *InlinedAt) -> DILocation * {
+ auto *L1 = cast<DILocation>(N1);
+ auto *L2 = cast<DILocation>(N2);
+
if (L1 == L2)
return DILocation::get(C, L1->getLine(), L1->getColumn(), L1->getScope(),
InlinedAt);
@@ -187,61 +318,89 @@ DILocation *DILocation::getMergedLocation(DILocation *LocA, DILocation *LocB) {
if (L1->getScope()->getSubprogram() != L2->getScope()->getSubprogram())
return nullptr;
- // Return the nearest common scope inside a subprogram.
- auto GetNearestCommonScope = [](DIScope *S1, DIScope *S2) -> DIScope * {
- SmallPtrSet<DIScope *, 8> Scopes;
- for (; S1; S1 = S1->getScope()) {
- Scopes.insert(S1);
- if (isa<DISubprogram>(S1))
- break;
- }
-
- for (; S2; S2 = S2->getScope()) {
- if (Scopes.count(S2))
- return S2;
- if (isa<DISubprogram>(S2))
- break;
+ // We should traverse DILexicalBlock's that enclose input locations and
+ // have different file attribute.
+ using NestedLexicalBlocksMatchKey = MDNode *;
+ // We try to match equal scopes or locations belonging to the same file.
+ auto GetLocOrScopeMatchKey = [](MDNode *N) -> NestedLexicalBlocksMatchKey {
+ if (auto *Loc = dyn_cast_if_present<DILocation>(N)) {
+ return Loc->getFile();
+ } else if (auto *LS = dyn_cast_if_present<DILocalScope>(N)) {
+ return LS;
}
return nullptr;
};
- auto Scope = GetNearestCommonScope(L1->getScope(), L2->getScope());
- assert(Scope && "No common scope in the same subprogram?");
-
- bool SameLine = L1->getLine() == L2->getLine();
- bool SameCol = L1->getColumn() == L2->getColumn();
- unsigned Line = SameLine ? L1->getLine() : 0;
- unsigned Col = SameLine && SameCol ? L1->getColumn() : 0;
-
- return DILocation::get(C, Line, Col, Scope, InlinedAt);
- };
+ auto MergeTextualInclusions =
+ [&C, InlinedAt](MDNode *LocOrScope1, MDNode *LocOrScope2,
+ DILocation *PrevLoc) -> DILocation * {
+ assert((isa_and_present<DILocation>(LocOrScope1) ||
+ isa_and_present<DILocalScope>(LocOrScope1)) &&
+ "Incorrect nested location type");
+ assert((isa_and_present<DILocation>(LocOrScope2) ||
+ isa_and_present<DILocalScope>(LocOrScope2)) &&
+ "Incorrect nested location type");
+
+ auto [L1, F1] = GetLocAndFile(C, LocOrScope1, InlinedAt);
+ auto [L2, F2] = GetLocAndFile(C, LocOrScope2, InlinedAt);
+
+ assert(L1->getScope()->getSubprogram() ==
+ L2->getScope()->getSubprogram() &&
+ "Locations from different subprograms?");
+
+ if (L1 == L2) {
+ auto *Result = DILocation::get(C, L1->getLine(), L1->getColumn(),
+ L1->getScope(), InlinedAt);
+ return Result;
+ }
- DILocation *Result = ARIt != ALocs.rend() ? (*ARIt)->getInlinedAt() : nullptr;
+ DIScope *Scope = GetNearestCommonScope(L1, L2);
+ assert(Scope && "No common scope in the same subprogram?");
- // If we have found a common starting location, walk up the inlined-at chains
- // and try to produce common locations.
- for (; ARIt != ALocs.rend() && BRIt != BLocs.rend(); ++ARIt, ++BRIt) {
- DILocation *Tmp = MergeLocPair(*ARIt, *BRIt, Result);
+ // If the common scope from the same file as incoming locations
+ // can't be found, try another pair of locations.
+ if (Scope->getFile() != F1 || Scope->getFile() != F2) {
+ return nullptr;
+ }
- if (!Tmp)
- // We have walked up to a point in the chains where the two locations
- // are irreconsilable. At this point Result contains the nearest common
- // location in the inlined-at chains of LocA and LocB, so we break here.
- break;
+ bool SameLine = L1->getLine() == L2->getLine();
+ bool SameCol = L1->getColumn() == L2->getColumn();
+ unsigned Line = SameLine ? L1->getLine() : 0;
+ unsigned Col = SameLine && SameCol ? L1->getColumn() : 0;
- Result = Tmp;
- }
+ auto *Result = DILocation::get(C, Line, Col, Scope, InlinedAt);
+ return Result;
+ };
- if (Result)
- return Result;
+ return MergeNestedLocations<NestedLexicalBlocksMatchKey>(
+ GetLocOrScopeMatchKey,
+ [](MDNode *L) -> bool { return L; },
+ NextScopeWithDifferentFile,
+ MergeTextualInclusions,
+ [](MDNode *N) { return nullptr; },
+ L1, L2);
+ };
- // We ended up with LocA and LocB as irreconsilable locations. Produce a
- // location at 0:0 with one of the locations' scope. The function has
- // historically picked A's scope, and a nullptr inlined-at location, so that
- // behavior is mimicked here but I am not sure if this is always the correct
- // way to handle this.
- return DILocation::get(C, 0, 0, LocA->getScope(), nullptr);
+ // When traversing inlinedAt chain, we consider locations within the same
+ // subprogram and inlinedAt location matching.
+ using InlinedAtLookupKey = std::pair<DISubprogram *, DILocation *>;
+ auto GetInlinedAtKey = [](MDNode *N) {
+ auto *L = cast<DILocation>(N);
+ return InlinedAtLookupKey(L->getScope()->getSubprogram(),
+ L->getInlinedAt());
+ };
+ auto NextInlinedAt = [](MDNode *N) {
+ auto *L = cast<DILocation>(N);
+ return L->getInlinedAt();
+ };
+ return MergeNestedLocations<InlinedAtLookupKey>(
+ GetInlinedAtKey,
+ [](MDNode *N) { return N; },
+ NextInlinedAt,
+ MergeInlinedLocations,
+ [GetInlinedAtKey](MDNode *N) { return GetInlinedAtKey(N).second; },
+ LocA, LocB);
}
std::optional<unsigned>
diff --git a/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll
new file mode 100644
index 00000000000000..ed087c1e901660
--- /dev/null
+++ b/llvm/test/DebugInfo/AArch64/merge-nested-block-loc.ll
@@ -0,0 +1,204 @@
+; RUN: opt -mtriple=aarch64-unknown-linux-gnu -S %s -passes=sroa -o - | FileCheck %s
+
+; In this test we want to ensure that the location of phi instruction merged from locations
+; of %mul3 and %mul9 belongs to the correct scope (DILexicalBlockFile), so that line
+; number of that location belongs to the corresponding file.
+
+; Generated with clang from
+; 1 # 1 "1.c" 1
+; 2 # 1 "1.c" 2
+; 3 int foo(int a) {
+; 4 int i = 0;
+; 5 if ((a & 1) == 1) {
+; 6 a -= 1;
+; 7 # 1 "m.c" 1
+; 8 # 40 "m.c"
+; 9 i += a;
+; 10 i -= 10*a;
+; 11 i *= a*a;
+; 12 # 6 "1.c" 2
+; 13 } else {
+; 14 a += 3;
+; 15 # 1 "m.c" 1
+; 16 # 40 "m.c"
+; 17 i += a;
+; 18 i -= 10*a;
+; 19 i *= a*a;
+; 20 # 9 "1.c" 2
+; 21 }
+; 22 return i;
+; 23 }
+
+; ModuleID = 'repro.c'
+source_filename = "repro.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "aarch64-unknown-linux-gnu"
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @foo(i32 noundef %a) #0 !dbg !9 {
+; CHECK: phi i32 {{.*}}, !dbg [[PHILOC:![0-9]+]]
+;
+entry:
+ %a.addr = alloca i32, align 4, !DIAssignID !17
+ #dbg_assign(i1 undef, !15, !DIExpression(), !17, ptr %a.addr, !DIExpression(), !18)
+ %i = alloca i32, align 4, !DIAssignID !19
+ #dbg_assign(i1 undef, !16, !DIExpression(), !19, ptr %i, !DIExpression(), !18)
+ store i32 %a, ptr %a.addr, align 4, !tbaa !20, !DIAssignID !24
+ #dbg_assign(i32 %a, !15, !DIExpression(), !24, ptr %a.addr, !DIExpression(), !18)
+ call void @llvm.lifetime.start.p0(i64 4, ptr %i) #2, !dbg !25
+ store i32 0, ptr %i, align 4, !dbg !26, !tbaa !20, !DIAssignID !27
+ #dbg_assign(i32 0, !16, !DIExpression(), !27, ptr %i, !DIExpression(), !18)
+ %0 = load i32, ptr %a.addr, align 4, !dbg !28, !tbaa !20
+ %and = and i32 %0, 1, !dbg !30
+ %cmp = icmp eq i32 %and, 1, !dbg !31
+ br i1 %cmp, label %if.then, label %if.else, !dbg !31
+
+if.then: ; preds = %entry
+ %1 = load i32, ptr %a.addr, align 4, !dbg !32, !tbaa !20
+ %sub = sub nsw i32 %1, 1, !dbg !32
+ store i32 %sub, ptr %a.addr, align 4, !dbg !32, !tbaa !20, !DIAssignID !34
+ #dbg_assign(i32 %sub, !15, !DIExpression(), !34, ptr %a.addr, !DIExpression(), !18)
+ %2 = load i32, ptr %a.addr, align 4, !dbg !35, !tbaa !20
+ %3 = load i32, ptr %i, align 4, !dbg !38, !tbaa !20
+ %add = add nsw i32 %3, %2, !dbg !38
+ store i32 %add, ptr %i, align 4, !dbg !38, !tbaa !20, !DIAssignID !39
+ #dbg_assign(i32 %add, !16, !DIExpression(), !39, ptr %i, !DIExpression(), !18)
+ %4 = load i32, ptr %a.addr, align 4, !dbg !40, !tbaa !20
+ %mul = mul nsw i32 10, %4, !dbg !41
+ %5 = load i32, ptr %i, align 4, !dbg !42, !tbaa !20
+ %sub1 = sub nsw i32 %5, %mul, !dbg !42
+ store i32 %sub1, ptr %i, align 4, !dbg !42, !tbaa !20, !DIAssignID !43
+ #dbg_assign(i32 %sub1, !16, !DIExpression(), !43, ptr %i, !DIExpression(), !18)
+ %6 = load i32, ptr %a.addr, align 4, !dbg !44, !tbaa !20
+ %7 = load i32, ptr %a.addr, align 4, !dbg !45, !tbaa !20
+ %mul2 = mul nsw i32 %6, %7, !dbg !46
+ %8 = load i32, ptr %i, align 4, !dbg !47, !tbaa !20
+ %mul3 = mul nsw i32 %8, %mul2, !dbg !47
+ store i32 %mul3, ptr %i, align 4, !dbg !47, !tbaa !20, !DIAssignID !48
+ #dbg_assign(i32 %mul3, !16, !DIExpression(), !48, ptr %i, !DIExpression(), !18)
+ br label %if.end, !dbg !49
+
+if.else: ; preds = %entry
+ %9 = load i32, ptr %a.addr, align 4, !dbg !51, !tbaa !20
+ %add4 = add nsw i32 %9, 3, !dbg !51
+ store i32 %add4, ptr %a.addr, align 4, !dbg !51, !tbaa !20, !DIAssignID !53
+ #dbg_assign(i32 %add4, !15, !DIExpression(), !53, ptr %a.addr, !DIExpression(), !18)
+ %10 = load i32, ptr %a.addr, align 4, !dbg !54, !tbaa !20
+ %11 = load i32, ptr %i, align 4, !dbg !56, !tbaa !20
+ %add5 = add nsw i32 %11, %10, !dbg !56
+ store i32 %add5, ptr %i, align 4, !dbg !56, !tbaa !20, !DIAssignID !57
+ #dbg_assign(i32 %add5, !16, !DIExpression(), !57, ptr %i, !DIExpression(), !18)
+ %12 = load i32, ptr %a.addr, align 4, !dbg !58, !tbaa !20
+ %mul6 = mul nsw i32 10, %12, !dbg !59
+ %13 = load i32, ptr %i, align 4, !dbg !60, !tbaa !20
+ %sub7 = sub nsw i32 %13, %mul6, !dbg !60
+ store i32 %sub7, ptr %i, align 4, !dbg !60, !tbaa !20, !DIAssignID !61
+ #dbg_assign(i32 %sub7, !16, !DIExpression(), !61, ptr %i, !DIExpression(), !18)
+ %14 = load i32, ptr %a.addr, align 4, !dbg !62, !tbaa !20
+ %15 = load i32, ptr %a.addr, align 4, !dbg !63, !tbaa !20
+ %mul8 = mul nsw i32 %14, %15, !dbg !64
+ %16 = load i32, ptr %i, align 4, !dbg !65, !tbaa !20
+ %mul9 = mul nsw i32 %16, %mul8, !dbg !65
+ store i32 %mul9, ptr %i, align 4, !dbg !65, !tbaa !20, !DIAssignID !66
+ #dbg_assign(i32 %mul9, !16, !DIExpression(), !66, ptr %i, !DIExpression(), !18)
+ br label %if.end
+
+if.end: ; preds = %if.else, %if.then
+ %17 = load i32, ptr %i, align 4, !dbg !67, !tbaa !20
+ call void @llvm.lifetime.end.p0(i64 4, ptr %i) #2, !dbg !68
+ ret i32 %17, !dbg !69
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #1
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #1
+
+attributes #0 = { nounwind uwtable "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+fp-armv8,+neon,+v8a,-fmv" }
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
+attributes #2 = { nounwind }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+!llvm.ident = !{!8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 20.0.0git ([email protected]:llvm/llvm-project.git c8ee1164bd6ae2f0a603c53d1d29ad5a3225c5cd)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "repro.c", directory: "", checksumkind: CSK_MD5, checksum: "51454d2babc57d5ea92df6734236bd39")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 7, !"uwtable", ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks for the patch! I'm not too familiar with debug info for textual-inclusions or DILexicalBlockFile, but from what I can tell, a much simpler approach may be possible here. The reason why we merge partially-matching chains of inlined functions is that we create a series of inline stack frames from that chain - so it is possible for example to have 3 stack frames at the same instruction, where the top and bottom frames have a valid line number while the middle frame has line 0. In the case of textual inclusion, we don't produce any inline stack frames - the frame is always just the current subroutine, albeit with a different file. Because of this, partial matching chains of textual inclusion wouldn't give an output that would be visible in a debugger - the only thing that would matter is whether the top "frames" match. I think it makes sense to treat the file more like the outermost part of the line+column matching, i.e. if the files don't match then we use a line 0 in the nearest common scope, and if the files do match then we use that matching file (creating a new DILexicalBlockFile at the nearest common scope if necessary). |
Hi!
I have not dug into the details in this patch yet, but for my understanding, I tried creating some reproducers for @SLTozer's suggestion above:
AFAICT it looks like the patch does not have any effect on these examples (compiled using |
Just few side notes: while the testcase in #122846 was in C++, the original one was not. It was created from a non C++ source. The particular feature there is that inlining is done in (non-LLVM / non-clang) frontend, so we are losing all information similar to Essentially, from the LLVM standpoint the output of this inlining procedure is similar to that of full textual inclusion by a C/C++ preprocessor (testcase in #122846 was created to emulate the original behavior via preprocessor). And it is important not to loose the location information for something inlined this way. But certainly it should be correct, and not something broekn |
Thank you for the explanation. As far as I understood, by using the partial-matching algorithm, we're trying to find the "nearest" subprogram to the incoming locations such that we can produce the nearest subprogram for all "upper" subprograms in chain/stack.
I've tried that approach. The problem is that, if we consider the example below, the nearest common scope for phi instruction for the variable "i" before return statement is DISubprogram, whereas what this patch tries to achieve is that we have "the closest" common file to the incoming location. main.c
y.c
z1.c
z2.c
Here, we can't get the location "y.c:300" assigned to the variable because we have two different DILexicalBlock objects, though produced by the inclusion of the same file. So, it seems that we have to walk top-down over the sequence of included lexical blocks to find the least semantically matching lexical block. |
Thank you! Indeed, for this reproducer, we can't get another common location but src.c:0 for different_inclusion_scope |
Could you explain a little more about how this works? For most consumers of DWARF, including all debuggers I'm aware of, lexical blocks are not directly visible to the user, while inlined calls are (via an inlined stack frame). Is the intent here that a full textual-inclusion callstack be present in the debug info, or just that we don't misattribute any instructions to the wrong file? |
This patch has nothing to do with "full textual-inclusion call stack". The goal here is to preserve as much information as possible when merging locations. I think Anton meant that nested DILexicalBlocks attributed to different files have structure similar to inlinedAt-location chains. |
Alright, that's in line with my initial assumptions about the goal of the patch. So going back to my first comment, is there a need to generalize the location-merging logic? We shouldn't need to produce a full stack of nested DILexicalBlockFiles, because only the immediately containing file of the merged DILocations matters for the debug info consumer. If we get a non-line-0 location, then it will have the correct file as long as we check that the file scope for the merged lines match and use the matching file; if we get a line-0 location, then it doesn't matter what the file is (we can just fall back to the file scope of the containing DISubprogram). In either case, the chain of DILexicalBlockFiles won't be visible and as far as I can tell doesn't need to be produced - is this correct, or am I missing something either in general or about your particular use case? |
If I understand your initial comment correctly, your idea means that we get a line-0 location if L1 and L2 from a |
That is correct - the main point is that if the line is 0, file typically doesn't matter.
Do you have a test case for what you're describing? The test case in this commit right now covers the basic case, i.e. merging two instructions with the same line and the same file. A test case where we have two instructions with different files but a "common include location" might help clarify the intent here (and also should be part of the patch coverage). With that said, as far as I understand what you're describing, a simpler approach is still possible. Even if we use the common include location as the merged location, that is still only a single location - the inlined-chain logic you've adapted here is necessary to produce a list of locations for each inline frame. In this case, the only thing you'd need AFAICT is to modify the |
GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object. With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here llvm#125780 (comment). DIScope's pointer equality isn't enough for scope equality check, since, for example, two `#include "x.inc"` directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even same) locations from "x.inc" may have different parent scope objects representing the same source location. If input DILocations are from different files, or common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned as a result of getMergedLocation. It fixes llvm#122846.
I figured it out, and I agree that keeping the "right" chain of parent scopes isn't necessary for output DILocations. |
GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object. With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here llvm#125780 (comment). DIScope's pointer equality isn't enough for scope equality check, since, for example, two `#include "x.inc"` directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even same) locations from "x.inc" may have different parent scope objects representing the same source location. If input DILocations are from different files, or common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned as a result of getMergedLocation. It fixes llvm#122846.
GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object. With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here llvm#125780 (comment). DIScope's pointer equality isn't enough for scope equality check, since, for example, two `#include "x.inc"` directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even same) locations from "x.inc" may have different parent scope objects representing the same source location. If input DILocations are from different files, or common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned as a result of getMergedLocation. It fixes llvm#122846.
GetNearestCommonScope finds a common scope for two locations by finding the common parent scope object. With this commit, GetNearestCommonScope considers two scopes equal if they have the same file, line and column values. Thus, it can find the "nearest common included location" when merging locations from different files. This may be useful for a case described here llvm#125780 (comment). DIScope's pointer equality isn't enough for scope equality check, since, for example, two `#include "x.inc"` directives showing up on different lines produce different DILocalScope objects representing the same file content. Thus, two (even same) locations from "x.inc" may have different parent scope objects representing the same source location. If input DILocations are from different files, or common scope is in another file than input locations, a textual inclusion case is assumed, and the location of common scope ("nearest common included location") is returned as a result of getMergedLocation. It fixes llvm#122846.
…132286) getMergedLocation uses a common parent scope of the two input locations for an output location. It doesn't consider the case when the common parent scope is from a file other than L1's and L2's files. In that case, it produces a merged location with an erroneous scope (#122846). In some cases, such as #125780 (comment), L1, L2 having a common parent scope from another file indicate that the code at L1 and L2 is included from the same source location. With this commit, getMergedLocation detects that L1, L2, or their common parent scope files are different. If so, it assumes that L1 and L2 were included from some source location, and tries to attach the output location to a scope with the nearest common source location with regard to L1 and L2. If the nearest common location is also from another file, getMergedLocation returns it as a merged location, assuming that L1 and L2 belong to files that were both included in the nearest common location. Fixes #122846.
… location (#132286) getMergedLocation uses a common parent scope of the two input locations for an output location. It doesn't consider the case when the common parent scope is from a file other than L1's and L2's files. In that case, it produces a merged location with an erroneous scope (llvm/llvm-project#122846). In some cases, such as llvm/llvm-project#125780 (comment), L1, L2 having a common parent scope from another file indicate that the code at L1 and L2 is included from the same source location. With this commit, getMergedLocation detects that L1, L2, or their common parent scope files are different. If so, it assumes that L1 and L2 were included from some source location, and tries to attach the output location to a scope with the nearest common source location with regard to L1 and L2. If the nearest common location is also from another file, getMergedLocation returns it as a merged location, assuming that L1 and L2 belong to files that were both included in the nearest common location. Fixes llvm/llvm-project#122846.
…lvm#132286) getMergedLocation uses a common parent scope of the two input locations for an output location. It doesn't consider the case when the common parent scope is from a file other than L1's and L2's files. In that case, it produces a merged location with an erroneous scope (llvm#122846). In some cases, such as llvm#125780 (comment), L1, L2 having a common parent scope from another file indicate that the code at L1 and L2 is included from the same source location. With this commit, getMergedLocation detects that L1, L2, or their common parent scope files are different. If so, it assumes that L1 and L2 were included from some source location, and tries to attach the output location to a scope with the nearest common source location with regard to L1 and L2. If the nearest common location is also from another file, getMergedLocation returns it as a merged location, assuming that L1 and L2 belong to files that were both included in the nearest common location. Fixes llvm#122846.
…lvm#132286) getMergedLocation uses a common parent scope of the two input locations for an output location. It doesn't consider the case when the common parent scope is from a file other than L1's and L2's files. In that case, it produces a merged location with an erroneous scope (llvm#122846). In some cases, such as llvm#125780 (comment), L1, L2 having a common parent scope from another file indicate that the code at L1 and L2 is included from the same source location. With this commit, getMergedLocation detects that L1, L2, or their common parent scope files are different. If so, it assumes that L1 and L2 were included from some source location, and tries to attach the output location to a scope with the nearest common source location with regard to L1 and L2. If the nearest common location is also from another file, getMergedLocation returns it as a merged location, assuming that L1 and L2 belong to files that were both included in the nearest common location. Fixes llvm#122846.
Fixes #122846.
The cause of this issue was that GetNearestCommonScope within getMergedLocation, which used to find a common scope for a matching locations pair, ignored DILexicalBlockFile while walking up the parent chain of incoming DILocations. Therefore, getMergedLocation could attach DILocation with line number from one file to the scope from another file. Whereas in the reproducer from the issue, both incoming locations have the same line and column numbers, and they can be just attached to one of the DILexicalBlockFile's (which represent the same file and have the same scope but are not distinct).
To fix that issue and to be able to merge locations from different included files,
inlinedAt chains processing code from 12a7aea was generalized so that the same algorithm can be used for processing inlinedAt chains and nested DILexicalBlocks with changing file attribute chains, as they both can be considered as textual inclusions (as proposed in
DILocation::getMergedLocation
produces invalid result while merging locations fromDILexicalBlockFile
#122846).getMergedLocations tries to merge chains of inlined locations. Here, for each matching pair of locations from inlinedAt chains, chains of their parent scopes are traversed. Only scopes within the same DISubprogram are considered (as all pairs of locations from inlinedAt chains can only be merged if they belong to the same subprogram).
Also, only those parent scopes are considered, that have a different file attribute value from the children scope, as they represent textual inclusions.
Such parent scope chains from incoming locations LocA and LocB are processed in the way similar to 12a7aea, and matching pairs from textual inclusion chains are being merged. The valid merge location for the most local to LocA and LocB matching pair from parent scope chains is taken as getMergedLocation result.