Skip to content

[DebugInfo] (Always) include the dwo name in the hash #100375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 5, 2024
Merged

Conversation

labath
Copy link
Collaborator

@labath labath commented Jul 24, 2024

Since ce0c205, we are doing that if a single (LTO) compilation contains more than one compile unit, but the same thing can happen if the non-lto and single-cu lto compilations, typically when the CU ends up (nearly) empty. In my case, this happened when LTO emptied two compile units.

Note that the source file name is already a part of the hash, so this can only happen when a single file is compiled and linked twice into the same application (most likely with different preprocessor defintiions). While not exactly common, this pattern is used by some C code to implement "templates".

The 2017 patch already hinted at the possibility of doing this unconditionally, and this patch implements that. While the DWARF spec hints at the option of using the type signature hashing algorithm for the DWO_id purposes, AFAICT it does not actually require it, so I believe this change is still conforming.

The relevant section of the spec is in Section 3.1.2 "Skeleton Compilation Unit Entries" (in non-normative text):

The means of determining a compilation unit ID does not need to be
similar or related to the means of determining a type unit signature.
However, it should be suitable for detecting file version skew or other
kinds of mismatched files and for looking up a full split unit in a
DWARF package file (see Section 7.3.5 on page 190).

Since ce0c205, we are doing that if a
single (LTO) compilation contains more than one compile unit, but the
same thing can happen if the non-lto and single-cu lto compilations,
typically when the CU ends up (nearly) empty. In my case, this happened
when LTO emptied two compile units.

Note that the source file name is already a part of the hash, so this
can only happen when a single file is compiled and linked twice into the
same application (most likely with different preprocessor defintiions).
While not exactly common, this pattern is used by some C code to
implement "templated" code.

The 2017 patch already hinted at the possibility of doing this
unconditionally, and this patch implements that. While the DWARF spec
hints at the option of using the type signature hashing algorithm for
the DWO_id purposes, AFAICT it does not actually require it, so I
believe this change is still conforming.

The relevant section of the spec is in Section 3.1.2 "Skeleton
Compilation Unit Entries" (in non-normative text):

```
The means of determining a compilation unit ID does not need to be
similar or related to the means of determining a type unit signature.
However, it should be suitable for detecting file version skew or other
kinds of mismatched files and for looking up a full split unit in a
DWARF package file (see Section 7.3.5 on page 190).
```
@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-debuginfo

Author: Pavel Labath (labath)

Changes

Since ce0c205, we are doing that if a single (LTO) compilation contains more than one compile unit, but the same thing can happen if the non-lto and single-cu lto compilations, typically when the CU ends up (nearly) empty. In my case, this happened when LTO emptied two compile units.

Note that the source file name is already a part of the hash, so this can only happen when a single file is compiled and linked twice into the same application (most likely with different preprocessor defintiions). While not exactly common, this pattern is used by some C code to implement "templates".

The 2017 patch already hinted at the possibility of doing this unconditionally, and this patch implements that. While the DWARF spec hints at the option of using the type signature hashing algorithm for the DWO_id purposes, AFAICT it does not actually require it, so I believe this change is still conforming.

The relevant section of the spec is in Section 3.1.2 "Skeleton Compilation Unit Entries" (in non-normative text):

The means of determining a compilation unit ID does not need to be
similar or related to the means of determining a type unit signature.
However, it should be suitable for detecting file version skew or other
kinds of mismatched files and for looking up a full split unit in a
DWARF package file (see Section 7.3.5 on page 190).

Full diff: https://github.com/llvm/llvm-project/pull/100375.diff

15 Files Affected:

  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+7-12)
  • (modified) llvm/test/CodeGen/X86/dwarf-headers.ll (+8-9)
  • (modified) llvm/test/DebugInfo/COFF/fission-cu.ll (+2-2)
  • (modified) llvm/test/DebugInfo/COFF/fission-sections.ll (+1-1)
  • (modified) llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll (+8-9)
  • (modified) llvm/test/DebugInfo/WebAssembly/fission-cu.ll (+2-2)
  • (modified) llvm/test/DebugInfo/X86/convert-debugloc.ll (+3-1)
  • (modified) llvm/test/DebugInfo/X86/convert-loclist.ll (+1-1)
  • (modified) llvm/test/DebugInfo/X86/fission-cu.ll (+2-2)
  • (modified) llvm/test/DebugInfo/X86/fission-hash-local.ll (+2-2)
  • (modified) llvm/test/DebugInfo/X86/fission-hash.ll (+2-2)
  • (added) llvm/test/DebugInfo/X86/split-dwarf-dwo-hash.ll (+33)
  • (removed) llvm/test/DebugInfo/X86/split-dwarf-multiple-cu-hash.ll (-45)
  • (modified) llvm/test/DebugInfo/X86/sret.ll (+2-2)
  • (modified) llvm/test/MC/WebAssembly/dwarfdump.ll (+1-1)
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 5f1f315c5ab24..dcc4bf20a74f2 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -1259,13 +1259,6 @@ void DwarfDebug::finalizeModuleInfo() {
 
   finishEntityDefinitions();
 
-  // Include the DWO file name in the hash if there's more than one CU.
-  // This handles ThinLTO's situation where imported CUs may very easily be
-  // duplicate with the same CU partially imported into another ThinLTO unit.
-  StringRef DWOName;
-  if (CUMap.size() > 1)
-    DWOName = Asm->TM.Options.MCOptions.SplitDwarfFile;
-
   bool HasEmittedSplitCU = false;
 
   // Handle anything that needs to be done on a per-unit basis after
@@ -1294,11 +1287,13 @@ void DwarfDebug::finalizeModuleInfo() {
                                          ? dwarf::DW_AT_dwo_name
                                          : dwarf::DW_AT_GNU_dwo_name;
       finishUnitAttributes(TheCU.getCUNode(), TheCU);
-      TheCU.addString(TheCU.getUnitDie(), attrDWOName,
-                      Asm->TM.Options.MCOptions.SplitDwarfFile);
-      SkCU->addString(SkCU->getUnitDie(), attrDWOName,
-                      Asm->TM.Options.MCOptions.SplitDwarfFile);
-      // Emit a unique identifier for this CU.
+      StringRef DWOName = Asm->TM.Options.MCOptions.SplitDwarfFile;
+      TheCU.addString(TheCU.getUnitDie(), attrDWOName, DWOName);
+      SkCU->addString(SkCU->getUnitDie(), attrDWOName, DWOName);
+      // Emit a unique identifier for this CU. Include the DWO file name in the
+      // hash to avoid the case where two (almost) empty compile units have the
+      // same file. This can happen if link-time optimization removes nearly all
+      // (unused) code from a CU.
       uint64_t ID =
           DIEHash(Asm, &TheCU).computeCUSignature(DWOName, TheCU.getUnitDie());
       if (getDwarfVersion() >= 5) {
diff --git a/llvm/test/CodeGen/X86/dwarf-headers.ll b/llvm/test/CodeGen/X86/dwarf-headers.ll
index 3a3d9998659b2..2d7619512e981 100644
--- a/llvm/test/CodeGen/X86/dwarf-headers.ll
+++ b/llvm/test/CodeGen/X86/dwarf-headers.ll
@@ -12,11 +12,10 @@
 ; RUN:     -filetype=obj -O0 -mtriple=x86_64-unknown-linux-gnu < %s \
 ; RUN:     | llvm-dwarfdump -v - | FileCheck %s --check-prefix=SINGLE-5
 
-; RUN: llc -split-dwarf-file=foo.dwo -split-dwarf-output=%t.dwo \
+; RUN: llc -split-dwarf-file=foo.dwo \
 ; RUN:     -dwarf-version=5 -generate-type-units \
 ; RUN:     -filetype=obj -O0 -mtriple=x86_64-unknown-linux-gnu < %s \
 ; RUN:     | llvm-dwarfdump -v - | FileCheck %s --check-prefix=O-5
-; RUN: llvm-dwarfdump -v %t.dwo | FileCheck %s --check-prefix=DWO-5
 
 ; Looking for DWARF headers to be generated correctly.
 ; There are 8 variants with 5 formats: v4 CU, v4 TU, v5 normal/partial CU,
@@ -74,15 +73,15 @@
 ;
 ; O-5: .debug_info contents:
 ; O-5: 0x00000000: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_skeleton, abbr_offset
-; O-5-SAME:        DWO_id = 0xccd7e58ef8bf4aa6
+; O-5-SAME:        DWO_id = [[HASH:0x[0-9a-f]*]]
 ; O-5: 0x00000014: DW_TAG_skeleton_unit
 ;
-; DWO-5: .debug_info.dwo contents:
-; DWO-5: 0x00000000: Type Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_type, abbr_offset
-; DWO-5: 0x00000018: DW_TAG_type_unit
-; DWO-5: 0x00000035: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset
-; DWO-5-SAME:        DWO_id = 0xccd7e58ef8bf4aa6
-; DWO-5: 0x00000049: DW_TAG_compile_unit
+; O-5: .debug_info.dwo contents:
+; O-5: 0x00000000: Type Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_type, abbr_offset
+; O-5: 0x00000018: DW_TAG_type_unit
+; O-5: 0x00000035: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset
+; O-5-SAME:        DWO_id = [[HASH]]
+; O-5: 0x00000049: DW_TAG_compile_unit
 
 
 ; ModuleID = 't.cpp'
diff --git a/llvm/test/DebugInfo/COFF/fission-cu.ll b/llvm/test/DebugInfo/COFF/fission-cu.ll
index 944d886c3e834..3afcb8717e31f 100644
--- a/llvm/test/DebugInfo/COFF/fission-cu.ll
+++ b/llvm/test/DebugInfo/COFF/fission-cu.ll
@@ -62,7 +62,7 @@ source_filename = "test/DebugInfo/X86/fission-cu.ll"
 ; CHECK-NEXT: DW_AT_stmt_list [DW_FORM_sec_offset]   (0x00000000)
 ; CHECK-NEXT: DW_AT_comp_dir [DW_FORM_strp]     ( .debug_str[0x00000000] = "e:\\llvm-project\\tmp")
 ; CHECK-NEXT: DW_AT_GNU_dwo_name [DW_FORM_strp] ( .debug_str[0x00000014] = "baz.dwo")
-; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH:0x[0-9a-f]*]])
 
 ; Check that the rest of the compile units have information.
 ; CHECK: .debug_info.dwo contents:
@@ -74,7 +74,7 @@ source_filename = "test/DebugInfo/X86/fission-cu.ll"
 ; CHECK-NOT: DW_AT_low_pc
 ; CHECK-NOT: DW_AT_stmt_list
 ; CHECK-NOT: DW_AT_comp_dir
-; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH]])
 ; CHECK: DW_TAG_variable
 ; CHECK: DW_AT_name [DW_FORM_GNU_str_index]     (indexed (00000000) string = "a")
 ; CHECK: DW_AT_type [DW_FORM_ref4]       (cu + 0x{{[0-9a-f]*}} => {[[TYPE:0x[0-9a-f]*]]}
diff --git a/llvm/test/DebugInfo/COFF/fission-sections.ll b/llvm/test/DebugInfo/COFF/fission-sections.ll
index e7494ce92dfcf..754e2b888c202 100644
--- a/llvm/test/DebugInfo/COFF/fission-sections.ll
+++ b/llvm/test/DebugInfo/COFF/fission-sections.ll
@@ -51,7 +51,7 @@ source_filename = "test/DebugInfo/X86/fission-cu.ll"
 ; DWO-NEXT: [ 2](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .debug_str_offsets.dwo
 ; DWO-NEXT: AUX scnlen 0x14 nreloc 0 nlnno 0 checksum 0x9392f0f0 assoc 2 comdat 0
 ; DWO-NEXT: [ 4](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .debug_info.dwo
-; DWO-NEXT: AUX scnlen 0x29 nreloc 0 nlnno 0 checksum 0xc8e5b275 assoc 3 comdat 0
+; DWO-NEXT: AUX scnlen 0x29 nreloc 0 nlnno 0 checksum 0x{{[0-9a-f]*}} assoc 3 comdat 0
 ; DWO-NEXT: [ 6](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .debug_abbrev.dwo
 ; DWO-NEXT: AUX scnlen 0x33 nreloc 0 nlnno 0 checksum 0x8056e5e4 assoc 4 comdat 0
 ; DWO-EMPTY:
diff --git a/llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll b/llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll
index efa3d1c4a46e1..94855b0525ad0 100644
--- a/llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll
+++ b/llvm/test/DebugInfo/WebAssembly/dwarf-headers.ll
@@ -12,11 +12,10 @@
 ; RUN:     -filetype=obj -O0 -mtriple= < %s \
 ; RUN:     | llvm-dwarfdump -v - | FileCheck %s --check-prefix=SINGLE-5
 
-; RUN: llc -split-dwarf-file=foo.dwo -split-dwarf-output=%t.dwo \
+; RUN: llc -split-dwarf-file=foo.dwo \
 ; RUN:     -dwarf-version=5 -generate-type-units \
 ; RUN:     -filetype=obj -O0 -mtriple= < %s \
 ; RUN:     | llvm-dwarfdump -v - | FileCheck %s --check-prefix=O-5
-; RUN: llvm-dwarfdump -v %t.dwo | FileCheck %s --check-prefix=DWO-5
 
 ; This test is derived from test/CodeGen/X86/dwarf-headers.ll
 
@@ -76,15 +75,15 @@
 ;
 ; O-5: .debug_info contents:
 ; O-5: 0x00000000: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_skeleton, abbr_offset
-; O-5-SAME:        DWO_id = 0xccd7e58ef8bf4aa6
+; O-5-SAME:        DWO_id = [[HASH:0x[0-9a-f]*]]
 ; O-5: 0x00000014: DW_TAG_skeleton_unit
 ;
-; DWO-5: .debug_info.dwo contents:
-; DWO-5: 0x00000000: Type Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_type, abbr_offset
-; DWO-5: 0x00000018: DW_TAG_type_unit
-; DWO-5: 0x00000035: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset
-; DWO-5-SAME:        DWO_id = 0xccd7e58ef8bf4aa6
-; DWO-5: 0x00000049: DW_TAG_compile_unit
+; O-5: .debug_info.dwo contents:
+; O-5: 0x00000000: Type Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_type, abbr_offset
+; O-5: 0x00000018: DW_TAG_type_unit
+; O-5: 0x00000035: Compile Unit: {{.*}} version = 0x0005, unit_type = DW_UT_split_compile, abbr_offset
+; O-5-SAME:        DWO_id = [[HASH]]
+; O-5: 0x00000049: DW_TAG_compile_unit
 
 
 ; ModuleID = 't.cpp'
diff --git a/llvm/test/DebugInfo/WebAssembly/fission-cu.ll b/llvm/test/DebugInfo/WebAssembly/fission-cu.ll
index 8a04d48d4de73..6eb9ba6fea636 100644
--- a/llvm/test/DebugInfo/WebAssembly/fission-cu.ll
+++ b/llvm/test/DebugInfo/WebAssembly/fission-cu.ll
@@ -62,7 +62,7 @@ source_filename = "test/DebugInfo/WebAssembly/fission-cu.ll"
 ; CHECK-NEXT: DW_AT_stmt_list [DW_FORM_sec_offset]   (0x00000000)
 ; CHECK-NEXT: DW_AT_comp_dir [DW_FORM_strp]     ( .debug_str[0x00000000] = "/usr/local/google/home/echristo/tmp")
 ; CHECK-NEXT: DW_AT_GNU_dwo_name [DW_FORM_strp] ( .debug_str[0x00000024] = "baz.dwo")
-; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH:0x[0-9a-f]*]])
 
 ; Check that the rest of the compile units have information.
 ; CHECK: .debug_info.dwo contents:
@@ -74,7 +74,7 @@ source_filename = "test/DebugInfo/WebAssembly/fission-cu.ll"
 ; CHECK-NOT: DW_AT_low_pc
 ; CHECK-NOT: DW_AT_stmt_list
 ; CHECK-NOT: DW_AT_comp_dir
-; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH]])
 ; CHECK: DW_TAG_variable
 ; CHECK: DW_AT_name [DW_FORM_GNU_str_index]     (indexed (00000000) string = "a")
 ; CHECK: DW_AT_type [DW_FORM_ref4]       (cu + 0x{{[0-9a-f]*}} => {[[TYPE:0x[0-9a-f]*]]}
diff --git a/llvm/test/DebugInfo/X86/convert-debugloc.ll b/llvm/test/DebugInfo/X86/convert-debugloc.ll
index ad3f48c05de99..73072eda01043 100644
--- a/llvm/test/DebugInfo/X86/convert-debugloc.ll
+++ b/llvm/test/DebugInfo/X86/convert-debugloc.ll
@@ -27,7 +27,9 @@
 ; RUN:   | FileCheck %s --check-prefix=VERBOSE --check-prefix=CONV "--implicit-check-not={{DW_TAG|NULL}}"
 
 
-; SPLITCONV: Compile Unit:{{.*}} DWO_id = 0x06580df9fdd5b54b
+;; NB: This checks that the type reference in DW_OP_convert participates in the
+;; DWO hash.
+; SPLITCONV: Compile Unit:{{.*}} DWO_id = 0xbf4ac610d9ea282c
 ; SPLIT: DW_TAG_skeleton_unit
 
 ; CONV: DW_TAG_compile_unit
diff --git a/llvm/test/DebugInfo/X86/convert-loclist.ll b/llvm/test/DebugInfo/X86/convert-loclist.ll
index eb3c4128e9166..720bc46896ced 100644
--- a/llvm/test/DebugInfo/X86/convert-loclist.ll
+++ b/llvm/test/DebugInfo/X86/convert-loclist.ll
@@ -13,7 +13,7 @@
 ; often - add another IR file with a different DW_OP_convert that's otherwise
 ; identical and demonstrate that they have different DWO IDs.
 
-; SPLIT: 0x00000000: Compile Unit: {{.*}} DWO_id = 0x4dbee91db55385db
+; SPLIT: 0x00000000: Compile Unit: {{.*}} DWO_id = 0x6f8c6730fe6cbff0
 
 ; Regression testing a fairly quirky bug where instead of hashing (see above),
 ; extra bytes would be emitted into the output assembly in no
diff --git a/llvm/test/DebugInfo/X86/fission-cu.ll b/llvm/test/DebugInfo/X86/fission-cu.ll
index d5cca77143525..d20567290b15f 100644
--- a/llvm/test/DebugInfo/X86/fission-cu.ll
+++ b/llvm/test/DebugInfo/X86/fission-cu.ll
@@ -60,7 +60,7 @@ source_filename = "test/DebugInfo/X86/fission-cu.ll"
 ; CHECK-NEXT: DW_AT_stmt_list [DW_FORM_sec_offset]   (0x00000000)
 ; CHECK-NEXT: DW_AT_comp_dir [DW_FORM_strp]     ( .debug_str[0x00000000] = "/usr/local/google/home/echristo/tmp")
 ; CHECK-NEXT: DW_AT_GNU_dwo_name [DW_FORM_strp] ( .debug_str[0x00000024] = "baz.dwo")
-; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK-NEXT: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH:0x[0-9a-f]*]])
 
 ; Check that the rest of the compile units have information.
 ; CHECK: .debug_info.dwo contents:
@@ -72,7 +72,7 @@ source_filename = "test/DebugInfo/X86/fission-cu.ll"
 ; CHECK-NOT: DW_AT_low_pc
 ; CHECK-NOT: DW_AT_stmt_list
 ; CHECK-NOT: DW_AT_comp_dir
-; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  (0x1f1f859683d49324)
+; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8]  ([[HASH]])
 ; CHECK: DW_TAG_variable
 ; CHECK: DW_AT_name [DW_FORM_GNU_str_index]     (indexed (00000000) string = "a")
 ; CHECK: DW_AT_type [DW_FORM_ref4]       (cu + 0x{{[0-9a-f]*}} => {[[TYPE:0x[0-9a-f]*]]}
diff --git a/llvm/test/DebugInfo/X86/fission-hash-local.ll b/llvm/test/DebugInfo/X86/fission-hash-local.ll
index 3abca0176227c..f7970461f815b 100644
--- a/llvm/test/DebugInfo/X86/fission-hash-local.ll
+++ b/llvm/test/DebugInfo/X86/fission-hash-local.ll
@@ -13,8 +13,8 @@
 ;   int i = 7; // or 9
 ; }
 
-; H1: DW_AT_GNU_dwo_id  (0xc1220cf66b1190ad)
-; H2: DW_AT_GNU_dwo_id  (0xf66067a0cf366f0e)
+; H1: DW_AT_GNU_dwo_id  (0xb06b36aa6004befe)
+; H2: DW_AT_GNU_dwo_id  (0xc3e980cf3dfa79a2)
 
 ; Function Attrs: norecurse nounwind readnone uwtable
 define dso_local void @_Z2f1v() local_unnamed_addr !dbg !7 {
diff --git a/llvm/test/DebugInfo/X86/fission-hash.ll b/llvm/test/DebugInfo/X86/fission-hash.ll
index f29323d6fa5af..73e22237e7aab 100644
--- a/llvm/test/DebugInfo/X86/fission-hash.ll
+++ b/llvm/test/DebugInfo/X86/fission-hash.ll
@@ -3,8 +3,8 @@
 
 ; The source is an empty file, modified to include/retain an 'int' type, since empty CUs are omitted.
 
-; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8] (0x50d985146a74bb00)
-; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8] (0x50d985146a74bb00)
+; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8] (0xb1b50e9a23896bc1)
+; CHECK: DW_AT_GNU_dwo_id [DW_FORM_data8] (0xb1b50e9a23896bc1)
 
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!3, !4}
diff --git a/llvm/test/DebugInfo/X86/split-dwarf-dwo-hash.ll b/llvm/test/DebugInfo/X86/split-dwarf-dwo-hash.ll
new file mode 100644
index 0000000000000..583ab3e787bad
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/split-dwarf-dwo-hash.ll
@@ -0,0 +1,33 @@
+; RUN: rm -rf %t && mkdir -p %t
+; RUN: llc -split-dwarf-file=foo.dwo  %s -filetype=obj -o %t/a.o
+; RUN: llc -split-dwarf-file=bar.dwo  %s -filetype=obj -o %t/b.o
+; RUN: llvm-dwarfdump -debug-info %t/a.o %t/b.o | FileCheck %s
+
+; CHECK: {{.*}}a.o: file format elf64-x86-64
+; CHECK: 0x00000000: Compile Unit: {{.*}}, DWO_id = [[HASH:0x[0-9a-f]*]]
+; CHECK: {{.*}}b.o: file format elf64-x86-64
+; CHECK-NOT: DWO_id = [[HASH]]
+
+target triple = "x86_64-pc-linux"
+
+; Function Attrs: noinline nounwind uwtable
+define void @_Z1av() !dbg !9 {
+entry:
+  ret void, !dbg !12
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.ident = !{!5}
+!llvm.module.flags = !{!6, !7, !8}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 5.0.0 (trunk 304107) (llvm/trunk 304109)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
+!1 = !DIFile(filename: "a.cpp", directory: "/tmp")
+!2 = !{}
+!5 = !{!"clang version 5.0.0 (trunk 304107) (llvm/trunk 304109)"}
+!6 = !{i32 2, !"Dwarf Version", i32 5}
+!7 = !{i32 2, !"Debug Info Version", i32 3}
+!8 = !{i32 1, !"wchar_size", i32 4}
+!9 = distinct !DISubprogram(name: "a", linkageName: "_Z1av", scope: !1, file: !1, line: 1, type: !10, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0, retainedNodes: !2)
+!10 = !DISubroutineType(types: !11)
+!11 = !{null}
+!12 = !DILocation(line: 2, column: 1, scope: !9)
diff --git a/llvm/test/DebugInfo/X86/split-dwarf-multiple-cu-hash.ll b/llvm/test/DebugInfo/X86/split-dwarf-multiple-cu-hash.ll
deleted file mode 100644
index 89c2c8c07e4b9..0000000000000
--- a/llvm/test/DebugInfo/X86/split-dwarf-multiple-cu-hash.ll
+++ /dev/null
@@ -1,45 +0,0 @@
-; RUN: rm -rf %t && mkdir -p %t
-; RUN: llc -split-dwarf-file=foo.dwo  %s -filetype=obj -o %t/a.o
-; RUN: llc -split-dwarf-file=bar.dwo  %s -filetype=obj -o %t/b.o
-; RUN: llvm-dwarfdump -debug-info %t/a.o %t/b.o | FileCheck %s
-
-; CHECK: .debug_info contents:
-; CHECK: dwo_id {{.*}}([[HASH:.*]])
-; CHECK-NOT: dwo_id {{.*}}([[HASH]])
-; CHECK: .debug_info.dwo contents:
-
-target triple = "x86_64-pc-linux"
-
-; Function Attrs: noinline nounwind uwtable
-define void @_Z1av() #0 !dbg !9 {
-entry:
-  ret void, !dbg !12
-}
-
-; Function Attrs: noinline nounwind uwtable
-define void @_Z1bv() #0 !dbg !13 {
-entry:
-  ret void, !dbg !14
-}
-
-attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
-
-!llvm.dbg.cu = !{!0, !3}
-!llvm.ident = !{!5, !5}
-!llvm.module.flags = !{!6, !7, !8}
-
-!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 5.0.0 (trunk 304107) (llvm/trunk 304109)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
-!1 = !DIFile(filename: "a.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
-!2 = !{}
-!3 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !4, producer: "clang version 5.0.0 (trunk 304107) (llvm/trunk 304109)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
-!4 = !DIFile(filename: "b.cpp", directory: "/usr/local/google/home/blaikie/dev/scratch")
-!5 = !{!"clang version 5.0.0 (trunk 304107) (llvm/trunk 304109)"}
-!6 = !{i32 2, !"Dwarf Version", i32 4}
-!7 = !{i32 2, !"Debug Info Version", i32 3}
-!8 = !{i32 1, !"wchar_size", i32 4}
-!9 = distinct !DISubprogram(name: "a", linkageName: "_Z1av", scope: !1, file: !1, line: 1, type: !10, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0, retainedNodes: !2)
-!10 = !DISubroutineType(types: !11)
-!11 = !{null}
-!12 = !DILocation(line: 2, column: 1, scope: !9)
-!13 = distinct !DISubprogram(name: "b", linkageName: "_Z1bv", scope: !4, file: !4, line: 1, type: !10, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !3, retainedNodes: !2)
-!14 = !DILocation(line: 2, column: 1, scope: !13)
diff --git a/llvm/test/DebugInfo/X86/sret.ll b/llvm/test/DebugInfo/X86/sret.ll
index 222fb05b6df8b..be7c36111d39a 100644
--- a/llvm/test/DebugInfo/X86/sret.ll
+++ b/llvm/test/DebugInfo/X86/sret.ll
@@ -5,8 +5,8 @@
 
 ; Based on the debuginfo-tests/sret.cpp code.
 
-; CHECK-DWO: DW_AT_GNU_dwo_id (0x7db1cc8453a47c44)
-; CHECK-DWO: DW_AT_GNU_dwo_id (0x7db1cc8453a47c44)
+; CHECK-DWO: DW_AT_GNU_dwo_id (0x044dcdf3d75b11a1)
+; CHECK-DWO: DW_AT_GNU_dwo_id (0x044dcdf3d75b11a1)
 
 ; RUN: llc -O0 -fast-isel=true -mtriple=x86_64-apple-darwin -filetype=obj -o - %s | llvm-dwarfdump -debug-info - | FileCheck -check-prefixes=CHECK,FASTISEL %s
 ; RUN: llc -O0 -fast-isel=false -mtriple=x86_64-apple-darwin -filetype=obj -o - %s | llvm-dwarfdump -debug-info - | FileCheck -check-prefixes=CHECK,SDAG %s
diff --git a/llvm/test/MC/WebAssembly/dwarfdump.ll b/llvm/test/MC/WebAssembly/dwarfdump.ll
index b6a625b15fcd2..0f3bc4d86fae8 100644
--- a/llvm/test/MC/WebAssembly/dwarfdump.ll
+++ b/llvm/test/MC/WebAssembly/dwarfdump.ll
@@ -66,7 +66,7 @@
 ; SPLIT-NEXT:               DW_AT_language    (DW_LANG_C99)
 ; SPLIT-NEXT:               DW_AT_name        ("test.c")
 ; SPLIT-NEXT:               DW_AT_GNU_dwo_name        ("{{.*}}dwarfdump.ll.tmp.dwo")
-; SPLIT-NEXT:               DW_AT_GNU_dwo_id  (0xad3151f12153fa17)
+; SPLIT-NEXT:               DW_AT_GNU_dwo_id  ({{.*}})
 
 ; SPLIT:      0x00000019:   DW_TAG_variable
 ; SPLIT-NEXT:                 DW_AT_name      ("foo")

@@ -0,0 +1,33 @@
; RUN: rm -rf %t && mkdir -p %t
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on the llvm/test/DebugInfo/X86/split-dwarf-multiple-cu-hash.ll file (which git thinks I've deleted). I'm not sure what happened, but that test was not functional right now (CHECK-NOT was matching not because the DWO_ids were different, but because the output contained only one CU)

@dwblaikie
Copy link
Collaborator

Mixed feelings, but just feelings - nothing concretely wrong with the idea that I can think of.

The original change I hadn't remembered making, so thanks for doing the research/linking to the original commit - I don't think that probably triggers for ThinLTO anymore, as I guess after that change we hit other scaling problems (with just the overhead of having imported content in separate unit fragments - so many small units really added up) and also fundamental limitations regarding cross-unit referencing in Split DWARF (totally unsupported/no way to describe cross-unit references in dwo/dwp files (& not clear whether a dwo could have multiple compile units in it in any case - though since it can have some CUs and multiple TUs, multiple CUs probably falls out OK-ish... )) - and so we ended up disabling cross-cu references when using Split DWARF, and that meant the tiny imported units no longer existed and the case where there might be multiple CUs in ThinLTO no longer occurs.
(oh, this ^ would explain your comment "I'm not sure what happened, but that test was not functional right now (CHECK-NOT was matching not because the DWO_ids were different, but because the output contained only one CU)" - the original test could probably be restored, but the value would be questionable, by enabling split-dwarf-cross-cu-references)

But, yes, it seems quite possible to end up with a couple of units that optimize away to basically nothing...

Though at that point, I guess there's a fair question about why the ThinLink didn't discard these units entirely - do you have an isolated reproduction of that? Might be worth looking into... - might come back to "we should home/import debug info type definitions the same way we home/import inline functions" and in the absence of that /maybe/ just having some debug info is keeping these objects alive in the link graph? But I'd be surprised if that was the case...

@labath
Copy link
Collaborator Author

labath commented Jul 25, 2024

My reproducer is basically this:

==> a.cc <==
#include "enum.h"

__attribute__((visibility("hidden"))) void FFF(E e) {}

==> b.sh <==
#!/bin/bash

set -e -x

clang++ -flto=thin -O2 -c -o a.obc ../a.cc -g -gsplit-dwarf -DFFF=fa
clang++ -flto=thin -O2 -c -o b.obc ../a.cc -g -gsplit-dwarf -DFFF=fb
clang++ -flto=thin -O2 -c -o main.obc ../main.cc -g -gsplit-dwarf

clang++ -fuse-ld=lld -flto=thin -O2 -g -gsplit-dwarf {a,b,main}.obc -Wl,--thinlto-index-only,--thinlto-emit-imports-files

for i in a b main; do
  clang++ -flto=thin -O2 -c -fthinlto-index=$i.obc.thinlto.bc -o $i.l.bc $i.obc -g -gsplit-dwarf
  clang++ $i.l.bc -g -gsplit-dwarf -c -o $i.l.o
done

clang++ {a,b,main}.l.o -o a.out

==> enum.h <==
#ifndef LTO_ENUM_H_
#define LTO_ENUM_H_

enum E { ea, eb, ec };

#endif  // LTO_ENUM_H_

==> main.cc <==
#include "enum.h"

void fa(E e);
void fb(E e);

int main() {
//  fa(E());
//  fb(E());
}

The only thing that's left in the compile units is the enum definition. In the original bugreport, the compile unit contains a bunch of enums plus one pointer type definition (uint8_t (*)[4]) and the things referenced by it. I guess the enums are there because there's no way to home them. I'm not sure what caused the other type to appear.

labath added a commit to labath/llvm-project that referenced this pull request Jul 25, 2024
I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see llvm#100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's go with it :)

@labath labath merged commit 0e4c7fa into llvm:main Aug 5, 2024
7 checks passed
labath added a commit that referenced this pull request Aug 12, 2024
I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see #100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
labath added a commit to labath/llvm-project that referenced this pull request Aug 14, 2024
…lvm#100577)"

The only change vs. the first version of the patch is that I've made
DWARFUnit linking thread-safe/unit. This was necessary because, during the
indexing step, two skeleton units could attempt to associate themselves
with the split unit.

The original commit message was:

I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see llvm#100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
labath added a commit that referenced this pull request Aug 15, 2024
…100577)" (#104041)

The only change vs. the first version of the patch is that I've made
DWARFUnit linking thread-safe/unit. This was necessary because, during
the
indexing step, two skeleton units could attempt to associate themselves
with the split unit.

The original commit message was:

I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see #100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.
adrian-prantl pushed a commit to adrian-prantl/llvm-project that referenced this pull request Oct 11, 2024
…lvm#100577)" (llvm#104041)

The only change vs. the first version of the patch is that I've made
DWARFUnit linking thread-safe/unit. This was necessary because, during
the
indexing step, two skeleton units could attempt to associate themselves
with the split unit.

The original commit message was:

I ran into this when LTO completely emptied two compile units, so they
ended up with the same hash (see llvm#100375). Although, ideally, the
compiler would try to ensure we don't end up with a hash collision even
in this case, guaranteeing their absence is practically impossible. This
patch ensures this situation does not bring down lldb.

(cherry picked from commit 57a19ac)
@labath labath deleted the dwo-hash branch November 8, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants