-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] Fix a crash when using .dwp files and make type lookup reliable with the index cache #79544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lldb Author: Greg Clayton (clayborg) ChangesWhen using split DWARF with .dwp files we had an issue where sometimes the DWO file within the .dwp file would be parsed before the skeleton compile unit. The DWO file expects to be able to always be able to get a link back to the skeleton compile unit. Prior to this fix, the only time the skeleton compile unit backlink would get set, was if the unit headers for the main executable have been parsed and if the unit DIE was parsed in that DWARFUnit. This patch ensures that we can always get the skeleton compile unit for a DWO file by adding a function:
Prior to this fix DWARFUnit had some unsafe accessors that were used to store two different things:
This was used by SymbolFileDWARF to cache the
This will stop anyone from calling A crash could occur in
This is now fixed by calling the To implement the ability to get the skeleton compile units, I added code the DWARFDebugInfo.cpp/.h that make a map of DWO ID -> skeleton DWARFUnit * that gets filled in for DWARF5 when the unit headers are parsed. The There was also an issue that stopped type lookups from succeeding in When we have valid debug info indexes and when the lldb index cache was enabled, this would cause this issue to show up more often. I modified an existing test case to test that all of this works correctly and doesn't crash. Patch is 20.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79544.diff 9 Files Affected:
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
index 340b9acf80d023b..e59e328c6e7cbdb 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
@@ -81,27 +81,88 @@ void DWARFDebugInfo::ParseUnitsFor(DIERef::Section section) {
: m_context.getOrLoadDebugInfoData();
lldb::offset_t offset = 0;
while (data.ValidOffset(offset)) {
- llvm::Expected<DWARFUnitSP> unit_sp = DWARFUnit::extract(
+ llvm::Expected<DWARFUnitSP> expected_unit_sp = DWARFUnit::extract(
m_dwarf, m_units.size(), data, section, &offset);
- if (!unit_sp) {
+ if (!expected_unit_sp) {
// FIXME: Propagate this error up.
- llvm::consumeError(unit_sp.takeError());
+ llvm::consumeError(expected_unit_sp.takeError());
return;
}
+ DWARFUnitSP unit_sp = *expected_unit_sp;
+
// If it didn't return an error, then it should be returning a valid Unit.
- assert(*unit_sp);
- m_units.push_back(*unit_sp);
- offset = (*unit_sp)->GetNextUnitOffset();
+ assert((bool)unit_sp);
+
+ // Keep a map of DWO ID back to the skeleton units. Sometimes accelerator
+ // table lookups can cause the DWO files to be accessed before the skeleton
+ // compile unit is parsed, so we keep a map to allow us to match up the DWO
+ // file to the back to the skeleton compile units.
+ if (unit_sp->GetUnitType() == lldb_private::dwarf::DW_UT_skeleton) {
+ if (std::optional<uint64_t> unit_dwo_id = unit_sp->GetHeaderDWOId())
+ m_dwarf5_dwo_id_to_skeleton_unit[*unit_dwo_id] = unit_sp.get();
+ }
- if (auto *type_unit = llvm::dyn_cast<DWARFTypeUnit>(unit_sp->get())) {
+ m_units.push_back(unit_sp);
+ offset = unit_sp->GetNextUnitOffset();
+
+ if (auto *type_unit = llvm::dyn_cast<DWARFTypeUnit>(unit_sp.get())) {
m_type_hash_to_unit_index.emplace_back(type_unit->GetTypeHash(),
- unit_sp.get()->GetID());
+ unit_sp->GetID());
}
}
}
+DWARFUnit *DWARFDebugInfo::GetSkeletonUnit(DWARFUnit *dwo_unit) {
+ // If this isn't a DWO unit, don't try and find the skeleton unit.
+ if (!dwo_unit->IsDWOUnit())
+ return nullptr;
+
+ auto dwo_id = dwo_unit->GetDWOId();
+ if (!dwo_id.has_value())
+ return nullptr;
+
+ // Parse the unit headers so that m_dwarf5_dwo_id_to_skeleton_unit is filled
+ // in with all of the DWARF5 skeleton compile units DWO IDs since it is easy
+ // to access the DWO IDs in the DWARFUnitHeader for each DWARFUnit.
+ ParseUnitHeadersIfNeeded();
+
+ // Find the value in our cache and return it we we find it. This cache may
+ // only contain DWARF5 units.
+ auto iter = m_dwarf5_dwo_id_to_skeleton_unit.find(*dwo_id);
+ if (iter != m_dwarf5_dwo_id_to_skeleton_unit.end())
+ return iter->second;
+
+ // DWARF5 unit headers have the DWO ID and should have already been in the map
+ // so if it wasn't found in the above find() call, then we didn't find it and
+ // don't need to do the more expensive DWARF4 search.
+ if (dwo_unit->GetVersion() >= 5)
+ return nullptr;
+
+ // Parse all DWO IDs from all DWARF4 and earlier compile units that have DWO
+ // IDs. It is more expensive to get the DWO IDs from DWARF4 compile units as
+ // we need to parse the unit DIE and extract the DW_AT_dwo_id or
+ // DW_AT_GNU_dwo_id attribute values, so do this only if we didn't find our
+ // match above search and only for DWARF4 and earlier compile units.
+ llvm::call_once(m_dwarf4_dwo_id_to_skeleton_unit_once_flag, [this]() {
+ for (uint32_t i = 0, num = GetNumUnits(); i < num; ++i) {
+ if (DWARFUnit *unit = GetUnitAtIndex(i)) {
+ if (unit->GetVersion() < 5) {
+ if (std::optional<uint64_t> unit_dwo_id = unit->GetDWOId())
+ m_dwarf4_dwo_id_to_skeleton_unit[*unit_dwo_id] = unit;
+ }
+ }
+ }
+ });
+
+ // Search the DWARF4 DWO results that we parsed lazily.
+ iter = m_dwarf4_dwo_id_to_skeleton_unit.find(*dwo_id);
+ if (iter != m_dwarf4_dwo_id_to_skeleton_unit.end())
+ return iter->second;
+ return nullptr;
+}
+
void DWARFDebugInfo::ParseUnitHeadersIfNeeded() {
llvm::call_once(m_units_once_flag, [&] {
ParseUnitsFor(DIERef::Section::DebugInfo);
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
index a8b5abc3beed2d0..c1f0cb0203fb760 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
@@ -39,6 +39,7 @@ class DWARFDebugInfo {
DWARFUnit *GetUnitContainingDIEOffset(DIERef::Section section,
dw_offset_t die_offset);
DWARFUnit *GetUnit(const DIERef &die_ref);
+ DWARFUnit *GetSkeletonUnit(DWARFUnit *dwo_unit);
DWARFTypeUnit *GetTypeUnitForHash(uint64_t hash);
bool ContainsTypeUnits();
DWARFDIE GetDIE(const DIERef &die_ref);
@@ -70,6 +71,9 @@ class DWARFDebugInfo {
m_cu_aranges_up; // A quick address to compile unit table
std::vector<std::pair<uint64_t, uint32_t>> m_type_hash_to_unit_index;
+ llvm::DenseMap<uint64_t, DWARFUnit *> m_dwarf5_dwo_id_to_skeleton_unit;
+ llvm::DenseMap<uint64_t, DWARFUnit *> m_dwarf4_dwo_id_to_skeleton_unit;
+ llvm::once_flag m_dwarf4_dwo_id_to_skeleton_unit_once_flag;
private:
// All parsing needs to be done partially any managed by this class as
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
index 7a40361cdede7a7..23e0b8a7f2c06bd 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
@@ -97,7 +97,12 @@ void DWARFUnit::ExtractUnitDIEIfNeeded() {
*m_dwo_id, m_first_die.GetOffset()));
return; // Can't fetch the compile unit from the dwo file.
}
- dwo_cu->SetUserData(this);
+ // If the skeleton compile unit gets its unit DIE parsed first, then this
+ // will fill in the DWO file's back pointer to this skeleton compile unit.
+ // If the DWO files get parsed on their own first the skeleton back link
+ // can be done manually in DWARFUnit::GetSkeletonCompileUnit() which will
+ // do a reverse lookup and cache the result.
+ dwo_cu->SetSkeletonUnit(this);
DWARFBaseDIE dwo_cu_die = dwo_cu->GetUnitDIEOnly();
if (!dwo_cu_die.IsValid()) {
@@ -702,9 +707,25 @@ uint8_t DWARFUnit::GetAddressByteSize(const DWARFUnit *cu) {
uint8_t DWARFUnit::GetDefaultAddressSize() { return 4; }
-void *DWARFUnit::GetUserData() const { return m_user_data; }
+DWARFCompileUnit *DWARFUnit::GetSkeletonUnit() {
+ if (m_skeleton_unit == nullptr && IsDWOUnit()) {
+ SymbolFileDWARFDwo *dwo =
+ llvm::dyn_cast_or_null<SymbolFileDWARFDwo>(&GetSymbolFileDWARF());
+ // Do a reverse lookup if the skeleton compile unit wasn't set.
+ if (dwo)
+ m_skeleton_unit = dwo->GetBaseSymbolFile().GetSkeletonUnit(this);
+ }
+ return llvm::dyn_cast_or_null<DWARFCompileUnit>(m_skeleton_unit);
+}
-void DWARFUnit::SetUserData(void *d) { m_user_data = d; }
+void DWARFUnit::SetSkeletonUnit(DWARFUnit *skeleton_unit) {
+ // If someone is re-setting the skeleton compile unit backlink, make sure
+ // it is setting it to a valid value when it wasn't valid, or if the
+ // value in m_skeleton_unit was valid, it should be the same value.
+ assert(skeleton_unit);
+ assert(m_skeleton_unit == nullptr || m_skeleton_unit == skeleton_unit);
+ m_skeleton_unit = skeleton_unit;
+}
bool DWARFUnit::Supports_DW_AT_APPLE_objc_complete_type() {
return GetProducer() != eProducerLLVMGCC;
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
index bc225a52e1d0309..9f6d127056fa562 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
@@ -98,8 +98,14 @@ class DWARFUnit : public UserID {
virtual ~DWARFUnit();
bool IsDWOUnit() { return m_is_dwo; }
+ /// Get the DWO ID from the DWARFUnitHeader for DWARF5, or from the unit DIE's
+ /// DW_AT_dwo_id or DW_AT_GNU_dwo_id for DWARF4 and earlier.
std::optional<uint64_t> GetDWOId();
-
+ /// Get the DWO ID from the DWARFUnitHeader only. DWARF5 skeleton units have
+ /// the DWO ID in the compile unit header and we sometimes only want to access
+ /// this cheap value without causing the more expensive attribute fetches that
+ /// GetDWOId() uses.
+ std::optional<uint64_t> GetHeaderDWOId() { return m_header.GetDWOId(); }
void ExtractUnitDIEIfNeeded();
void ExtractUnitDIENoDwoIfNeeded();
void ExtractDIEsIfNeeded();
@@ -198,9 +204,21 @@ class DWARFUnit : public UserID {
static uint8_t GetDefaultAddressSize();
- void *GetUserData() const;
+ lldb_private::CompileUnit *GetLLDBCompUnit() const { return m_lldb_cu; }
+
+ void SetLLDBCompUnit(lldb_private::CompileUnit *cu) { m_lldb_cu = cu; }
+
+ /// Get the skeleton compile unit for a DWO file.
+ ///
+ /// We need to keep track of the skeleton compile unit for a DWO file so
+ /// we can access it. Sometimes this value is cached when the skeleton
+ /// compile unit is first parsed, but if a .dwp file parses all of the
+ /// DWARFUnits in the file, the skeleton compile unit might not have been
+ /// parsed yet, to there might not be a backlink. This accessor handles
+ /// both cases correctly and avoids crashes.
+ DWARFCompileUnit *GetSkeletonUnit();
- void SetUserData(void *d);
+ void SetSkeletonUnit(DWARFUnit *skeleton_unit);
bool Supports_DW_AT_APPLE_objc_complete_type();
@@ -336,7 +354,9 @@ class DWARFUnit : public UserID {
std::shared_ptr<DWARFUnit> m_dwo;
DWARFUnitHeader m_header;
const llvm::DWARFAbbreviationDeclarationSet *m_abbrevs = nullptr;
- void *m_user_data = nullptr;
+ lldb_private::CompileUnit *m_lldb_cu = nullptr;
+ // If this is a DWO file, we have a backlink to our skeleton compile unit.
+ DWARFUnit *m_skeleton_unit = nullptr;
// The compile unit debug information entry item
DWARFDebugInfoEntry::collection m_die_array;
mutable llvm::sys::RWMutex m_die_array_mutex;
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index fed97858c83f8e2..431357da0eb5c0f 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -722,8 +722,8 @@ DWARFCompileUnit *SymbolFileDWARF::GetDWARFCompileUnit(CompileUnit *comp_unit) {
// The compile unit ID is the index of the DWARF unit.
DWARFUnit *dwarf_cu = DebugInfo().GetUnitAtIndex(comp_unit->GetID());
- if (dwarf_cu && dwarf_cu->GetUserData() == nullptr)
- dwarf_cu->SetUserData(comp_unit);
+ if (dwarf_cu && dwarf_cu->GetLLDBCompUnit() == nullptr)
+ dwarf_cu->SetLLDBCompUnit(comp_unit);
// It must be DWARFCompileUnit when it created a CompileUnit.
return llvm::cast_or_null<DWARFCompileUnit>(dwarf_cu);
@@ -771,7 +771,7 @@ static const char *GetDWOName(DWARFCompileUnit &dwarf_cu,
lldb::CompUnitSP SymbolFileDWARF::ParseCompileUnit(DWARFCompileUnit &dwarf_cu) {
CompUnitSP cu_sp;
- CompileUnit *comp_unit = (CompileUnit *)dwarf_cu.GetUserData();
+ CompileUnit *comp_unit = dwarf_cu.GetLLDBCompUnit();
if (comp_unit) {
// We already parsed this compile unit, had out a shared pointer to it
cu_sp = comp_unit->shared_from_this();
@@ -779,7 +779,7 @@ lldb::CompUnitSP SymbolFileDWARF::ParseCompileUnit(DWARFCompileUnit &dwarf_cu) {
if (GetDebugMapSymfile()) {
// Let the debug map create the compile unit
cu_sp = m_debug_map_symfile->GetCompileUnit(this, dwarf_cu);
- dwarf_cu.SetUserData(cu_sp.get());
+ dwarf_cu.SetLLDBCompUnit(cu_sp.get());
} else {
ModuleSP module_sp(m_objfile_sp->GetModule());
if (module_sp) {
@@ -792,7 +792,7 @@ lldb::CompUnitSP SymbolFileDWARF::ParseCompileUnit(DWARFCompileUnit &dwarf_cu) {
*GetDWARFUnitIndex(dwarf_cu.GetID()), cu_language,
eLazyBoolCalculate, std::move(support_files));
- dwarf_cu.SetUserData(cu_sp.get());
+ dwarf_cu.SetLLDBCompUnit(cu_sp.get());
SetCompileUnitAtIndex(dwarf_cu.GetID(), cu_sp);
};
@@ -1675,20 +1675,20 @@ Type *SymbolFileDWARF::ResolveType(const DWARFDIE &die,
CompileUnit *
SymbolFileDWARF::GetCompUnitForDWARFCompUnit(DWARFCompileUnit &dwarf_cu) {
+
if (dwarf_cu.IsDWOUnit()) {
- DWARFCompileUnit *non_dwo_cu =
- static_cast<DWARFCompileUnit *>(dwarf_cu.GetUserData());
+ DWARFCompileUnit *non_dwo_cu = dwarf_cu.GetSkeletonUnit();
assert(non_dwo_cu);
return non_dwo_cu->GetSymbolFileDWARF().GetCompUnitForDWARFCompUnit(
*non_dwo_cu);
}
// Check if the symbol vendor already knows about this compile unit?
- if (dwarf_cu.GetUserData() == nullptr) {
- // The symbol vendor doesn't know about this compile unit, we need to parse
- // and add it to the symbol vendor object.
- return ParseCompileUnit(dwarf_cu).get();
- }
- return static_cast<CompileUnit *>(dwarf_cu.GetUserData());
+ CompileUnit *lldb_cu = dwarf_cu.GetLLDBCompUnit();
+ if (lldb_cu)
+ return lldb_cu;
+ // The symbol vendor doesn't know about this compile unit, we need to parse
+ // and add it to the symbol vendor object.
+ return ParseCompileUnit(dwarf_cu).get();
}
void SymbolFileDWARF::GetObjCMethods(
@@ -1750,7 +1750,7 @@ SymbolFileDWARF::GetDIE(const DIERef &die_ref) {
}
if (*file_index == DIERef::k_file_index_mask)
- symbol_file = m_dwp_symfile.get(); // DWP case
+ symbol_file = GetDwpSymbolFile().get(); // DWP case
else
symbol_file = this->DebugInfo()
.GetUnitAtIndex(*die_ref.file_index())
@@ -1785,6 +1785,10 @@ std::optional<uint64_t> SymbolFileDWARF::GetDWOId() {
return {};
}
+DWARFUnit *SymbolFileDWARF::GetSkeletonUnit(DWARFUnit *dwo_unit) {
+ return DebugInfo().GetSkeletonUnit(dwo_unit);
+}
+
std::shared_ptr<SymbolFileDWARFDwo>
SymbolFileDWARF::GetDwoSymbolFileForCompileUnit(
DWARFUnit &unit, const DWARFDebugInfoEntry &cu_die) {
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 26a9502f90aa006..ed4b96a514e6c0b 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
@@ -250,6 +250,17 @@ class SymbolFileDWARF : public SymbolFileCommon {
/// If this is a DWARF object with a single CU, return its DW_AT_dwo_id.
std::optional<uint64_t> GetDWOId();
+ /// Given a DWO DWARFUnit, find the corresponding skeleton DWARFUnit
+ /// in the main symbol file. DWP files can have their DWARFUnits
+ /// parsed without the skeleton compile units having been parsed, so
+ /// sometimes we need to find the skeleton compile unit for a DWO
+ /// DWARFUnit so we can fill in this link. Currently unless the
+ /// skeleton compile unit has been parsed _and_ the Unit DIE has been
+ /// parsed, the DWO unit will not have a backward link setup correctly
+ /// which was causing crashes due to an assertion that was firing
+ /// in SymbolFileDWARF::GetCompUnitForDWARFCompUnit().
+ DWARFUnit *GetSkeletonUnit(DWARFUnit *dwo_unit);
+
static bool DIEInDeclContext(const CompilerDeclContext &parent_decl_ctx,
const DWARFDIE &die,
bool only_root_namespaces = false);
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
index ca698a84a9146da..a3dc80eaa056ac3 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp
@@ -149,3 +149,22 @@ void SymbolFileDWARFDwo::FindGlobalVariables(
GetBaseSymbolFile().FindGlobalVariables(name, parent_decl_ctx, max_matches,
variables);
}
+
+bool SymbolFileDWARFDwo::GetDebugInfoIndexWasLoadedFromCache() const {
+ return GetBaseSymbolFile().GetDebugInfoIndexWasLoadedFromCache();
+}
+void SymbolFileDWARFDwo::SetDebugInfoIndexWasLoadedFromCache() {
+ GetBaseSymbolFile().SetDebugInfoIndexWasLoadedFromCache();
+}
+bool SymbolFileDWARFDwo::GetDebugInfoIndexWasSavedToCache() const {
+ return GetBaseSymbolFile().GetDebugInfoIndexWasSavedToCache();
+}
+void SymbolFileDWARFDwo::SetDebugInfoIndexWasSavedToCache() {
+ GetBaseSymbolFile().SetDebugInfoIndexWasSavedToCache();
+}
+bool SymbolFileDWARFDwo::GetDebugInfoHadFrameVariableErrors() const {
+ return GetBaseSymbolFile().GetDebugInfoHadFrameVariableErrors();
+}
+void SymbolFileDWARFDwo::SetDebugInfoHadFrameVariableErrors() {
+ return GetBaseSymbolFile().SetDebugInfoHadFrameVariableErrors();
+}
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
index 9f5950e51b0c180..d35680119e14f18 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h
@@ -56,6 +56,15 @@ class SymbolFileDWARFDwo : public SymbolFileDWARF {
uint32_t max_matches,
VariableList &variables) override;
+ SymbolFileDWARF &GetBaseSymbolFile() const { return m_base_symbol_file; }
+
+ bool GetDebugInfoIndexWasLoadedFromCache() const override;
+ void SetDebugInfoIndexWasLoadedFromCache() override;
+ bool GetDebugInfoIndexWasSavedToCache() const override;
+ void SetDebugInfoIndexWasSavedToCache() override;
+ bool GetDebugInfoHadFrameVariableErrors() const override;
+ void SetDebugInfoHadFrameVariableErrors() override;
+
protected:
DIEToTypePtr &GetDIEToType() override;
@@ -75,8 +84,6 @@ class SymbolFileDWARFDwo : public SymbolFileDWARF {
ConstString type_name,
bool must_be_implementation) override;
- SymbolFileDWARF &GetBaseSymbolFile() const { return m_base_symbol_file; }
-
/// If this file contains exactly one compile unit, this function will return
/// it. Otherwise it returns nullptr.
DWARFCompileUnit *FindSingleCompileUnit();
diff --git a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
index cda2992604511fb..78476ae8d99cd3d 100644
--- a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp
@@ -8,8 +8,46 @@
// RUN: llvm-objcopy --strip-all --add-gnu-debuglink=%t.debug %t
// RUN: %lldb %t -o "target variable a" -b | FileCheck %s
+// Run one time with the index cache enabled to populate the index cache. When
+// we populate the index cache we have to parse all of the DWARF debug info
+// and it is always available.
+// RUN: rm -rf %T/lldb-index-cache
+// RUN: %lldb \
+// RUN: -O 'settings set symbols.enable-lldb-index-cache true' \
+// RUN: -O 'settings set symbols.lldb-index-cache-path %T/lldb-index-cache' \
+// RUN: -O 'settings set target.preload-symbols false' \
+// RUN: -o "script lldb.target.modules[0].FindTypes('::A').GetTypeAtIndex(0)" \
+// RUN: -o "statistics dump" \
+// RUN: %t -b | FileCheck %s -check-prefix=CACHE
+
+// Run again after index cache was enabled, which load the index cache. When we
+// load the index cache from disk, we don't have any DWARF parsed yet and this
+// can cause us to try and access information in the .dwp directly without
+// parsing the .debug_info, but this caused crashes when the DWO files didn't
+// have a backlink to the skeleton compile unit. This test verifies that we
+// don't crash and that we can find types when using .dwp files.
+// RUN: %lldb \
+// RUN: -O 'settings set symbols.enable-lldb-index-cache true' \
+// RUN: -O 'settings set symbols.lldb-index-cache-path %T/lldb-index-cache' \
+// RUN: -O 'settings set target.preload-symbols false' \
+// RUN: -o "script lldb.target.modules[0].FindTypes('::A').GetTypeAtIndex(0)" \
+// RUN: -o "statistics dump" \
+// RUN: %t -b | FileCheck %s -check-prefix=CACHED
+
// CHECK: (A) a = (x = 47)
+// CACHE: script lldb.target.modules[0].FindTypes('::A').GetTy...
[truncated]
|
You can test this locally with the following command:git-clang-format --diff 5d228eaf0f5d9c873ba12fc439609148f3f88733 025eca94790a100dab1f17297c39590bf9658517 -- lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.cpp lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARFDwo.h lldb/test/Shell/SymbolFile/DWARF/x86/dwp-separate-debug-file.cpp View the diff from clang-format here.diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
index 6bcf95c11a..7bd58f5ae5 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
@@ -89,7 +89,8 @@ void DWARFDebugInfo::ParseUnitsFor(DIERef::Section section) {
if (!expected_unit_sp) {
Log *log = GetLog(DWARFLog::DebugInfo);
LLDB_LOG(log, "Unable to extract DWARFUnitHeader at {0:x}: {1}",
- unit_header_offset, llvm::toString(expected_unit_sp.takeError()));
+ unit_header_offset,
+ llvm::toString(expected_unit_sp.takeError()));
return;
}
|
I'm not following all of this, but it appears to be based on the premise that it's OK that sometimes split units inside a DWP file are parsed before their skeleton unit? Why is that OK/when/why/where is that happening? I'd think any case where that happens would be a performance (& possibly correctness bug) & it'd be better to maintain the invariant that the only way you ever parse a split unit is starting from the skeleton - rather than adding maps/etc to be able to backtrack to parsing the skeleton when you already have the split unit. |
When we have accelerator tables from lldb index cache being loaded from disk, we have a DIERef object for each matching name. This DIERef objects has a magic file index that specifies that this is from the .dwp file. When we do a type lookup and get these kinds of entries, we go straight to the
That might mean that anytime we use any DWARFDie, we will need to add functionality that checks with the DWARFUnit to verify if it is a .dwo unit, and if so, make sure its skeleton unit has been parsed. The current solution allows anyone to just work with the existing APIs and they will do the right thing and will only grab the backlink if it isn't already set. With DWARF5 it is cheap as the info is in the unit headers. Without this solution we will have to always parse the skeleton unit headers for the main executable and need to parse at least the first DIE in order to fill in the .dwo file's skeleton backlink pointer. So this can cause us to pull in more DWARF during type lookups. If you are really against this approach I can search for a solution that avoids the maps, but I fear more hidden bugs will arise in the future without an approach similar to what this PR does. The current bugs either crashed the debug session or stopped type lookups from working. |
Even if we want to have the skeleton compile unit be parsed first, we would need to know this from any accelerator table entry, wether it be from .debug_names or from our internal manual index. Our internal manual index creates a DIERef object with a magic .dwp file index + a DIE offset within the .debug_info.dwo in the dwp file. So given only this information, if we wanted to parse the skeleton unit first, we would need to ask the main executable to find the skeleton unit for a given DWO ID first, which would require a search and a map to efficiently grab the right skeleton unit. So either way we will need the map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level makes sense.
Can you add test to ensure both dwarf4 & dwarf5 work? Also the modified test covers dwp, have you tested dwo without dwp scenario for completeness?
} | ||
} | ||
} | ||
|
||
DWARFUnit *DWARFDebugInfo::GetSkeletonUnit(DWARFUnit *dwo_unit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe return a llvm::optional<DWARFUnit>
instead of raw pointer to force caller to handle null case explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A NULL pointer already means it isn't availble. Not sure what making it optional would benefit?
if (dwarf_cu.IsDWOUnit()) { | ||
DWARFCompileUnit *non_dwo_cu = | ||
static_cast<DWARFCompileUnit *>(dwarf_cu.GetUserData()); | ||
DWARFCompileUnit *non_dwo_cu = dwarf_cu.GetSkeletonUnit(); | ||
assert(non_dwo_cu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explain why this assertion can hold - why non_dwo_cu
can't be null.
True enough, but I think letting this become a lazy property post-construction is a bit more problematic as it might lead to more complicated invariants/situations further downstream. By making it a precondition that any split unit, on construction, has the association with the skeleton unit - there would be fewer "interesting" situations later where the lookup occurs when someone isn't expecting it. The split unit isn't useful without the skeleton for addresses, etc, so it's not like delaying the skeleton lookup provides better performance, etc.
(side note: it'd be great if the internal manual index was closer to .debug_names (especially the serialized version - it'd be good if that /was/ .debug_names) - to ensure consistency (& at least last I checked the cached manual index - the debugger startup time was impacted by loading that index off-disk, when the .debug_names/.apple_names indexes don't need to be loaded from disk) & fewer support paths/less code to maintain) Presumably the entry would still need to mention which CU in the .dwp file it comes from - and presumably the non-split entries in this index would also mention which CU they come from (by offset in the .debug_info section)? (because reading a DIE without knowing wihch CU it's in isn't helpful - you need CU properties, etc) (but I'm not an lldb maintainer, so can take all this with a few grains of salt) |
It actually does provide better performance as we will often do the type lookup solely in the .dwp file and determine we don't need to actually parse and return the type because the context doesn't match. There is no need for the skeleton unit and the address tables and lines tables in the type lookup case. It allows us to traverse the DIE tree in the .dwp file without ever needing to possibly do an expensive lookup by DWO ID for DWARF4, when it actually isn't needed. We have an accessor to allow us to get the skeleton unit if and only if we need it, and yes, most times it will be filled in already. But when it isn't we can easily access it.
It would be really nice to use .debug_names and I welcome a patch that implements this. I don't have time for this at the moment, but it would be great if anyone else has some time to make this happen.
It is a great idea, and it was my original thought when I was saving the index to disk. The APIs for generating the .debug_names were hard to adapt over into the LLDB side of things, and I didn't want to have the LLVM DWARF parser have to parse the DWARF in order to get that to work, so I opted to not implement it that way to make sure we didn't double parse the DWARF. At some point if we look at the performance of the LLVM DWARF parser (.debug_info parser) and it performs as well or better than LLDB's, we can switch over to it. I am also worried about asserts and the coding style differences where invariants cause crashes in the LLVM parser vs LLDB's parser being more accepting of bad input. We do use the LLVM .debug_line parser already and ran into such a crasher last week when a BOLT generated binary had an invalid DW_AT_stmt_list offset and it is crashing LLDB because of this kind of an assert followed by a crash (when asserts aren't enabled it crashes). |
I can add a dwarf4 test, yes. |
Huh, fair enough - bit surprising, but it works. Still worried about doing things a bit too lazily, more complicated states to deal with through the debugger, but fair enough. Thanks for walking me through it! |
|
||
if (!unit_sp) { | ||
if (!expected_unit_sp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could at least LLDB_LOG the error into the dwarf channel?
// Run one time with the index cache enabled to populate the index cache. When | ||
// we populate the index cache we have to parse all of the DWARF debug info | ||
// and it is always available. | ||
// RUN: rm -rf %T/lldb-index-cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%T is the parent dir of %t. That sounds dangerous because multiple parallel tests could write an lldb-index-cache there and create nondeterministic result. Could you move the index cache into a test-specific dir like mkdir -p %t
?
I think I have taken care of everyone's requests, let me know if there is more to do! |
…the index cache. When using split DWARF with .dwp files we had an issue where sometimes the DWO file within the .dwp file would be parsed _before_ the skeleton compile unit. The DWO file expects to be able to always be able to get a link back to the skeleton compile unit. Prior to this fix, the only time the skeleton compile unit backlink would get set, was if the unit headers for the main executable have been parsed _and_ if the unit DIE was parsed in that DWARFUnit. This patch ensures that we can always get the skeleton compile unit for a DWO file by adding a function: ``` DWARFCompileUnit *DWARFUnit::GetSkeletonUnit(); ``` Prior to this fix DWARFUnit had some unsafe accessors that were used to store two different things: ``` void *DWARFUnit::GetUserData() const; void DWARFUnit::SetUserData(void *d); ``` This was used by SymbolFileDWARF to cache the `lldb_private::CompileUnit *` for a SymbolFileDWARF and was also used to store the `DWARFUnit *` for SymbolFileDWARFDwo. This patch clears up this unsafe usage by adding two separate accessors and ivars for this: ``` lldb_private::CompileUnit *DWARFUnit::GetLLDBCompUnit() const { return m_lldb_cu; } void DWARFUnit::SetLLDBCompUnit(lldb_private::CompileUnit *cu) { m_lldb_cu = cu; } DWARFCompileUnit *DWARFUnit::GetSkeletonUnit(); void DWARFUnit::SetSkeletonUnit(DWARFUnit *skeleton_unit); ``` This will stop anyone from calling `void *DWARFUnit::GetUserData() const;` and casting the value to an incorrect value. A crash could occur in `SymbolFileDWARF::GetCompUnitForDWARFCompUnit()` when the `non_dwo_cu`, which is a backlink to the skeleton compile unit, was not set and was NULL. There is an assert() in the code, and then the code just will kill the program if the assert isn't enabled because the code looked like: ``` if (dwarf_cu.IsDWOUnit()) { DWARFCompileUnit *non_dwo_cu = static_cast<DWARFCompileUnit *>(dwarf_cu.GetUserData()); assert(non_dwo_cu); return non_dwo_cu->GetSymbolFileDWARF().GetCompUnitForDWARFCompUnit( *non_dwo_cu); } ``` This is now fixed by calling the `DWARFUnit::GetSkeletonUnit()` which will correctly always get the skeleton compile uint for a DWO file regardless of if the skeleton unit headers have been parse or if the skeleton unit DIE wasn't parsed yet. To implement the ability to get the skeleton compile units, I added code the DWARFDebugInfo.cpp/.h that make a map of DWO ID -> skeleton DWARFUnit * that gets filled in for DWARF5 when the unit headers are parsed. The `DWARFUnit::GetSkeletonUnit()` will end up parsing the unit headers of the main executable to fill in this map if it already hasn't been done. For DWARF4 and earlier we maintain a separate map that gets filled in only for any DWARF4 compile units that have a DW_AT_dwo_id or DW_AT_gnu_dwo_id attributes. This is more expensive, so this is done lazily and in a thread safe manor. This allows us to be as efficient as possible when using DWARF5 and also be backward compatible with DWARF4 + split DWARF. There was also an issue that stopped type lookups from succeeding in `DWARFDIE SymbolFileDWARF::GetDIE(const DIERef &die_ref)` where it directly was accessing the `m_dwp_symfile` ivar without calling the accessor function that could end up needing to locate and load the .dwp file. This was fixed by calling the `SymbolFileDWARF::GetDwpSymbolFile()` accessor to ensure we always get a valid value back if we can find the .dwp file. Prior to this fix it was down which APIs were called and if any APIs were called that loaded the .dwp file, it worked fine, but it might not if no APIs were called that did cause it to get loaded. When we have valid debug info indexes and when the lldb index cache was enabled, this would cause this issue to show up more often. I modified an existing test case to test that all of this works correctly and doesn't crash.
Changes: - make the llvm-index-cache directory based off of %t to ensure uniqueness - log to DWARF debug info channel when we fail to parse DWARFUnitHeaders
7f7296f
to
025eca9
Compare
This broke the buildbots. Fixed with 4eac146
|
…e with the index cache (llvm#79544) When using split DWARF with .dwp files we had an issue where sometimes the DWO file within the .dwp file would be parsed _before_ the skeleton compile unit. The DWO file expects to be able to always be able to get a link back to the skeleton compile unit. Prior to this fix, the only time the skeleton compile unit backlink would get set, was if the unit headers for the main executable have been parsed _and_ if the unit DIE was parsed in that DWARFUnit. This patch ensures that we can always get the skeleton compile unit for a DWO file by adding a function: ``` DWARFCompileUnit *DWARFUnit::GetSkeletonUnit(); ``` Prior to this fix DWARFUnit had some unsafe accessors that were used to store two different things: ``` void *DWARFUnit::GetUserData() const; void DWARFUnit::SetUserData(void *d); ``` This was used by SymbolFileDWARF to cache the `lldb_private::CompileUnit *` for a SymbolFileDWARF and was also used to store the `DWARFUnit *` for SymbolFileDWARFDwo. This patch clears up this unsafe usage by adding two separate accessors and ivars for this: ``` lldb_private::CompileUnit *DWARFUnit::GetLLDBCompUnit() const { return m_lldb_cu; } void DWARFUnit::SetLLDBCompUnit(lldb_private::CompileUnit *cu) { m_lldb_cu = cu; } DWARFCompileUnit *DWARFUnit::GetSkeletonUnit(); void DWARFUnit::SetSkeletonUnit(DWARFUnit *skeleton_unit); ``` This will stop anyone from calling `void *DWARFUnit::GetUserData() const;` and casting the value to an incorrect value. A crash could occur in `SymbolFileDWARF::GetCompUnitForDWARFCompUnit()` when the `non_dwo_cu`, which is a backlink to the skeleton compile unit, was not set and was NULL. There is an assert() in the code, and then the code just will kill the program if the assert isn't enabled because the code looked like: ``` if (dwarf_cu.IsDWOUnit()) { DWARFCompileUnit *non_dwo_cu = static_cast<DWARFCompileUnit *>(dwarf_cu.GetUserData()); assert(non_dwo_cu); return non_dwo_cu->GetSymbolFileDWARF().GetCompUnitForDWARFCompUnit( *non_dwo_cu); } ``` This is now fixed by calling the `DWARFUnit::GetSkeletonUnit()` which will correctly always get the skeleton compile uint for a DWO file regardless of if the skeleton unit headers have been parse or if the skeleton unit DIE wasn't parsed yet. To implement the ability to get the skeleton compile units, I added code the DWARFDebugInfo.cpp/.h that make a map of DWO ID -> skeleton DWARFUnit * that gets filled in for DWARF5 when the unit headers are parsed. The `DWARFUnit::GetSkeletonUnit()` will end up parsing the unit headers of the main executable to fill in this map if it already hasn't been done. For DWARF4 and earlier we maintain a separate map that gets filled in only for any DWARF4 compile units that have a DW_AT_dwo_id or DW_AT_gnu_dwo_id attributes. This is more expensive, so this is done lazily and in a thread safe manor. This allows us to be as efficient as possible when using DWARF5 and also be backward compatible with DWARF4 + split DWARF. There was also an issue that stopped type lookups from succeeding in `DWARFDIE SymbolFileDWARF::GetDIE(const DIERef &die_ref)` where it directly was accessing the `m_dwp_symfile` ivar without calling the accessor function that could end up needing to locate and load the .dwp file. This was fixed by calling the `SymbolFileDWARF::GetDwpSymbolFile()` accessor to ensure we always get a valid value back if we can find the .dwp file. Prior to this fix it was down which APIs were called and if any APIs were called that loaded the .dwp file, it worked fine, but it might not if no APIs were called that did cause it to get loaded. When we have valid debug info indexes and when the lldb index cache was enabled, this would cause this issue to show up more often. I modified an existing test case to test that all of this works correctly and doesn't crash.
When using split DWARF with .dwp files we had an issue where sometimes the DWO file within the .dwp file would be parsed before the skeleton compile unit. The DWO file expects to be able to always be able to get a link back to the skeleton compile unit. Prior to this fix, the only time the skeleton compile unit backlink would get set, was if the unit headers for the main executable have been parsed and if the unit DIE was parsed in that DWARFUnit. This patch ensures that we can always get the skeleton compile unit for a DWO file by adding a function:
Prior to this fix DWARFUnit had some unsafe accessors that were used to store two different things:
This was used by SymbolFileDWARF to cache the
lldb_private::CompileUnit *
for a SymbolFileDWARF and was also used to store theDWARFUnit *
for SymbolFileDWARFDwo. This patch clears up this unsafe usage by adding two separate accessors and ivars for this:This will stop anyone from calling
void *DWARFUnit::GetUserData() const;
and casting the value to an incorrect value.A crash could occur in
SymbolFileDWARF::GetCompUnitForDWARFCompUnit()
when thenon_dwo_cu
, which is a backlink to the skeleton compile unit, was not set and was NULL. There is an assert() in the code, and then the code just will kill the program if the assert isn't enabled because the code looked like:This is now fixed by calling the
DWARFUnit::GetSkeletonUnit()
which will correctly always get the skeleton compile uint for a DWO file regardless of if the skeleton unit headers have been parse or if the skeleton unit DIE wasn't parsed yet.To implement the ability to get the skeleton compile units, I added code the DWARFDebugInfo.cpp/.h that make a map of DWO ID -> skeleton DWARFUnit * that gets filled in for DWARF5 when the unit headers are parsed. The
DWARFUnit::GetSkeletonUnit()
will end up parsing the unit headers of the main executable to fill in this map if it already hasn't been done. For DWARF4 and earlier we maintain a separate map that gets filled in only for any DWARF4 compile units that have a DW_AT_dwo_id or DW_AT_gnu_dwo_id attributes. This is more expensive, so this is done lazily and in a thread safe manor. This allows us to be as efficient as possible when using DWARF5 and also be backward compatible with DWARF4 + split DWARF.There was also an issue that stopped type lookups from succeeding in
DWARFDIE SymbolFileDWARF::GetDIE(const DIERef &die_ref)
where it directly was accessing them_dwp_symfile
ivar without calling the accessor function that could end up needing to locate and load the .dwp file. This was fixed by calling theSymbolFileDWARF::GetDwpSymbolFile()
accessor to ensure we always get a valid value back if we can find the .dwp file. Prior to this fix it was down which APIs were called and if any APIs were called that loaded the .dwp file, it worked fine, but it might not if no APIs were called that did cause it to get loaded.When we have valid debug info indexes and when the lldb index cache was enabled, this would cause this issue to show up more often.
I modified an existing test case to test that all of this works correctly and doesn't crash.