Commit 07c215e

Fix shared library loading when users define duplicate _r_debug structure.
We ran into a case where shared libraries would fail to load in some processes on Linux. The issue turned out to be that if the main executable or a shared library defined a symbol named "_r_debug", it would cause problems once the executable that contained it was loaded into the process. The "_r_debug" structure is currently found by looking through the .dynamic section in the main executable and finding the DT_DEBUG entry, which points to this structure. The dynamic loader updates this structure as shared libraries are loaded, and LLDB reads the contents of this structure each time the dyld breakpoint is hit. Currently we expect the "state" in this structure to change as things happen.

An issue comes up if someone defines another "_r_debug" struct in their program:

```
r_debug _r_debug;
```

If this code is included, a new "_r_debug" structure is created and it causes problems once the executable is loaded. This is because of the way symbol lookups happen on Linux: they use the shared library list in the order it was created, and the dynamic loader is always last. So at some point the dynamic loader will start updating this other copy of "_r_debug", yet LLDB is only watching the copy inside of the dynamic loader.

Steps that show the problem are:
- LLDB finds the "_r_debug" structure via the DT_DEBUG entry in the .dynamic section, and this points to the "_r_debug" in ld.so.
- ld.so modifies its copy of "_r_debug" with "state = eAdd" before it loads the shared libraries and calls the dyld function that LLDB has set a breakpoint on. We find this state and do nothing (we are waiting for a state of eConsistent to tell us the shared libraries have been fully loaded).
- ld.so loads the main executable and any dependent shared libraries and wants to update the "_r_debug" structure, but it now finds "_r_debug" in the a.out program and updates the state in this other copy.
- LLDB hits the notification breakpoint and checks the ld.so copy of "_r_debug", which still has a state of "eAdd". LLDB wants the new "eConsistent" state, which would trigger the shared libraries to load, but it gets stale data, doesn't do anything, and the library load is missed. The "_r_debug" in a.out has the state set correctly, but we don't know which "_r_debug" is the right one.

The new fix detects the two "eAdd" states in a row, loads the shared libraries, and emits a log message in the "log enable lldb dyld" log channel which states there might be multiple "_r_debug" structs.

The correct solution is that no one should be adding a duplicate "_r_debug" symbol to their binaries, but we have programs that are doing this already, and since it can be done, we should be able to work with this and keep debug sessions working as expected.

If a user #includes the <link.h> file, they can just use the existing "_r_debug" structure, as it is declared in this header file as "extern struct r_debug _r_debug;" and no local copies need to be made.

If your ld.so has debug info, you can easily see the duplicate "_r_debug" structs by doing:

```
(lldb) target variable _r_debug --raw
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_CONSISTENT
  r_ldbase = 0
}
(r_debug) _r_debug = {
  r_version = 1
  r_map = 0x00007ffff7e30210
  r_brk = 140737349972416
  r_state = RT_ADD
  r_ldbase = 140737349943296
}
(lldb) target variable &_r_debug
(r_debug *) &_r_debug = 0x0000555555601040
(r_debug *) &_r_debug = 0x00007ffff7e301e0
```

And if you do an "image lookup --address <addr>" on these addresses, you can see one is in the a.out and one is in ld.so.

This change also adds more logging to print out the m_previous and m_current Rendezvous structures to make things clearer, and adds a log message when we detect multiple eAdd states in a row so this problem can be spotted in logs.

Differential Revision: https://reviews.llvm.org/D158583
1 parent 6163d66 commit 07c215e

7 files changed

+282
-4
lines changed

lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.cpp

Lines changed: 112 additions & 2 deletions
@@ -25,6 +25,32 @@
2525
using namespace lldb;
2626
using namespace lldb_private;
2727

28+
const char *DYLDRendezvous::StateToCStr(RendezvousState state) {
29+
switch (state) {
30+
case DYLDRendezvous::eConsistent:
31+
return "eConsistent";
32+
case DYLDRendezvous::eAdd:
33+
return "eAdd";
34+
case DYLDRendezvous::eDelete:
35+
return "eDelete";
36+
}
37+
return "<invalid RendezvousState>";
38+
}
39+
40+
const char *DYLDRendezvous::ActionToCStr(RendezvousAction action) {
41+
switch (action) {
42+
case DYLDRendezvous::RendezvousAction::eTakeSnapshot:
43+
return "eTakeSnapshot";
44+
case DYLDRendezvous::RendezvousAction::eAddModules:
45+
return "eAddModules";
46+
case DYLDRendezvous::RendezvousAction::eRemoveModules:
47+
return "eRemoveModules";
48+
case DYLDRendezvous::RendezvousAction::eNoAction:
49+
return "eNoAction";
50+
}
51+
return "<invalid RendezvousAction>";
52+
}
53+
2854
DYLDRendezvous::DYLDRendezvous(Process *process)
2955
: m_process(process), m_rendezvous_addr(LLDB_INVALID_ADDRESS),
3056
m_executable_interpreter(false), m_current(), m_previous(),
@@ -129,6 +155,13 @@ void DYLDRendezvous::UpdateExecutablePath() {
129155
}
130156
}
131157

158+
void DYLDRendezvous::Rendezvous::DumpToLog(Log *log, const char *label) {
159+
LLDB_LOGF(log, "%s Rendezvous: version = %" PRIu64 ", map_addr = 0x%16.16"
160+
PRIx64 ", brk = 0x%16.16" PRIx64 ", state = %" PRIu64
161+
" (%s), ldbase = 0x%16.16" PRIx64, label ? label : "", version,
162+
map_addr, brk, state, StateToCStr((RendezvousState)state), ldbase);
163+
}
164+
132165
bool DYLDRendezvous::Resolve() {
133166
Log *log = GetLog(LLDBLog::DynamicLoader);
134167

@@ -176,6 +209,9 @@ bool DYLDRendezvous::Resolve() {
176209
m_previous = m_current;
177210
m_current = info;
178211

212+
m_previous.DumpToLog(log, "m_previous");
213+
m_current.DumpToLog(log, "m_current ");
214+
179215
if (m_current.map_addr == 0)
180216
return false;
181217

@@ -217,6 +253,75 @@ DYLDRendezvous::RendezvousAction DYLDRendezvous::GetAction() const {
217253
break;
218254

219255
case eAdd:
256+
// If the main executable or a shared library defines a publicly visible
257+
// symbol named "_r_debug", then it will cause problems once the executable
258+
// that contains the symbol is loaded into the process. The correct
259+
// "_r_debug" structure is currently found by LLDB by looking through
260+
// the .dynamic section in the main executable and finding the DT_DEBUG tag
261+
// entry.
262+
//
263+
// An issue comes up if someone defines another publicly visible "_r_debug"
264+
// struct in their program. Sample code looks like:
265+
//
266+
// #include <link.h>
267+
// r_debug _r_debug;
268+
//
269+
// If code like this is in an executable or shared library, this creates a
270+
// new "_r_debug" structure and it causes problems once the executable is
271+
// loaded due to the way symbol lookups happen in linux: the shared library
272+
// list from _r_debug.r_map will be searched for a symbol named "_r_debug"
273+
// and the first match will be the new version that is used. The dynamic
274+
// loader is always last in this list. So at some point the dynamic loader
275+
// will start updating the copy of "_r_debug" that gets found first. The
276+
// issue is that LLDB will only look at the copy that is pointed to by the
277+
// DT_DEBUG entry, or the initial version from the ld.so binary.
278+
//
279+
// Steps that show the problem are:
280+
//
281+
// - LLDB finds the "_r_debug" structure via the DT_DEBUG entry in the
282+
// .dynamic section and this points to the "_r_debug" in ld.so
283+
// - ld.so updates its copy of "_r_debug" with "state = eAdd" before it
284+
// loads the dependent shared libraries for the main executable and
285+
// any dependencies of all shared libraries from the executable's list
286+
// and ld.so code calls the debugger notification function
287+
// that LLDB has set a breakpoint on.
288+
// - LLDB hits the breakpoint and the breakpoint has a callback function
289+
// where we read the _r_debug.state (eAdd) state and we do nothing as the
290+
// "eAdd" state indicates that the shared libraries are about to be added.
291+
// - ld.so finishes loading the main executable and any dependent shared
292+
// libraries and it will update the "_r_debug.state" member with a
293+
// "eConsistent", but it now updates the "_r_debug" in the a.out program
294+
// and it calls the debugger notification function.
295+
// - lldb hits the notification breakpoint and checks the ld.so copy of
296+
// "_r_debug.state" which still has a state of "eAdd", but LLDB needs to see a
297+
// "eConsistent" state to trigger the shared libraries to get loaded into
298+
// the debug session. LLDB reads the ld.so _r_debug.state, which still
299+
// contains "eAdd", doesn't do anything, and the library load is missed.
300+
// The "_r_debug" in a.out has the state set correctly to "eConsistent"
301+
// but LLDB is still looking at the "_r_debug" from ld.so.
302+
//
303+
// So if we detect two "eAdd" states in a row, we assume this is the issue
304+
// and we now load shared libraries correctly and will emit a log message
305+
// in the "log enable lldb dyld" log channel which states there might be
306+
// multiple "_r_debug" structs causing problems.
307+
//
308+
// The correct solution is that no one should be adding a duplicate
309+
// publicly visible "_r_debug" symbol to their binaries, but we have
310+
// programs that are doing this already and since it can be done, we should
311+
// be able to work with this and keep debug sessions working as expected.
312+
//
313+
// If a user includes the <link.h> file, they can just use the existing
314+
// "_r_debug" structure as it is defined in this header file as "extern
315+
// struct r_debug _r_debug;" and no local copies need to be made.
316+
if (m_previous.state == eAdd) {
317+
Log *log = GetLog(LLDBLog::DynamicLoader);
318+
LLDB_LOG(log, "DYLDRendezvous::GetAction() found two eAdd states in a "
319+
"row, check process for multiple \"_r_debug\" symbols. "
320+
"Returning eAddModules to ensure shared libraries get loaded "
321+
"correctly");
322+
return eAddModules;
323+
}
324+
return eNoAction;
220325
case eDelete:
221326
return eNoAction;
222327
}
@@ -225,7 +330,9 @@ DYLDRendezvous::RendezvousAction DYLDRendezvous::GetAction() const {
225330
}
226331

227332
bool DYLDRendezvous::UpdateSOEntriesFromRemote() {
228-
auto action = GetAction();
333+
const auto action = GetAction();
334+
Log *log = GetLog(LLDBLog::DynamicLoader);
335+
LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
229336

230337
if (action == eNoAction)
231338
return false;
@@ -263,7 +370,10 @@ bool DYLDRendezvous::UpdateSOEntriesFromRemote() {
263370
bool DYLDRendezvous::UpdateSOEntries() {
264371
m_added_soentries.clear();
265372
m_removed_soentries.clear();
266-
switch (GetAction()) {
373+
const auto action = GetAction();
374+
Log *log = GetLog(LLDBLog::DynamicLoader);
375+
LLDB_LOG(log, "{0} action = {1}", __PRETTY_FUNCTION__, ActionToCStr(action));
376+
switch (action) {
267377
case eTakeSnapshot:
268378
m_soentries.clear();
269379
return TakeSnapshot(m_soentries);

lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DYLDRendezvous.h

Lines changed: 87 additions & 2 deletions
@@ -21,16 +21,89 @@
2121
using lldb_private::LoadedModuleInfoList;
2222

2323
namespace lldb_private {
24+
class Log;
2425
class Process;
2526
}
2627

2728
/// \class DYLDRendezvous
2829
/// Interface to the runtime linker.
2930
///
3031
/// A structure is present in a processes memory space which is updated by the
31-
/// runtime liker each time a module is loaded or unloaded. This class
32+
/// dynamic linker each time a module is loaded or unloaded. This class
3233
/// provides an interface to this structure and maintains a consistent
3334
/// snapshot of the currently loaded modules.
35+
///
36+
/// In the dynamic loader sources, this structure has a type of "r_debug" and
37+
/// the name of the structure is "_r_debug". The structure looks like:
38+
///
39+
/// struct r_debug {
40+
/// // Version number for this protocol.
41+
/// int r_version;
42+
/// // Head of the chain of loaded objects.
43+
/// struct link_map *r_map;
44+
/// // The address the debugger should set a breakpoint at in order to get
45+
/// // notified when shared libraries are added or removed
46+
/// uintptr_t r_brk;
47+
/// // This state value describes the mapping change taking place when the
48+
/// // 'r_brk' address is called.
49+
/// enum {
50+
/// RT_CONSISTENT, // Mapping change is complete.
51+
/// RT_ADD, // Beginning to add a new object.
52+
/// RT_DELETE, // Beginning to remove an object mapping.
53+
/// } r_state;
54+
/// // Base address the linker is loaded at.
55+
/// uintptr_t r_ldbase;
56+
/// };
57+
///
58+
/// The dynamic linker then defines a global variable using this type named
59+
/// "_r_debug":
60+
///
61+
/// r_debug _r_debug;
62+
///
63+
/// The DYLDRendezvous class defines a local version of this structure named
64+
/// DYLDRendezvous::Rendezvous. See the definition inside the class definition
65+
/// for DYLDRendezvous.
66+
///
67+
/// This structure can be located by looking through the .dynamic section in
68+
/// the main executable and finding the DT_DEBUG tag entry. This value starts
69+
/// out with a value of zero when the program is initially loaded, but
70+
/// the address of the "_r_debug" structure from ld.so is filled in by the
71+
/// dynamic loader during program initialization code in ld.so prior to loading
72+
/// or unloading any shared libraries.
73+
///
74+
/// The dynamic loader will update this structure as shared libraries are
75+
/// loaded and will call a specific function that LLDB knows to set a
76+
/// breakpoint on (from _r_debug.r_brk) so LLDB will find out when shared
77+
/// libraries are loaded or unloaded. Each time this breakpoint is hit, LLDB
78+
/// looks at the contents of this structure and the contents tell LLDB what
79+
/// needs to be done.
80+
///
81+
/// Currently we expect the "state" in this structure to change as things
82+
/// happen.
83+
///
84+
/// When any shared libraries are loaded the following happens:
85+
/// - _r_debug.r_map is updated with the new shared libraries. This is a
86+
/// doubly linked list of "link_map *" entries.
87+
/// - _r_debug.r_state is set to RT_ADD and the debugger notification
88+
/// function is called notifying the debugger that shared libraries are
89+
/// about to be added, but are not yet ready for use.
90+
/// - Once the shared libraries are fully loaded, _r_debug.r_state is set
91+
/// to RT_CONSISTENT and the debugger notification function is called again
92+
/// notifying the debugger that shared libraries are ready for use.
93+
/// DYLDRendezvous must remember that the previous state was RT_ADD when it
94+
/// receives a RT_CONSISTENT in order to know to add libraries
95+
///
96+
/// When any shared libraries are unloaded the following happens:
97+
/// - _r_debug.r_map is updated and the unloaded libraries are removed.
98+
/// - _r_debug.r_state is set to RT_DELETE and the debugger notification
99+
/// function is called notifying the debugger that shared libraries are
100+
/// about to be removed.
101+
/// - Once the shared libraries are removed, _r_debug.r_state is set to
102+
/// RT_CONSISTENT and the debugger notification function is called again
103+
/// notifying the debugger that shared libraries have been removed.
104+
/// DYLDRendezvous must remember that the previous state was RT_DELETE when
105+
/// it receives a RT_CONSISTENT in order to know to remove libraries
106+
///
34107
class DYLDRendezvous {
35108

36109
// This structure is used to hold the contents of the debug rendezvous
@@ -45,6 +118,8 @@ class DYLDRendezvous {
45118
lldb::addr_t ldbase = 0;
46119

47120
Rendezvous() = default;
121+
122+
void DumpToLog(lldb_private::Log *log, const char *label);
48123
};
49124

50125
/// Locates the address of the rendezvous structure. It updates
@@ -126,8 +201,15 @@ class DYLDRendezvous {
126201

127202
/// Constants describing the state of the rendezvous.
128203
///
204+
/// These values are defined to match the r_debug.r_state enum from the
205+
/// actual dynamic loader sources.
206+
///
129207
/// \see GetState().
130-
enum RendezvousState { eConsistent, eAdd, eDelete };
208+
enum RendezvousState {
209+
eConsistent, // RT_CONSISTENT
210+
eAdd, // RT_ADD
211+
eDelete // RT_DELETE
212+
};
131213

132214
/// Structure representing the shared objects currently loaded into the
133215
/// inferior process.
@@ -276,6 +358,9 @@ class DYLDRendezvous {
276358
eRemoveModules
277359
};
278360

361+
static const char *StateToCStr(RendezvousState state);
362+
static const char *ActionToCStr(RendezvousAction action);
363+
279364
/// Returns the current action to be taken given the current and previous
280365
/// state
281366
RendezvousAction GetAction() const;
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
CXX_SOURCES := main.cpp
2+
DYLIB_NAME := testlib
3+
DYLIB_CXX_SOURCES := library_file.cpp
4+
include Makefile.rules
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
"""
2+
Test that LLDB can launch a linux executable through the dynamic loader where
3+
the main executable has an extra exported "_r_debug" symbol that used to mess
4+
up shared library loading with DYLDRendezvous and the POSIX dynamic loader
5+
plug-in. What used to happen is that any shared libraries other than the main
6+
executable and the dynamic loader and vDSO would not get loaded. This test
7+
checks to make sure that we still load libraries correctly when we have
8+
multiple "_r_debug" symbols. See comments in the main.cpp source file for full
9+
details on what the problem is.
10+
"""
11+
12+
import lldb
13+
import os
14+
15+
from lldbsuite.test import lldbutil
16+
from lldbsuite.test.decorators import *
17+
from lldbsuite.test.lldbtest import *
18+
19+
20+
class TestDyldWithMultipleRDebug(TestBase):
21+
@skipIf(oslist=no_match(["linux"]))
22+
@no_debug_info_test
23+
def test(self):
24+
self.build()
25+
# Run to a breakpoint in main.cpp to ensure we can hit breakpoints
26+
# in the main executable. Setting breakpoints by file and line ensures
27+
# that the main executable was loaded correctly by the dynamic loader
28+
(target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
29+
self, "// Break here", lldb.SBFileSpec("main.cpp"),
30+
extra_images=["testlib"]
31+
)
32+
# Set a breakpoint on the shared library function to ensure that
33+
# we hit a source breakpoint in the shared library which only will
34+
# happen if we load the shared library correctly in the dynamic
35+
# loader.
36+
lldbutil.continue_to_source_breakpoint(
37+
self, process, "// Library break here",
38+
lldb.SBFileSpec("library_file.cpp", False)
39+
)
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
#include "library_file.h"
2+
#include <stdio.h>
3+
4+
int library_function(void) {
5+
puts(__FUNCTION__); // Library break here
6+
return 0;
7+
}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
int library_function();
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
#include "library_file.h"
2+
#include <link.h>
3+
#include <stdio.h>
4+
// Make a duplicate "_r_debug" symbol that is visible. This is the global
5+
// variable name that the dynamic loader uses to communicate changes in shared
6+
// libraries that get loaded and unloaded. LLDB finds the address of this
7+
// variable by reading the DT_DEBUG entry from the .dynamic section of the main
8+
// executable.
9+
// What will happen is the dynamic loader will use the "_r_debug" symbol from
10+
// itself until the a.out executable gets loaded. At this point the new
11+
// "_r_debug" symbol will take precedence over the original "_r_debug" symbol
12+
// from the dynamic loader and the copy below will get updated with shared
13+
// library state changes while the version that LLDB checks in the dynamic
14+
// loader stays the same forever after this.
15+
//
16+
// When our DYLDRendezvous.cpp tries to check the state in the _r_debug
17+
// structure, it will continue to get the last eAdd as the state before the
18+
// switch in symbol resolution.
19+
//
20+
// Before a fix in LLDB, this would mean that we wouldn't ever load any shared
21+
// libraries since DYLDRendezvous was waiting to see a eAdd state followed by a
22+
// eConsistent state which would trigger the adding of shared libraries, but we
23+
// would never see this change because the local copy below is actually what
24+
// would get updated. Now if DYLDRendezvous detects two eAdd states in a row,
25+
// it will load the shared libraries instead of doing nothing and a log message
26+
// will be printed out if "log enable lldb dyld" is active.
27+
r_debug _r_debug;
28+
29+
int main() {
30+
library_function(); // Break here
31+
return 0;
32+
}
