Commit e219bf5

Bug#35728261 Autotest testNodeRestart -n WatchdogSlowShutdown T1 fails occasionally

Context: Data nodes have a mechanism (implemented in ErrorReporter.cpp :: prepare_to_crash) that prevents multiple threads from starting crash handling in parallel. This works for ndbmtd, where multiple threads can crash at the same time, but in ndbd the prepare_to_crash function is empty, since there is only one exec thread. However, in addition to the exec thread, the watchdog thread can also start crash handling. If the watchdog thread 'crashes' at the same time as the exec thread, both threads can try to start crash handling in parallel.

Problem: testNodeRestart -n WatchdogSlowShutdown fails in .2ndbd due to a race between the watchdog thread and the signal exec thread, both trying to start crash handling in parallel.

Solution: The single-threaded version of prepare_to_crash is changed to ensure that, if both the watchdog and the exec thread crash 'at the same time', only one thread (the first one) proceeds with crash handling; the second one is stopped immediately. This patch also fixes an issue in the FastScheduler implementation of traceDumpGetJam(), which is used by the watchdog thread when it is the first thread to start crash handling. The function used a thread-local EmulatedJamBuffer that is not set in the watchdog thread.

Change-Id: I5c9ffb6d06bf74cd38b65268171ed9c9718265e4
1 parent 3dd968e commit e219bf5

5 files changed: +26 -6 lines changed
storage/ndb/src/kernel/error/ErrorReporter.cpp

Lines changed: 6 additions & 0 deletions
@@ -293,6 +293,12 @@ int WriteMessage(int thrdMessageID, const char *thrdProblemData,
    * crash handler then we will never return from this first call.
    * Otherwise we will return, write the error log and never return
    * from the second call to prepare_to_crash below.
+   *
+   * In singlethreaded case first call of prepare_to_crash does nothing.
+   * In the second call we ensure that, if there are two threads (watchdog
+   * and signal execution) processing the crash handling in parallel,
+   * only one thread (the first one) will proceed with the crash
+   * handling, the second one will stop immediately.
    */
   ErrorReporter::prepare_to_crash(true, (nst == NST_ErrorInsert));

storage/ndb/src/kernel/vm/Emulator.cpp

Lines changed: 3 additions & 0 deletions
@@ -722,6 +722,7 @@ EmulatorData::EmulatorData() {
   theShutdownMutex = 0;
   m_socket_server = 0;
   m_mem_manager = 0;
+  m_st_jam_buffer = 0;
 }

 void EmulatorData::create() {
@@ -735,6 +736,7 @@ void EmulatorData::create() {
   EmulatedJamBuffer *jamBuffer = nullptr;
 #endif
   NDB_THREAD_TLS_JAM = jamBuffer;
+  m_st_jam_buffer = jamBuffer;

   theConfiguration = new Configuration();
   theWatchDog = new WatchDog();
@@ -769,4 +771,5 @@ void EmulatorData::destroy() {
   NdbMutex_Destroy(theShutdownMutex);
   if (m_mem_manager) delete m_mem_manager;
   m_mem_manager = 0;
+  m_st_jam_buffer = 0;
 }

storage/ndb/src/kernel/vm/Emulator.hpp

Lines changed: 3 additions & 0 deletions
@@ -134,6 +134,9 @@ struct EmulatorData {
   class SocketServer *m_socket_server;
   class Ndbd_mem_manager *m_mem_manager;

+  // Single threaded only
+  EmulatedJamBuffer *m_st_jam_buffer;
+
   /**
    * Constructor
    *

storage/ndb/src/kernel/vm/FastScheduler.cpp

Lines changed: 3 additions & 3 deletions
@@ -455,9 +455,9 @@ bool FastScheduler::traceDumpGetJam(Uint32 thr_no,
   thrdTheEmulatedJam = NULL;
   thrdTheEmulatedJamIndex = 0;
 #else
-  const EmulatedJamBuffer *jamBuffer = NDB_THREAD_TLS_JAM;
-  thrdTheEmulatedJam = jamBuffer->theEmulatedJam;
-  thrdTheEmulatedJamIndex = jamBuffer->theEmulatedJamIndex;
+  thrdTheEmulatedJam = globalEmulatorData.m_st_jam_buffer->theEmulatedJam;
+  thrdTheEmulatedJamIndex =
+      globalEmulatorData.m_st_jam_buffer->theEmulatedJamIndex;
 #endif
   return true;
 }
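
The traceDumpGetJam() change relies on a general property of thread-local storage: a pointer stored in a thread-local variable by one thread is not visible from another thread. The following is a minimal standalone sketch of that property, not NDB code. `JamBuffer`, `tls_jam`, `shared_jam`, `Seen`, and `observe_from_other_thread` are hypothetical stand-ins for `EmulatedJamBuffer`, `NDB_THREAD_TLS_JAM`, and `m_st_jam_buffer`.

```cpp
#include <thread>

// Hypothetical stand-in for EmulatedJamBuffer.
struct JamBuffer {
  int index;
};

// Stand-in for NDB_THREAD_TLS_JAM: every thread has its own copy.
static thread_local JamBuffer *tls_jam = nullptr;
// Stand-in for globalEmulatorData.m_st_jam_buffer: one copy for all threads.
static JamBuffer *shared_jam = nullptr;

struct Seen {
  bool tls_set;     // did the second thread see the thread-local pointer?
  bool shared_set;  // did the second thread see the shared pointer?
};

// Sets both pointers on the calling thread, then reports what a second
// thread (playing the watchdog) observes.
Seen observe_from_other_thread() {
  static JamBuffer buffer{42};
  tls_jam = &buffer;     // only the calling thread's copy is set
  shared_jam = &buffer;  // visible to threads started afterwards
  Seen seen{};
  std::thread watchdog([&seen] {
    seen.tls_set = (tls_jam != nullptr);
    seen.shared_set = (shared_jam != nullptr);
  });
  watchdog.join();
  return seen;
}
```

In this sketch the second thread finds its thread-local pointer still null while the shared pointer is set, which mirrors why the watchdog thread could not rely on the thread-local jam buffer.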

storage/ndb/src/kernel/vm/SimulatedBlock.cpp

Lines changed: 11 additions & 3 deletions
@@ -4950,14 +4950,22 @@ Uint32 SimulatedBlock::get_recv_thread_idx(TrpId trp_id) {

 #ifndef NDBD_MULTITHREADED
 /**
- * Add a stub for this function since we have some code in ErrorReporter.cpp
- * that needs this function, it's only really needed for ndbmtd, so need an
- * empty function in ndbd.
+ * Function for ndbd only. ndbmtd version of this function
+ * is implemented in ErrorReporter.cpp
  */
 void ErrorReporter::prepare_to_crash(bool first_phase,
                                      bool error_insert_crash) {
   (void)first_phase;
   (void)error_insert_crash;
+
+  static bool crash_handling_started = false;
+  if (!first_phase) {
+    if (crash_handling_started) {
+      /* Someone else handling the crash, exit now */
+      my_thread_exit(nullptr);
+    }
+    crash_handling_started = true;
+  }
 }
 #endif
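
The "first thread proceeds, later threads stop" idea can also be sketched in isolation. The patch itself uses a plain `static bool` together with `my_thread_exit()`. The example below is a different, hypothetical variant that swaps in `std::atomic<bool>::exchange` so that check and set happen in one step. The names `try_begin_crash_handling` and `count_crash_handlers` are made up for illustration and do not exist in NDB.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical guard flag, not the NDB implementation.
static std::atomic<bool> crash_handling_started{false};

// Returns true for exactly one caller (the first one to arrive);
// every later caller gets false and should stop.
bool try_begin_crash_handling() {
  // exchange() stores true and returns the previous value atomically.
  return !crash_handling_started.exchange(true);
}

// Starts n threads that all try to begin crash handling and counts
// how many of them were allowed to proceed.
int count_crash_handlers(int n) {
  assert(n > 0);
  crash_handling_started.store(false);  // reset so the demo can be repeated
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n; i++) {
    threads.emplace_back([&winners] {
      if (try_begin_crash_handling()) winners.fetch_add(1);
    });
  }
  for (auto &t : threads) t.join();
  return winners.load();
}
```

Calling `count_crash_handlers(8)` yields 1 regardless of thread scheduling, because only the caller that sees the previous value `false` wins. Whether an atomic is needed in ndbd itself is a separate question that this sketch does not settle.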
