Commit a0e3a18
ring-buffer: Bring back context level recursive checks
Commit 1a149d7 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler") replaced the context level recursion checks with a simple counter. This would prevent the ring buffer code from recursively calling itself more than the max number of contexts that exist (normal, softirq, irq, nmi). But this change caused a lockup in a specific case: during suspend and resume while using the global trace clock.

Adding a stack dump to see where this occurred showed that the issue was in the trace global clock itself:

  trace_buffer_lock_reserve+0x1c/0x50
  __trace_graph_entry+0x2d/0x90
  trace_graph_entry+0xe8/0x200
  prepare_ftrace_return+0x69/0xc0
  ftrace_graph_caller+0x78/0xa8
  queued_spin_lock_slowpath+0x5/0x1d0
  trace_clock_global+0xb0/0xc0
  ring_buffer_lock_reserve+0xf9/0x390

The function graph tracer traced queued_spin_lock_slowpath, which was called by trace_clock_global. This pointed out that trace_clock_global() is not reentrant, as it takes a spin lock. It depended on the ring buffer recursive lock to keep that re-entry from happening. By removing the context detection and allowing only a maximum number of recursions, the earlier change let trace_clock_global() be entered again and try to retake the spinlock it already held, causing a deadlock.

Fixes: 1a149d7 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
Reported-by: David Weinehall <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
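To see why a plain counter was not enough, here is a minimal user-space sketch. This is hypothetical illustration code, not kernel code: the guard functions and names are made up, and the shared state is a single variable standing in for the per-cpu current_context. A counter lets the same context re-enter up to four times, so a clock callback that takes a non-reentrant spinlock can recurse into itself and deadlock; a per-context bit refuses the very first re-entry.

#include <stdio.h>

static unsigned int current_context;   /* stand-in for the per-cpu guard state */

/* Old scheme: a plain counter, allows up to 4 nested entries in any context. */
static int counter_lock(void)
{
        if (current_context >= 4)
                return 1;               /* recursion refused */
        current_context++;
        return 0;
}

/* Restored scheme: one bit per context; a set bit means "already inside". */
static int bitmask_lock(int bit)
{
        if (current_context & (1U << bit))
                return 1;               /* same context re-entered: refuse */
        current_context |= (1U << bit);
        return 0;
}

int main(void)
{
        int first, again;

        /* Same-context re-entry, as when trace_clock_global() recurses. */
        current_context = 0;
        first = counter_lock();
        again = counter_lock();
        printf("counter: first entry %s, re-entry %s\n",
               first ? "refused" : "allowed", again ? "refused" : "allowed");

        current_context = 0;
        first = bitmask_lock(3);        /* bit 3 = normal context */
        again = bitmask_lock(3);
        printf("bitmask: first entry %s, re-entry %s\n",
               first ? "refused" : "allowed", again ? "refused" : "allowed");
        return 0;
}

With the counter, both entries are allowed, which models the re-entry into the spinlock; with the bitmask, the second attempt in the same context is refused immediately.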
1 parent 4397f04 commit a0e3a18

kernel/trace/ring_buffer.c

Lines changed: 45 additions & 17 deletions
@@ -2534,39 +2534,67 @@ rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
  * The lock and unlock are done within a preempt disable section.
  * The current_context per_cpu variable can only be modified
  * by the current task between lock and unlock. But it can
- * be modified more than once via an interrupt. There are four
- * different contexts that we need to consider.
+ * be modified more than once via an interrupt. To pass this
+ * information from the lock to the unlock without having to
+ * access the 'in_interrupt()' functions again (which do show
+ * a bit of overhead in something as critical as function tracing,
+ * we use a bitmask trick.
  *
- *  Normal context.
- *  SoftIRQ context
- *  IRQ context
- *  NMI context
+ *  bit 0 = NMI context
+ *  bit 1 = IRQ context
+ *  bit 2 = SoftIRQ context
+ *  bit 3 = normal context.
  *
- * If for some reason the ring buffer starts to recurse, we
- * only allow that to happen at most 4 times (one for each
- * context). If it happens 5 times, then we consider this a
- * recusive loop and do not let it go further.
+ * This works because this is the order of contexts that can
+ * preempt other contexts. A SoftIRQ never preempts an IRQ
+ * context.
+ *
+ * When the context is determined, the corresponding bit is
+ * checked and set (if it was set, then a recursion of that context
+ * happened).
+ *
+ * On unlock, we need to clear this bit. To do so, just subtract
+ * 1 from the current_context and AND it to itself.
+ *
+ * (binary)
+ *  101 - 1 = 100
+ *  101 & 100 = 100 (clearing bit zero)
+ *
+ *  1010 - 1 = 1001
+ *  1010 & 1001 = 1000 (clearing bit 1)
+ *
+ * The least significant bit can be cleared this way, and it
+ * just so happens that it is the same bit corresponding to
+ * the current context.
  */
 
 static __always_inline int
 trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 {
-        if (cpu_buffer->current_context >= 4)
+        unsigned int val = cpu_buffer->current_context;
+        unsigned long pc = preempt_count();
+        int bit;
+
+        if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+                bit = RB_CTX_NORMAL;
+        else
+                bit = pc & NMI_MASK ? RB_CTX_NMI :
+                        pc & HARDIRQ_MASK ? RB_CTX_IRQ :
+                        pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ;
+
+        if (unlikely(val & (1 << bit)))
                 return 1;
 
-        cpu_buffer->current_context++;
-        /* Interrupts must see this update */
-        barrier();
+        val |= (1 << bit);
+        cpu_buffer->current_context = val;
 
         return 0;
 }
 
 static __always_inline void
 trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
 {
-        /* Don't let the dec leak out */
-        barrier();
-        cpu_buffer->current_context--;
+        cpu_buffer->current_context &= cpu_buffer->current_context - 1;
 }
 
 /**
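As a side note, the unlock trick the new comment describes can be checked in isolation. The following is only an illustrative sketch (the helper name clear_lowest_bit is invented for the example, and the bit numbers are taken from the comment above): x & (x - 1) clears exactly the least significant set bit, and because higher-priority contexts use lower bit numbers, that bit always belongs to the innermost context that is currently holding the lock.

#include <assert.h>
#include <stdio.h>

/* Clear the least significant set bit, as trace_recursive_unlock() now does. */
static unsigned int clear_lowest_bit(unsigned int x)
{
        return x & (x - 1);
}

int main(void)
{
        /* The two binary examples from the comment block in the diff. */
        assert(clear_lowest_bit(0x5) == 0x4);  /*  101 ->  100 (bit 0 cleared) */
        assert(clear_lowest_bit(0xa) == 0x8);  /* 1010 -> 1000 (bit 1 cleared) */

        /* Nesting: normal (bit 3), then IRQ (bit 1), then NMI (bit 0). */
        unsigned int ctx = (1 << 3) | (1 << 1) | (1 << 0);      /* 1011 */
        ctx = clear_lowest_bit(ctx);    /* NMI unlocks:    1010 */
        ctx = clear_lowest_bit(ctx);    /* IRQ unlocks:    1000 */
        ctx = clear_lowest_bit(ctx);    /* normal unlocks: 0000 */
        assert(ctx == 0);

        printf("unlock examples check out\n");
        return 0;
}

Because contexts always unwind innermost-first, clearing the lowest set bit on each unlock releases exactly the bit taken by the matching lock, with no need to recompute the context from preempt_count().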
