Skip to content

Commit 85dc600

Browse files
Peter ZijlstraIngo Molnar
authored andcommitted
perf/x86/amd/ibs: Fix pmu::stop() nesting
Patch 5a50f52 ("perf/x86/ibs: Fix race with IBS_STARTING state") closed a big hole while opening another, smaller hole. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Fixes: 5a50f52 ("perf/x86/ibs: Fix race with IBS_STARTING state") Signed-off-by: Ingo Molnar <[email protected]>
1 parent 201c2f8 commit 85dc600

File tree

1 file changed

+45
-7
lines changed

1 file changed

+45
-7
lines changed

arch/x86/events/amd/ibs.c

Lines changed: 45 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -28,10 +28,46 @@ static u32 ibs_caps;
2828
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
2929
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
3030

31+
32+
/*
33+
* IBS states:
34+
*
35+
* ENABLED; tracks the pmu::add(), pmu::del() state, when set the counter is taken
36+
* and any further add()s must fail.
37+
*
38+
* STARTED/STOPPING/STOPPED; deal with pmu::start(), pmu::stop() state but are
39+
* complicated by the fact that the IBS hardware can send late NMIs (ie. after
40+
* we've cleared the EN bit).
41+
*
42+
* In order to consume these late NMIs we have the STOPPED state, any NMI that
43+
* happens after we've cleared the EN state will clear this bit and report the
44+
* NMI handled (this is fundamentally racy in the face or multiple NMI sources,
45+
* someone else can consume our BIT and our NMI will go unhandled).
46+
*
47+
* And since we cannot set/clear this separate bit together with the EN bit,
48+
* there are races; if we cleared STARTED early, an NMI could land in
49+
* between clearing STARTED and clearing the EN bit (in fact multiple NMIs
50+
* could happen if the period is small enough), and consume our STOPPED bit
51+
* and trigger streams of unhandled NMIs.
52+
*
53+
* If, however, we clear STARTED late, an NMI can hit between clearing the
54+
* EN bit and clearing STARTED, still see STARTED set and process the event.
55+
* If this event will have the VALID bit clear, we bail properly, but this
56+
* is not a given. With VALID set we can end up calling pmu::stop() again
57+
* (the throttle logic) and trigger the WARNs in there.
58+
*
59+
* So what we do is set STOPPING before clearing EN to avoid the pmu::stop()
60+
* nesting, and clear STARTED late, so that we have a well defined state over
61+
* the clearing of the EN bit.
62+
*
63+
* XXX: we could probably be using !atomic bitops for all this.
64+
*/
65+
3166
enum ibs_states {
3267
IBS_ENABLED = 0,
3368
IBS_STARTED = 1,
3469
IBS_STOPPING = 2,
70+
IBS_STOPPED = 3,
3571

3672
IBS_MAX_STATES,
3773
};
@@ -377,11 +413,10 @@ static void perf_ibs_start(struct perf_event *event, int flags)
377413

378414
perf_ibs_set_period(perf_ibs, hwc, &period);
379415
/*
380-
* Set STARTED before enabling the hardware, such that
381-
* a subsequent NMI must observe it. Then clear STOPPING
382-
* such that we don't consume NMIs by accident.
416+
* Set STARTED before enabling the hardware, such that a subsequent NMI
417+
* must observe it.
383418
*/
384-
set_bit(IBS_STARTED, pcpu->state);
419+
set_bit(IBS_STARTED, pcpu->state);
385420
clear_bit(IBS_STOPPING, pcpu->state);
386421
perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
387422

@@ -396,6 +431,9 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
396431
u64 config;
397432
int stopping;
398433

434+
if (test_and_set_bit(IBS_STOPPING, pcpu->state))
435+
return;
436+
399437
stopping = test_bit(IBS_STARTED, pcpu->state);
400438

401439
if (!stopping && (hwc->state & PERF_HES_UPTODATE))
@@ -405,12 +443,12 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
405443

406444
if (stopping) {
407445
/*
408-
* Set STOPPING before disabling the hardware, such that it
446+
* Set STOPPED before disabling the hardware, such that it
409447
* must be visible to NMIs the moment we clear the EN bit,
410448
* at which point we can generate an !VALID sample which
411449
* we need to consume.
412450
*/
413-
set_bit(IBS_STOPPING, pcpu->state);
451+
set_bit(IBS_STOPPED, pcpu->state);
414452
perf_ibs_disable_event(perf_ibs, hwc, config);
415453
/*
416454
* Clear STARTED after disabling the hardware; if it were
@@ -556,7 +594,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
556594
* with samples that even have the valid bit cleared.
557595
* Mark all this NMIs as handled.
558596
*/
559-
if (test_and_clear_bit(IBS_STOPPING, pcpu->state))
597+
if (test_and_clear_bit(IBS_STOPPED, pcpu->state))
560598
return 1;
561599

562600
return 0;

0 commit comments

Comments
 (0)