Skip to content

Commit 5ceac44

Browse files
marcangregkh
authored andcommitted
xhci: Handle TD clearing for multiple streams case
When multiple streams are in use, multiple TDs might be in flight when an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for each, to ensure everything is reset properly and the caches cleared. Change the logic so that any N>1 TDs found active for different streams are deferred until after the first one is processed, calling xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to queue another command until we are done with all of them. Also change the error/"should never happen" paths to ensure we at least clear any affected TDs, even if we can't issue a command to clear the hardware cache, and complain loudly with an xhci_warn() if this ever happens. This problem case dates back to commit e9df17e ("USB: xhci: Correct assumptions about number of rings per endpoint.") early on in the XHCI driver's life, when stream support was first added. It was then identified but not fixed nor made into a warning in commit 674f843 ("xhci: split handling halted endpoints into two steps"), which added a FIXME comment for the problem case (without materially changing the behavior as far as I can tell, though the new logic made the problem more obvious). Then later, in commit 94f3391 ("xhci: Fix failure to give back some cached cancelled URBs."), it was acknowledged again. [Mathias: commit 94f3391 ("xhci: Fix failure to give back some cached cancelled URBs.") was a targeted regression fix to the previously mentioned patch. Users reported issues with usb stuck after unmounting/disconnecting UAS devices. This rolled back the TD clearing of multiple streams to its original state.] Apparently the commit author was aware of the problem (yet still chose to submit it): It was still mentioned as a FIXME, an xhci_dbg() was added to log the problem condition, and the remaining issue was mentioned in the commit description. The choice of making the log type xhci_dbg() for what is, at this point, a completely unhandled and known broken condition is puzzling and unfortunate, as it guarantees that no actual users would see the log in production, thereby making it nigh undebuggable (indeed, even if you turn on DEBUG, the message doesn't really hint at there being a problem at all). It took me *months* of random xHC crashes to finally find a reliable repro and be able to do a deep dive debug session, which could all have been avoided had this unhandled, broken condition been actually reported with a warning, as it should have been as a bug intentionally left in unfixed (never mind that it shouldn't have been left in at all). > Another fix to solve clearing the caches of all stream rings with > cancelled TDs is needed, but not as urgent. 3 years after that statement and 14 years after the original bug was introduced, I think it's finally time to fix it. And maybe next time let's not leave bugs unfixed (that are actually worse than the original bug), and let's actually get people to review kernel commits please. Fixes xHC crashes and IOMMU faults with UAS devices when handling errors/faults. Easiest repro is to use `hdparm` to mark an early sector (e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop. At least in the case of JMicron controllers, the read errors end up having to cancel two TDs (for two queued requests to different streams) and the one that didn't get cleared properly ends up faulting the xHC entirely when it tries to access DMA pages that have since been unmapped, referred to by the stale TDs. This normally happens quickly (after two or three loops). After this fix, I left the `cat` in a loop running overnight and experienced no xHC failures, with all read errors recovered properly. Repro'd and tested on an Apple M1 Mac Mini (dwc3 host). On systems without an IOMMU, this bug would instead silently corrupt freed memory, making this a security bug (even on systems with IOMMUs this could silently corrupt memory belonging to other USB devices on the same controller, so it's still a security bug). Given that the kernel autoprobes partition tables, I'm pretty sure a malicious USB device pretending to be a UAS device and reporting an error with the right timing could deliberately trigger a UAF and write to freed memory, with no user action. [Mathias: Commit message and code comment edit, original at:] https://lore.kernel.org/linux-usb/[email protected]/ Fixes: e9df17e ("USB: xhci: Correct assumptions about number of rings per endpoint.") Fixes: 94f3391 ("xhci: Fix failure to give back some cached cancelled URBs.") Fixes: 674f843 ("xhci: split handling halted endpoints into two steps") Cc: [email protected] Cc: [email protected] Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Hector Martin <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 91f7a15 commit 5ceac44

File tree

2 files changed

+44
-11
lines changed

2 files changed

+44
-11
lines changed

drivers/usb/host/xhci-ring.c

Lines changed: 43 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1031,13 +1031,27 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
10311031
break;
10321032
case TD_DIRTY: /* TD is cached, clear it */
10331033
case TD_HALTED:
1034+
case TD_CLEARING_CACHE_DEFERRED:
1035+
if (cached_td) {
1036+
if (cached_td->urb->stream_id != td->urb->stream_id) {
1037+
/* Multiple streams case, defer move dq */
1038+
xhci_dbg(xhci,
1039+
"Move dq deferred: stream %u URB %p\n",
1040+
td->urb->stream_id, td->urb);
1041+
td->cancel_status = TD_CLEARING_CACHE_DEFERRED;
1042+
break;
1043+
}
1044+
1045+
/* Should never happen, but clear the TD if it does */
1046+
xhci_warn(xhci,
1047+
"Found multiple active URBs %p and %p in stream %u?\n",
1048+
td->urb, cached_td->urb,
1049+
td->urb->stream_id);
1050+
td_to_noop(xhci, ring, cached_td, false);
1051+
cached_td->cancel_status = TD_CLEARED;
1052+
}
1053+
10341054
td->cancel_status = TD_CLEARING_CACHE;
1035-
if (cached_td)
1036-
/* FIXME stream case, several stopped rings */
1037-
xhci_dbg(xhci,
1038-
"Move dq past stream %u URB %p instead of stream %u URB %p\n",
1039-
td->urb->stream_id, td->urb,
1040-
cached_td->urb->stream_id, cached_td->urb);
10411055
cached_td = td;
10421056
break;
10431057
}
@@ -1057,10 +1071,16 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
10571071
if (err) {
10581072
/* Failed to move past cached td, just set cached TDs to no-op */
10591073
list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
1060-
if (td->cancel_status != TD_CLEARING_CACHE)
1074+
/*
1075+
* Deferred TDs need to have the deq pointer set after the above command
1076+
* completes, so if that failed we just give up on all of them (and
1077+
* complain loudly since this could cause issues due to caching).
1078+
*/
1079+
if (td->cancel_status != TD_CLEARING_CACHE &&
1080+
td->cancel_status != TD_CLEARING_CACHE_DEFERRED)
10611081
continue;
1062-
xhci_dbg(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
1063-
td->urb);
1082+
xhci_warn(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
1083+
td->urb);
10641084
td_to_noop(xhci, ring, td, false);
10651085
td->cancel_status = TD_CLEARED;
10661086
}
@@ -1346,6 +1366,7 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
13461366
struct xhci_ep_ctx *ep_ctx;
13471367
struct xhci_slot_ctx *slot_ctx;
13481368
struct xhci_td *td, *tmp_td;
1369+
bool deferred = false;
13491370

13501371
ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
13511372
stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1432,6 +1453,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
14321453
xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n",
14331454
__func__, td->urb);
14341455
xhci_td_cleanup(ep->xhci, td, ep_ring, td->status);
1456+
} else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) {
1457+
deferred = true;
14351458
} else {
14361459
xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n",
14371460
__func__, td->urb, td->cancel_status);
@@ -1441,8 +1464,17 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
14411464
ep->ep_state &= ~SET_DEQ_PENDING;
14421465
ep->queued_deq_seg = NULL;
14431466
ep->queued_deq_ptr = NULL;
1444-
/* Restart any rings with pending URBs */
1445-
ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
1467+
1468+
if (deferred) {
1469+
/* We have more streams to clear */
1470+
xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n",
1471+
__func__);
1472+
xhci_invalidate_cancelled_tds(ep);
1473+
} else {
1474+
/* Restart any rings with pending URBs */
1475+
xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
1476+
ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
1477+
}
14461478
}
14471479

14481480
static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,

drivers/usb/host/xhci.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1276,6 +1276,7 @@ enum xhci_cancelled_td_status {
12761276
TD_DIRTY = 0,
12771277
TD_HALTED,
12781278
TD_CLEARING_CACHE,
1279+
TD_CLEARING_CACHE_DEFERRED,
12791280
TD_CLEARED,
12801281
};
12811282

0 commit comments

Comments
 (0)