Skip to content

Commit fe4cdc2

Browse files
xzpeterakpm00
authored andcommitted
mm/userfaultfd: fix release hang over concurrent GUP
This patch should fix a possible userfaultfd release() hang during concurrent GUP. This problem was initially reported by Dimitris Siakavaras in July 2023 [1] in a firecracker use case. Firecracker has a separate process handling page faults remotely, and when the process releases the userfaultfd it can race with a concurrent GUP from KVM trying to fault in a guest page during the secondary MMU page fault process. A similar problem was reported recently again by Jinjiang Tu in March 2025 [2], even though the race happened this time with a mlockall() operation, which does GUP in a similar fashion. In 2017, commit 656710a ("userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS") was trying to fix this issue. AFAIU, that fixes well the fault paths but may not work yet for GUP. In GUP, the issue is NOPAGE will be almost treated the same as "page fault resolved" in faultin_page(), then the GUP will follow page again, seeing page missing, and it'll keep going into a live lock situation as reported. This change makes core mm return RETRY instead of NOPAGE for both the GUP and fault paths, proactively releasing the mmap read lock. This should guarantee the other release thread make progress on taking the write lock and avoid the live lock even for GUP. When at it, rearrange the comments to make sure it's uptodate. [1] https://lore.kernel.org/r/[email protected] [2] https://lore.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Jinjiang Tu <[email protected]> Cc: Dimitris Siakavaras <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent 25601e8 commit fe4cdc2

File tree

1 file changed

+25
-26
lines changed

1 file changed

+25
-26
lines changed

fs/userfaultfd.c

Lines changed: 25 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -395,32 +395,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
395395
if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
396396
goto out;
397397

398-
/*
399-
* If it's already released don't get it. This avoids to loop
400-
* in __get_user_pages if userfaultfd_release waits on the
401-
* caller of handle_userfault to release the mmap_lock.
402-
*/
403-
if (unlikely(READ_ONCE(ctx->released))) {
404-
/*
405-
* Don't return VM_FAULT_SIGBUS in this case, so a non
406-
* cooperative manager can close the uffd after the
407-
* last UFFDIO_COPY, without risking to trigger an
408-
* involuntary SIGBUS if the process was starting the
409-
* userfaultfd while the userfaultfd was still armed
410-
* (but after the last UFFDIO_COPY). If the uffd
411-
* wasn't already closed when the userfault reached
412-
* this point, that would normally be solved by
413-
* userfaultfd_must_wait returning 'false'.
414-
*
415-
* If we were to return VM_FAULT_SIGBUS here, the non
416-
* cooperative manager would be instead forced to
417-
* always call UFFDIO_UNREGISTER before it can safely
418-
* close the uffd.
419-
*/
420-
ret = VM_FAULT_NOPAGE;
421-
goto out;
422-
}
423-
424398
/*
425399
* Check that we can return VM_FAULT_RETRY.
426400
*
@@ -457,6 +431,31 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
457431
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
458432
goto out;
459433

434+
if (unlikely(READ_ONCE(ctx->released))) {
435+
/*
436+
* If a concurrent release is detected, do not return
437+
* VM_FAULT_SIGBUS or VM_FAULT_NOPAGE, but instead always
438+
* return VM_FAULT_RETRY with lock released proactively.
439+
*
440+
* If we were to return VM_FAULT_SIGBUS here, the non
441+
* cooperative manager would be instead forced to
442+
* always call UFFDIO_UNREGISTER before it can safely
443+
* close the uffd, to avoid involuntary SIGBUS triggered.
444+
*
445+
* If we were to return VM_FAULT_NOPAGE, it would work for
446+
* the fault path, in which the lock will be released
447+
* later. However for GUP, faultin_page() does nothing
448+
* special on NOPAGE, so GUP would spin retrying without
449+
* releasing the mmap read lock, causing possible livelock.
450+
*
451+
* Here only VM_FAULT_RETRY would make sure the mmap lock
452+
* be released immediately, so that the thread concurrently
453+
* releasing the userfault would always make progress.
454+
*/
455+
release_fault_lock(vmf);
456+
goto out;
457+
}
458+
460459
/* take the reference before dropping the mmap_lock */
461460
userfaultfd_ctx_get(ctx);
462461

0 commit comments

Comments
 (0)