Commit 9e52fc2

Authored by Vitaly Kuznetsov (vittyvk), committed by Ingo Molnar
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
There's a subtle bug in how some of the paravirt guest code handles page
table freeing on x86: software page table walkers depend on the fact
that a remote TLB flush does an IPI. The walk is performed lockless but
with interrupts disabled, so if the page table is about to be freed, the
freeing CPU is blocked until the remote TLB flush (the IPI) completes.

On other architectures, which don't require an IPI for a remote TLB
flush, we have an RCU-based mechanism instead (see
include/asm-generic/tlb.h for more details).

In virtualized environments we may want to override the
->flush_tlb_others callback in pv_mmu_ops and use a hypercall asking the
hypervisor to do the remote TLB flush for us. This breaks the assumption
about IPIs. Xen PV has been doing this for years and the upcoming remote
TLB flush for Hyper-V will do it too. This is not safe, as software page
table walkers may step on an already freed page.

Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.

Testing was done with kernbench and with mmap/munmap microbenchmarks;
neither showed any noticeable performance impact.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Juergen Gross <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Cooper <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: KY Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
1 parent 39e48d9 commit 9e52fc2

File tree

3 files changed: +19 −4 lines

arch/x86/Kconfig — 1 addition, 0 deletions

@@ -167,6 +167,7 @@ config X86
 	select HAVE_HARDLOCKUP_DETECTOR_PERF	if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION		if X86_64

arch/x86/include/asm/tlb.h — 14 additions, 0 deletions

@@ -15,4 +15,18 @@
 
 #include <asm-generic/tlb.h>
 
+/*
+ * While x86 architecture in general requires an IPI to perform TLB
+ * shootdown, enablement code for several hypervisors overrides
+ * .flush_tlb_others hook in pv_mmu_ops and implements it by issuing
+ * a hypercall. To keep software pagetable walkers safe in this case we
+ * switch to RCU based table free (HAVE_RCU_TABLE_FREE). See the comment
+ * below 'ifdef CONFIG_HAVE_RCU_TABLE_FREE' in include/asm-generic/tlb.h
+ * for more details.
+ */
+static inline void __tlb_remove_table(void *table)
+{
+	free_page_and_swap_cache(table);
+}
+
 #endif /* _ASM_X86_TLB_H */

arch/x86/mm/pgtable.c — 4 additions, 4 deletions

@@ -56,7 +56,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
 	pgtable_page_dtor(pte);
 	paravirt_release_pte(page_to_pfn(pte));
-	tlb_remove_page(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
@@ -72,21 +72,21 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 	tlb->need_flush_all = 1;
 #endif
 	pgtable_pmd_page_dtor(page);
-	tlb_remove_page(tlb, page);
+	tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(pud));
+	tlb_remove_table(tlb, virt_to_page(pud));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(p4d));
+	tlb_remove_table(tlb, virt_to_page(p4d));
 }
 #endif /* CONFIG_PGTABLE_LEVELS > 4 */
 #endif /* CONFIG_PGTABLE_LEVELS > 3 */
