Commit f1cb8f9

npiggin authored and mpe committed
powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
The ISA suggests ptesync after setting a pte, to prevent a table walk initiated by a subsequent access from missing that store and causing a spurious fault. This is an architectural allowance that allows an implementation's page table walker to be incoherent with the store queue.

However there is no correctness problem in taking a spurious fault in userspace -- the kernel copes with these at any time, so the updated pte will be found eventually. Spurious kernel faults on vmap memory must be avoided, so a ptesync is put into flush_cache_vmap.

On POWER9 so far I have not found a measurable window where this can result in more minor faults, so as an optimisation, remove the costly ptesync from pte updates. If an implementation benefits from ptesync, it would be better to add it back in update_mmu_cache, so it's not done for things like fork(2).

fork --fork --exec benchmark improved 5.2% (12400->13100).

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
1 parent 68662f8 commit f1cb8f9
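
The commit message quotes a fork benchmark but does not include it. As a rough illustration of the kind of loop such a benchmark exercises, here is a minimal sketch. The function name `fork_and_reap` is hypothetical, this is not the tool behind the numbers above, and it omits both the exec step and any timing.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the benchmark used in the commit:
 * fork `n` children that exit immediately and reap each one.
 * Returns the number of children that exited with status 0.
 */
int fork_and_reap(int n)
{
	int ok = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			break;		/* fork failed, stop early */
		if (pid == 0)
			_exit(0);	/* child: exit without running atexit handlers */

		/* parent: wait for this child and check it exited cleanly */
		if (waitpid(pid, &status, 0) == pid &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Each fork() copies page tables through set_pte_at, which is why a per-pte barrier shows up in this kind of workload.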

3 files changed: +32 -2 lines changed


arch/powerpc/include/asm/book3s/64/radix.h

Lines changed: 18 additions & 1 deletion
@@ -202,7 +202,24 @@ static inline void radix__set_pte_at(struct mm_struct *mm, unsigned long addr,
 					 pte_t *ptep, pte_t pte, int percpu)
 {
 	*ptep = pte;
-	asm volatile("ptesync" : : : "memory");
+
+	/*
+	 * The architecture suggests a ptesync after setting the pte, which
+	 * orders the store that updates the pte with subsequent page table
+	 * walk accesses which may load the pte. Without this it may be
+	 * possible for a subsequent access to result in spurious fault.
+	 *
+	 * This is not necessary for correctness, because a spurious fault
+	 * is tolerated by the page fault handler, and this store will
+	 * eventually be seen. In testing, there was no noticeable increase
+	 * in user faults on POWER9. Avoiding ptesync here is a significant
+	 * win for things like fork. If a future microarchitecture benefits
+	 * from ptesync, it should probably go into update_mmu_cache, rather
+	 * than set_pte_at (which is used to set ptes unrelated to faults).
+	 *
+	 * Spurious faults to vmalloc region are not tolerated, so there is
+	 * a ptesync in flush_cache_vmap.
+	 */
 }
 
 static inline int radix__pmd_bad(pmd_t pmd)
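
The comment above relies on the fault handler tolerating a spurious fault. A toy user-space model of that idea is sketched below. All names (`toy_handle_fault`, `TOY_PTE_PRESENT`) are hypothetical and this is not the kernel's fault path; it only shows why re-reading the pte makes a late-arriving store harmless.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_PRESENT 0x1u	/* hypothetical "valid" bit for this model */

enum toy_fault_result { TOY_FAULT_SPURIOUS, TOY_FAULT_FIXED };

/*
 * Toy model of a user fault handler: re-read the pte from memory first.
 * If it is already present, the fault came from a walker that had not yet
 * seen the store, so there is nothing to fix and the access is retried.
 */
enum toy_fault_result toy_handle_fault(uint64_t *ptep, uint64_t new_pte)
{
	assert(new_pte & TOY_PTE_PRESENT);

	if (*ptep & TOY_PTE_PRESENT)
		return TOY_FAULT_SPURIOUS;	/* store already landed */

	*ptep = new_pte;	/* plain store with no barrier, as in the patch */
	return TOY_FAULT_FIXED;
}
```

In this model a spurious fault costs one extra handler invocation and leaves the pte untouched, which is the trade the patch makes for user mappings.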

arch/powerpc/include/asm/cacheflush.h

Lines changed: 13 additions & 0 deletions
@@ -26,6 +26,19 @@
 #define flush_cache_vmap(start, end)		do { } while (0)
 #define flush_cache_vunmap(start, end)		do { } while (0)
 
+#ifdef CONFIG_BOOK3S_64
+/*
+ * Book3s has no ptesync after setting a pte, so without this ptesync it's
+ * possible for a kernel virtual mapping access to return a spurious fault
+ * if it's accessed right after the pte is set. The page fault handler does
+ * not expect this type of fault. flush_cache_vmap is not exactly the right
+ * place to put this, but it seems to work well enough.
+ */
+#define flush_cache_vmap(start, end)		do { asm volatile("ptesync"); } while (0)
+#else
+#define flush_cache_vmap(start, end)		do { } while (0)
+#endif
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
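
The effect of moving the barrier out of the per-pte path and into flush_cache_vmap can be pictured with a small user-space sketch. Everything here is an assumption made for illustration: `toy_flush_cache_vmap`, `toy_vmap_range`, and the pte layout are invented, and a counter stands in for the real ptesync instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned int toy_sync_count;	/* counts stand-in barriers issued */

/* Stand-in for the ptesync issued by flush_cache_vmap() in the patch. */
#define toy_flush_cache_vmap(start, end) do { toy_sync_count++; } while (0)

/*
 * Fill `n` toy ptes with plain stores, then issue a single stand-in
 * barrier for the whole range. Returns the total barriers issued so far.
 */
unsigned int toy_vmap_range(uint64_t *ptes, size_t n, uint64_t first_pfn)
{
	for (size_t i = 0; i < n; i++)
		ptes[i] = ((first_pfn + i) << 12) | 1;	/* no barrier per pte */

	toy_flush_cache_vmap(0, n);	/* one barrier per mapping operation */
	return toy_sync_count;
}
```

Mapping eight pages in this model issues one barrier instead of eight, which mirrors the saving the commit describes for pte updates.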

arch/powerpc/mm/pgtable-radix.c

Lines changed: 1 addition & 1 deletion
@@ -1115,5 +1115,5 @@ void radix__ptep_set_access_flags(struct vm_area_struct *vma, pte_t *ptep,
 		 * an access fault, which is defined by the architecture.
 		 */
 	}
-	asm volatile("ptesync" : : : "memory");
+	/* See ptesync comment in radix__set_pte_at */
 }
