Commit 00839ee

hansendc authored and Ingo Molnar committed
x86/mm: Move swap offset/type up in PTE to work around erratum
This erratum can result in Accessed/Dirty getting set by the hardware when we do not expect them to be (on !Present PTEs).

Instead of trying to fix them up after this happens, we just allow the bits to get set and try to ignore them. We do this by shifting the layout of the bits we use for swap offset/type in our 64-bit PTEs.

It looks like this:

bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
 after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|

Note that D was already a don't care (X) even before. We just move TYPE up and turn its old spot (which could be hit by the A bit) into all don't cares.

We take 5 bits away from the offset, but that still leaves us with 50 bits which lets us index into a 62-bit swapfile (4 EiB). I think that's probably fine for the moment. We could theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it doesn't gain us anything.

Signed-off-by: Dave Hansen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
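The capacity figure quoted in the message can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 64-bit PTE and 4 KiB pages (a page shift of 12), and the enum and helper names are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 64-bit PTE, 4 KiB pages. */
enum {
    PTE_BITS         = 64,
    PAGE_SHIFT_4K    = 12,
    OLD_OFFSET_FIRST = 9,   /* "before" row: OFFSET (9-63)  */
    NEW_OFFSET_FIRST = 14   /* "after" row:  OFFSET (14-63) */
};

/* Number of PTE bits left for the swap offset. */
static inline unsigned offset_bits(unsigned first_offset_bit)
{
    assert(first_offset_bit < PTE_BITS);
    return PTE_BITS - first_offset_bit;
}

/* log2 of the largest swapfile size, in bytes, that the offset can index. */
static inline unsigned swapfile_size_log2(unsigned first_offset_bit)
{
    return offset_bits(first_offset_bit) + PAGE_SHIFT_4K;
}
```

Under these assumptions the new layout leaves 64 - 14 = 50 offset bits, and 50 + 12 = 62 address bits, which corresponds to the 2^62 byte (4 EiB) swapfile mentioned above.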
1 parent f80fd3a commit 00839ee

1 file changed, +20 -6 lines changed

arch/x86/include/asm/pgtable_64.h

Lines changed: 20 additions & 6 deletions
@@ -140,18 +140,32 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))/* NOP */

-/* Encode and de-code a swap entry */
+/*
+ * Encode and de-code a swap entry
+ *
+ * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0| <- bit number
+ * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
+ * | OFFSET (14->63) | TYPE (9-13)  |0|X|X|X| X| X|X|X|0| <- swp entry
+ *
+ * G (8) is aliased and used as a PROT_NONE indicator for
+ * !present ptes. We need to start storing swap entries above
+ * there. We also need to avoid using A and D because of an
+ * erratum where they can be incorrectly set by hardware on
+ * non-present PTEs.
+ */
+#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
 #define SWP_TYPE_BITS 5
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+/* Place the offset above the type: */
+#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)

 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)

-#define __swp_type(x) (((x).val >> (_PAGE_BIT_PRESENT + 1)) \
+#define __swp_type(x) (((x).val >> (SWP_TYPE_FIRST_BIT)) \
                        & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x) ((x).val >> SWP_OFFSET_SHIFT)
+#define __swp_offset(x) ((x).val >> SWP_OFFSET_FIRST_BIT)
 #define __swp_entry(type, offset) ((swp_entry_t) { \
-                       ((type) << (_PAGE_BIT_PRESENT + 1)) \
-                       | ((offset) << SWP_OFFSET_SHIFT) })
+                       ((type) << (SWP_TYPE_FIRST_BIT)) \
+                       | ((offset) << SWP_OFFSET_FIRST_BIT) })
 #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
 #define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val })
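
To see why moving TYPE out of the low bits makes a stray Accessed/Dirty bit harmless, a user-space model of the encoding may help. This is a minimal sketch, not kernel code: the constants follow the layout table in the commit message (TYPE in bits 9-13, OFFSET from bit 14, A in bit 5, D in bit 6), and every `sk_`/`SK_` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the layout table; all names are hypothetical. */
#define SK_BIT_ACCESSED       5
#define SK_BIT_DIRTY          6
#define SK_BIT_PROTNONE       8  /* G, aliased as PROT_NONE on !present PTEs */
#define SK_TYPE_FIRST_BIT     (SK_BIT_PROTNONE + 1)              /* 9  */
#define SK_TYPE_BITS          5
#define SK_OFFSET_FIRST_BIT   (SK_TYPE_FIRST_BIT + SK_TYPE_BITS) /* 14 */
#define SK_OLD_TYPE_FIRST_BIT 1  /* "before" row: TYPE(1-5) */

typedef struct { uint64_t val; } sk_swp_entry_t;

/* Pack a swap type and offset using the new layout. */
static inline sk_swp_entry_t sk_swp_entry(uint64_t type, uint64_t offset)
{
    assert(type < (1ULL << SK_TYPE_BITS));
    assert(offset < (1ULL << (64 - SK_OFFSET_FIRST_BIT)));
    return (sk_swp_entry_t){ (type << SK_TYPE_FIRST_BIT)
                             | (offset << SK_OFFSET_FIRST_BIT) };
}

static inline uint64_t sk_swp_type(sk_swp_entry_t e)
{
    return (e.val >> SK_TYPE_FIRST_BIT) & ((1ULL << SK_TYPE_BITS) - 1);
}

static inline uint64_t sk_swp_offset(sk_swp_entry_t e)
{
    return e.val >> SK_OFFSET_FIRST_BIT;
}

/* Model the erratum: hardware sets A and D on a non-present entry. */
static inline uint64_t sk_spurious_ad(uint64_t val)
{
    return val | (1ULL << SK_BIT_ACCESSED) | (1ULL << SK_BIT_DIRTY);
}

/* Decode the type the way the old layout did (bits 1-5). */
static inline uint64_t sk_old_swp_type(uint64_t val)
{
    return (val >> SK_OLD_TYPE_FIRST_BIT) & ((1ULL << SK_TYPE_BITS) - 1);
}
```

In this model, setting A and D on a new-layout entry changes the raw value but leaves the decoded type and offset intact, because both bits fall in the don't-care region. With the old layout, the A bit lands in the top bit of TYPE, so a stored type of 3 would decode as 19.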