Commit 8e0861f

aikozbenh authored and committed
powerpc: Prepare to support kernel handling of IOMMU map/unmap
The current VFIO-on-POWER implementation supports only user-mode-driven mapping, i.e. QEMU sends requests to map/unmap pages. However, this approach is really slow, so we want to move it into KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters, where each packet needs to be mapped/unmapped), we chose to implement it as a "fast" hypercall directly in "real mode" (the processor is still in the guest context, but the MMU is off).

To be able to do that, we need to provide some facilities to access the struct page count within that real-mode environment, as things like the sparsemem vmemmap mappings aren't accessible.

This adds an API function, realmode_pfn_to_page(), to get the page struct when the MMU is off.

This also adds to MM a new function, put_page_unless_one(), which drops a page reference only if the counter is bigger than 1. It is going to be used when the MMU is off (for example, real mode on PPC64) and we want to make sure that the page release will not happen in real mode, as that may crash the kernel in a horrible way.

CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.

Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Andrew Morton <[email protected]>
Reviewed-by: Paul Mackerras <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
1 parent 81fcfb8 commit 8e0861f

4 files changed, +69 -2 lines changed

arch/powerpc/include/asm/pgtable-ppc64.h

Lines changed: 2 additions & 0 deletions
```diff
@@ -394,6 +394,8 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
 	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
 }
 
+struct page *realmode_pfn_to_page(unsigned long pfn);
+
 static inline char *get_hpte_slot_array(pmd_t *pmdp)
 {
 	/*
```

arch/powerpc/mm/init_64.c

Lines changed: 50 additions & 1 deletion
```diff
@@ -304,5 +304,54 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
 }
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
+/*
+ * We do not have access to the sparsemem vmemmap, so we fallback to
+ * walking the list of sparsemem blocks which we already maintain for
+ * the sake of crashdump. In the long run, we might want to maintain
+ * a tree if performance of that linear walk becomes a problem.
+ *
+ * realmode_pfn_to_page functions can fail due to:
+ * 1) As real sparsemem blocks do not lay in RAM continously (they
+ * are in virtual address space which is not available in the real mode),
+ * the requested page struct can be split between blocks so get_page/put_page
+ * may fail.
+ * 2) When huge pages are used, the get_page/put_page API will fail
+ * in real mode as the linked addresses in the page struct are virtual
+ * too.
+ */
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct vmemmap_backing *vmem_back;
+	struct page *page;
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
+
+	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
+		if (pg_va < vmem_back->virt_addr)
+			continue;
+
+		/* Check that page struct is not split between real pages */
+		if ((pg_va + sizeof(struct page)) >
+				(vmem_back->virt_addr + page_size))
+			return NULL;
+
+		page = (struct page *) (vmem_back->phys + pg_va -
+				vmem_back->virt_addr);
+		return page;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#elif defined(CONFIG_FLATMEM)
+
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	return page;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
```
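
The linear walk in realmode_pfn_to_page() can be modelled in ordinary userspace C to see how the address translation and the "split between real pages" check interact. This is a hypothetical model, not kernel code: `struct backing_block` and `model_virt_to_phys()` are made-up names standing in for `struct vmemmap_backing` and the loop in the diff, addresses are plain integers, and `obj_size` plays the role of `sizeof(struct page)`. One deliberate difference: the model checks both ends of each block's range and keeps walking if the address is outside it, whereas the loop in the diff returns NULL at the first block that starts at or below the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct vmemmap_backing: one chunk of the
 * virtual memmap (virt_addr) and the physical memory backing it (phys).
 */
struct backing_block {
	uintptr_t virt_addr;
	uintptr_t phys;
	struct backing_block *next;	/* like vmem_back->list */
};

/*
 * Translate the virtual address of an object of obj_size bytes to a
 * physical address by walking the list. Returns 0 if no block holds
 * the whole object (unmapped, or split across two chunks).
 */
static uintptr_t model_virt_to_phys(const struct backing_block *head,
				    uintptr_t va, size_t obj_size,
				    size_t block_size)
{
	assert(block_size > 0 && obj_size > 0);

	for (const struct backing_block *b = head; b; b = b->next) {
		/* Skip blocks whose range does not contain va. */
		if (va < b->virt_addr || va - b->virt_addr >= block_size)
			continue;
		/* The object must not run past the end of this block. */
		if (va - b->virt_addr + obj_size > block_size)
			return 0;
		return b->phys + (va - b->virt_addr);
	}
	return 0;
}
```

The model rejects an object that starts near the end of one chunk and runs into the next, even when the next chunk is also mapped. That corresponds to failure case 1) in the comment: being adjacent in the virtual memmap says nothing about being adjacent in physical memory, so the object cannot safely be accessed through a single physical pointer.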

include/linux/mm.h

Lines changed: 14 additions & 0 deletions
```diff
@@ -297,12 +297,26 @@ static inline int put_page_testzero(struct page *page)
 /*
  * Try to grab a ref unless the page has a refcount of zero, return false if
  * that is the case.
+ * This can be called when MMU is off so it must not access
+ * any of the virtual mappings.
  */
 static inline int get_page_unless_zero(struct page *page)
 {
 	return atomic_inc_not_zero(&page->_count);
 }
 
+/*
+ * Try to drop a ref unless the page has a refcount of one, return false if
+ * that is the case.
+ * This is to make sure that the refcount won't become zero after this drop.
+ * This can be called when MMU is off so it must not access
+ * any of the virtual mappings.
+ */
+static inline int put_page_unless_one(struct page *page)
+{
+	return atomic_add_unless(&page->_count, -1, 1);
+}
+
 extern int page_is_ram(unsigned long pfn);
 
 /* Support for virtually mapped pages */
```
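
The two refcount helpers differ only in which value blocks the update: get_page_unless_zero() refuses to move the count off 0, and put_page_unless_one() refuses to move it to 0. As a hedged illustration, the sketch below models both with C11 atomics in userspace. `model_add_unless()` is a hypothetical stand-in for the kernel's atomic_add_unless(), written here as a compare-and-swap loop, and a plain `atomic_int` stands in for `page->_count`; the sanity check in `model_put_unless_one()` exists only in the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of atomic_add_unless(): add @a to *@v
 * unless *@v equals @u. Returns true if the add was performed.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Like get_page_unless_zero(): take a ref only if the count is not 0. */
static bool model_get_unless_zero(atomic_int *count)
{
	return model_add_unless(count, 1, 0);
}

/* Like put_page_unless_one(): drop a ref only if it is not the last one. */
static bool model_put_unless_one(atomic_int *count)
{
	/* Model-only sanity check: the caller must hold a reference. */
	assert(atomic_load(count) >= 1);
	return model_add_unless(count, -1, 1);
}
```

A false return from `model_put_unless_one()` means the caller holds the last reference. In the real-mode scenario from the commit message, that final put, which could free the page, must not be performed with the MMU off and would have to be deferred to a path running with the MMU on.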

include/linux/page-flags.h

Lines changed: 3 additions & 1 deletion
```diff
@@ -329,7 +329,9 @@ static inline void set_page_writeback(struct page *page)
  * System with lots of page flags available. This allows separate
  * flags for PageHead() and PageTail() checks of compound pages so that bit
  * tests can be used in performance sensitive paths. PageCompound is
- * generally not used in hot code paths.
+ * generally not used in hot code paths except arch/powerpc/mm/init_64.c
+ * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
+ * and avoid handling those in real mode.
  */
 __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
 __PAGEFLAG(Tail, tail)
```
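
With separate head and tail bits, detecting a compound (huge) page is a single mask test on the page's flags word and does not need to follow any pointer stored in the page struct. The sketch below models that check in userspace. The bit numbers, `struct model_page`, and the policy function `model_can_handle_in_realmode()` are hypothetical: they do not reflect the kernel's actual flag layout or the code in book3s_64_vio_hv.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel's enum pageflags differs. */
enum { MODEL_PG_head = 14, MODEL_PG_tail = 15 };

static_assert(MODEL_PG_tail < sizeof(unsigned long) * CHAR_BIT,
	      "flag bits must fit in an unsigned long");

struct model_page {
	unsigned long flags;
};

/* Separate head and tail bits turn the compound check into one mask test. */
static bool model_page_compound(const struct model_page *page)
{
	const unsigned long mask = (1UL << MODEL_PG_head) | (1UL << MODEL_PG_tail);

	return (page->flags & mask) != 0;
}

/* Hypothetical policy: handle only non-compound (small) pages in real mode. */
static bool model_can_handle_in_realmode(const struct model_page *page)
{
	return !model_page_compound(page);
}
```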
