Commit ccaafd7

JoonsooKim authored and torvalds committed
mm: don't use compound_head() in virt_to_head_page()
compound_head() is implemented on the assumption that there can be a race on the tail flag while it is being checked. That assumption only holds when we access an arbitrarily positioned struct page. virt_to_head_page() is a different case: it is called only on pages within an allocated range, so there is no race on the tail flag, the race handling can be skipped, and overhead is reduced slightly.

This patch implements compound_head_fast(), which is the same as compound_head() except for the tail-flag race handling, and switches virt_to_head_page() to the optimized function to improve performance.

I saw a 1.8% win (14.063 ns -> 13.810 ns) in a fast-path loop over kmem_cache_alloc/free when the target object is on a tail page.

Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 9aabf81 commit ccaafd7
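
For reference, the slow path being bypassed is compound_head_by_tail(), whose body is not shown in this diff. Per the new comment added above compound_head(), it issues smp_rmb() before rechecking the tail flag. A reconstructed sketch of how it read in include/linux/mm.h of this era (for illustration only, not part of this commit):

static inline struct page *compound_head_by_tail(struct page *tail)
{
	struct page *head = tail->first_page;

	/*
	 * Order the first_page load before the tail-flag recheck:
	 * a concurrent THP split can leave first_page stale, so
	 * only trust head if the page is still a tail page.
	 */
	smp_rmb();
	if (likely(PageTail(tail)))
		return head;
	return tail;
}

The barrier plus recheck is exactly the cost that compound_head_fast() omits when the caller can guarantee the page is not being dismantled concurrently.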

File tree

1 file changed: +26 −1 lines changed


include/linux/mm.h

Lines changed: 26 additions & 1 deletion
@@ -446,13 +446,31 @@ static inline struct page *compound_head_by_tail(struct page *tail)
 	return tail;
 }
 
+/*
+ * Since either compound page could be dismantled asynchronously in THP
+ * or we access asynchronously arbitrary positioned struct page, there
+ * would be tail flag race. To handle this race, we should call
+ * smp_rmb() before checking tail flag. compound_head_by_tail() did it.
+ */
 static inline struct page *compound_head(struct page *page)
 {
 	if (unlikely(PageTail(page)))
 		return compound_head_by_tail(page);
 	return page;
 }
 
+/*
+ * If we access compound page synchronously such as access to
+ * allocated page, there is no need to handle tail flag race, so we can
+ * check tail flag directly without any synchronization primitive.
+ */
+static inline struct page *compound_head_fast(struct page *page)
+{
+	if (unlikely(PageTail(page)))
+		return page->first_page;
+	return page;
+}
+
 /*
  * The atomic page->_mapcount, starts from -1: so that transitions
  * both from it and to it can be tracked, using atomic_inc_and_test
@@ -531,7 +549,14 @@ static inline void get_page(struct page *page)
 static inline struct page *virt_to_head_page(const void *x)
 {
 	struct page *page = virt_to_page(x);
-	return compound_head(page);
+
+	/*
+	 * We don't need to worry about synchronization of tail flag
+	 * when we call virt_to_head_page() since it is only called for
+	 * already allocated page and this page won't be freed until
+	 * this virt_to_head_page() is finished. So use _fast variant.
+	 */
+	return compound_head_fast(page);
 }
 
 /*
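
To illustrate the guarantee the new comment in virt_to_head_page() relies on: every caller passes the address of an object it has allocated and still holds, so the backing page cannot be split or freed mid-call. A minimal, hypothetical caller sketch (object_head_page() is an invented name, not a kernel API):

#include <linux/mm.h>

/*
 * Hypothetical helper: resolve the head page backing a live object.
 * The caller guarantees @obj stays allocated for the duration of the
 * call, so the tail flag is stable and no smp_rmb() is needed.
 */
static inline struct page *object_head_page(const void *obj)
{
	return virt_to_head_page(obj);
}

This is the pattern in the kmem_cache_alloc/free fast paths that motivated the measured 1.8% win.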
