
Commit 56a8c8e

Hugh Dickins authored and torvalds committed
tmpfs: do not allocate pages on read
Mikulas asked in "Do we still need commit a0ee5ec ('tmpfs: allocate on read when stacked')?" in [1]

Lukas noticed this unusual behavior of loop device backed by tmpfs in [2].

Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes; but if it looks like it might be a read for "a stacking filesystem", it allocates actual pages to the page cache, and even marks them as dirty. And reads from the loop device do satisfy the test that is used.

This oddity was added for an old version of unionfs, to help to limit its usage to the limited size of the tmpfs mount involved; but about the same time as the tmpfs mod went in (2.6.25), unionfs was reworked to proceed differently; and the mod kept just in case others needed it.

Do we still need it? I cannot answer with more certainty than "Probably not". It's nasty enough that we really should try to delete it; but if a regression is reported somewhere, then we might have to revert later.

It's not quite as simple as just removing the test (as Mikulas did): xfstests generic/013 hung because splice from tmpfs failed on page not up-to-date and page mapping unset. That can be fixed just by marking the ZERO_PAGE as Uptodate, which of course it is: do so in pagecache_init() - it might be useful to others than tmpfs.

My intention, though, was to stop using the ZERO_PAGE here altogether: surely iov_iter_zero() is better for this case? Sadly not: it relies on clear_user(), and the x86 clear_user() is slower than its copy_user() [3]. But while we are still using the ZERO_PAGE, let's stop dirtying its struct page cacheline with unnecessary get_page() and put_page().

Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/ [1]
Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2]
Link: https://lore.kernel.org/lkml/[email protected]/ [3]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reported-by: Mikulas Patocka <[email protected]>
Reported-by: Lukas Czerner <[email protected]>
Acked-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Zdenek Kabelac <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Borislav Petkov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent bc78639 commit 56a8c8e
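
For illustration only, not part of the commit: a minimal user-space sketch of the behavior this patch removes, assuming /dev/shm is a tmpfs mount (the file name is arbitrary). sendfile() drives a splice-style, non-iovec read of the file, the kind of read that used to select SGP_CACHE in shmem_file_read_iter(). On kernels without this patch, st_blocks of the sparse file can be expected to grow after the read, because pages were allocated and dirtied for the holes; with the patch it should stay at 0.

#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/dev/shm/sparse-hole-demo";  /* assumes tmpfs */
        struct stat st;
        int fd, null_fd;

        /* Create a 1 MiB file that is all hole: no pages allocated yet. */
        fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || ftruncate(fd, 1 << 20) < 0) {
                perror("create sparse file");
                return 1;
        }

        fstat(fd, &st);
        printf("st_blocks before read: %lld\n", (long long)st.st_blocks);

        /*
         * sendfile() reads the file through the splice path, i.e. with a
         * non-iovec iterator: before this patch that forced SGP_CACHE and
         * allocated (and dirtied) pages for the holes being read.
         */
        null_fd = open("/dev/null", O_WRONLY);
        if (null_fd < 0 || sendfile(null_fd, fd, NULL, 1 << 20) < 0)
                perror("sendfile");

        fstat(fd, &st);
        printf("st_blocks after read:  %lld\n", (long long)st.st_blocks);

        unlink(path);
        return 0;
}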


2 files changed: +12, -14 lines changed


mm/filemap.c: 6 additions & 0 deletions

@@ -1054,6 +1054,12 @@ void __init pagecache_init(void)
                 init_waitqueue_head(&folio_wait_table[i]);
 
         page_writeback_init();
+
+        /*
+         * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date,
+         * and splice's page_cache_pipe_buf_confirm() needs to see that.
+         */
+        SetPageUptodate(ZERO_PAGE(0));
 }
 
 /*

mm/shmem.c: 6 additions & 14 deletions

@@ -2499,19 +2499,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
         struct address_space *mapping = inode->i_mapping;
         pgoff_t index;
         unsigned long offset;
-        enum sgp_type sgp = SGP_READ;
         int error = 0;
         ssize_t retval = 0;
         loff_t *ppos = &iocb->ki_pos;
 
-        /*
-         * Might this read be for a stacking filesystem? Then when reading
-         * holes of a sparse file, we actually need to allocate those pages,
-         * and even mark them dirty, so it cannot exceed the max_blocks limit.
-         */
-        if (!iter_is_iovec(to))
-                sgp = SGP_CACHE;
-
         index = *ppos >> PAGE_SHIFT;
         offset = *ppos & ~PAGE_MASK;
 
@@ -2520,6 +2511,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                 pgoff_t end_index;
                 unsigned long nr, ret;
                 loff_t i_size = i_size_read(inode);
+                bool got_page;
 
                 end_index = i_size >> PAGE_SHIFT;
                 if (index > end_index)
@@ -2530,15 +2522,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                                 break;
                 }
 
-                error = shmem_getpage(inode, index, &page, sgp);
+                error = shmem_getpage(inode, index, &page, SGP_READ);
                 if (error) {
                         if (error == -EINVAL)
                                 error = 0;
                         break;
                 }
                 if (page) {
-                        if (sgp == SGP_CACHE)
-                                set_page_dirty(page);
                         unlock_page(page);
 
                         if (PageHWPoison(page)) {
@@ -2578,9 +2568,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                          */
                         if (!offset)
                                 mark_page_accessed(page);
+                        got_page = true;
                 } else {
                         page = ZERO_PAGE(0);
-                        get_page(page);
+                        got_page = false;
                 }
 
                 /*
@@ -2593,7 +2584,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
                 index += offset >> PAGE_SHIFT;
                 offset &= ~PAGE_MASK;
 
-                put_page(page);
+                if (got_page)
+                        put_page(page);
                 if (!iov_iter_count(to))
                         break;
                 if (ret < nr) {
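
Also for illustration, not what the commit does: the commit message's rejected alternative for the hole case, sketched against the else branch above. iov_iter_zero() would zero the destination directly instead of copying from the ZERO_PAGE, but it relies on clear_user(), and the x86 clear_user() is slower than its copy_user(), so the ZERO_PAGE copy is kept.

                } else {
                        /*
                         * Hypothetical alternative, not merged: zero the
                         * destination directly rather than copying from the
                         * ZERO_PAGE (the later copy_page_to_iter() call would
                         * then be skipped for this iteration).
                         */
                        ret = iov_iter_zero(nr, to);
                }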
