Commit 0e06f5c

Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits
 - ocfs2
 - most(?) of MM

* emailed patches from Andrew Morton <[email protected]>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2 parents f7816ad + 8f19b0c commit 0e06f5c

186 files changed, 7414 insertions(+), 4185 deletions(-)

Documentation/blockdev/zram.txt

Lines changed: 46 additions & 36 deletions
@@ -59,62 +59,72 @@ num_devices parameter is optional and tells zram how many devices should be
 pre-created. Default: 1.

 2) Set max number of compression streams
-Regardless the value passed to this attribute, ZRAM will always
-allocate multiple compression streams - one per online CPUs - thus
-allowing several concurrent compression operations. The number of
-allocated compression streams goes down when some of the CPUs
-become offline. There is no single-compression-stream mode anymore,
-unless you are running a UP system or has only 1 CPU online.
-
-To find out how many streams are currently available:
+Regardless of the value passed to this attribute, ZRAM will always
+allocate multiple compression streams - one per online CPU - thus
+allowing several concurrent compression operations. The number of
+allocated compression streams goes down when some of the CPUs
+go offline. There is no single-compression-stream mode anymore,
+unless you are running a UP system or have only 1 CPU online.
+
+To find out how many streams are currently available:
 cat /sys/block/zram0/max_comp_streams

 3) Select compression algorithm
-Using comp_algorithm device attribute one can see available and
-currently selected (shown in square brackets) compression algorithms,
-change selected compression algorithm (once the device is initialised
-there is no way to change compression algorithm).
+Using the comp_algorithm device attribute one can see the available and
+currently selected (shown in square brackets) compression algorithms,
+and change the selected compression algorithm (once the device is initialised
+there is no way to change the compression algorithm).

-Examples:
+Examples:
 #show supported compression algorithms
 cat /sys/block/zram0/comp_algorithm
 lzo [lz4]

 #select lzo compression algorithm
 echo lzo > /sys/block/zram0/comp_algorithm

+For the time being, the `comp_algorithm' content does not necessarily
+show every compression algorithm supported by the kernel. We keep this
+list primarily to simplify device configuration, and one can configure
+a new device with a compression algorithm that is not listed in
+`comp_algorithm'. The thing is that, internally, ZRAM uses the Crypto API
+and, if some of the algorithms were built as modules, it's impossible
+to list all of them using, for instance, /proc/crypto or any other
+method. This, however, has the advantage of permitting the use of
+custom crypto compression modules (implementing S/W or H/W compression).
+
 4) Set Disksize
-Set disk size by writing the value to sysfs node 'disksize'.
-The value can be either in bytes or you can use mem suffixes.
-Examples:
-# Initialize /dev/zram0 with 50MB disksize
-echo $((50*1024*1024)) > /sys/block/zram0/disksize
+Set disk size by writing the value to the sysfs node 'disksize'.
+The value can be given either in bytes or with mem suffixes.
+Examples:
+# Initialize /dev/zram0 with 50MB disksize
+echo $((50*1024*1024)) > /sys/block/zram0/disksize

-# Using mem suffixes
-echo 256K > /sys/block/zram0/disksize
-echo 512M > /sys/block/zram0/disksize
-echo 1G > /sys/block/zram0/disksize
+# Using mem suffixes
+echo 256K > /sys/block/zram0/disksize
+echo 512M > /sys/block/zram0/disksize
+echo 1G > /sys/block/zram0/disksize

 Note:
 There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.

 5) Set memory limit: Optional
-Set memory limit by writing the value to sysfs node 'mem_limit'.
-The value can be either in bytes or you can use mem suffixes.
-In addition, you could change the value in runtime.
-Examples:
-# limit /dev/zram0 with 50MB memory
-echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
-
-# Using mem suffixes
-echo 256K > /sys/block/zram0/mem_limit
-echo 512M > /sys/block/zram0/mem_limit
-echo 1G > /sys/block/zram0/mem_limit
-
-# To disable memory limit
-echo 0 > /sys/block/zram0/mem_limit
+Set memory limit by writing the value to the sysfs node 'mem_limit'.
+The value can be given either in bytes or with mem suffixes.
+In addition, you can change the value at runtime.
+Examples:
+# limit /dev/zram0 with 50MB memory
+echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+# Using mem suffixes
+echo 256K > /sys/block/zram0/mem_limit
+echo 512M > /sys/block/zram0/mem_limit
+echo 1G > /sys/block/zram0/mem_limit
+
+# To disable memory limit
+echo 0 > /sys/block/zram0/mem_limit

 6) Activate:
 mkswap /dev/zram0
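
The comp_algorithm attribute described in the zram.txt hunk lists the available algorithms and shows the currently selected one in square brackets, e.g. `lzo [lz4]`. The sketch below extracts the selected name from such a line. It assumes nothing beyond that output format, and `zram_selected_algorithm()` is a hypothetical helper, not a kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the algorithm name shown in square brackets
 * in a comp_algorithm line such as "lzo [lz4]" into out (NUL-terminated).
 * Returns 0 on success, or -1 if there is no complete, non-empty
 * bracketed name or it does not fit in out.
 */
int zram_selected_algorithm(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;                      /* nothing is marked as selected */

    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;                      /* unterminated bracket */

    size_t len = (size_t)(close - (open + 1));
    if (len == 0 || len >= out_len)     /* empty name, or no room for NUL */
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

A caller would typically fill `line` by reading /sys/block/zram0/comp_algorithm with fopen() and fgets(). As the hunk points out, that list need not include every algorithm the Crypto API can load, so a name missing from it is not necessarily unusable.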

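The disksize and mem_limit attributes in the zram.txt hunk accept either a plain byte count or a value with a mem suffix (256K, 512M, 1G). The sketch below shows how such a string maps to bytes, assuming binary multiples (K = 1024) in the spirit of the kernel's memparse(). `parse_mem_size()` is a hypothetical userspace helper. Unlike memparse(), it handles only decimal input and the K, M and G suffixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical parser for zram-style size strings: "52428800", "256K",
 * "512M", "1G" (suffixes are case-insensitive binary multiples, and one
 * trailing newline, as written by echo, is tolerated). Overflow is not
 * checked. Returns 0 and stores the byte count in *bytes, or -1.
 */
int parse_mem_size(const char *s, unsigned long long *bytes)
{
    assert(s != NULL && bytes != NULL);

    if (!isdigit((unsigned char)*s))
        return -1;                      /* reject "", signs and spaces */

    char *end;
    unsigned long long value = strtoull(s, &end, 10);

    switch (toupper((unsigned char)*end)) {
    case 'G':
        value <<= 10;
        /* fall through */
    case 'M':
        value <<= 10;
        /* fall through */
    case 'K':
        value <<= 10;
        end++;                          /* consume the suffix */
        break;
    case '\n':
    case '\0':
        break;                          /* plain byte count */
    default:
        return -1;                      /* unknown suffix */
    }

    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;                      /* trailing garbage */

    *bytes = value;
    return 0;
}
```

For example, 512M yields 536870912 bytes. By the zram note's rule of thumb, a 1G device (1073741824 bytes) costs about 0.1% of that while idle, which is roughly 1 MiB. For mem_limit, a parsed value of 0 means the limit is disabled.
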
Documentation/filesystems/Locking

Lines changed: 9 additions & 5 deletions
@@ -195,7 +195,9 @@ prototypes:
 int (*releasepage) (struct page *, int);
 void (*freepage)(struct page *);
 int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+bool (*isolate_page) (struct page *, isolate_mode_t);
 int (*migratepage)(struct address_space *, struct page *, struct page *);
+void (*putback_page) (struct page *);
 int (*launder_page)(struct page *);
 int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 int (*error_remove_page)(struct address_space *, struct page *);
@@ -219,7 +221,9 @@ invalidatepage: yes
 releasepage: yes
 freepage: yes
 direct_IO:
+isolate_page: yes
 migratepage: yes (both)
+putback_page: yes
 launder_page: yes
 is_partially_uptodate: yes
 error_remove_page: yes
@@ -544,13 +548,13 @@ subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
 locked. The VM will unlock the page.

 ->map_pages() is called when VM asks to map easy accessible pages.
-Filesystem should find and map pages associated with offsets from "pgoff"
-till "max_pgoff". ->map_pages() is called with page table locked and must
+Filesystem should find and map pages associated with offsets from "start_pgoff"
+till "end_pgoff". ->map_pages() is called with page table locked and must
 not block. If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
-page table entry. Pointer to entry associated with offset "pgoff" is
-passed in "pte" field in vm_fault structure. Pointers to entries for other
-offsets should be calculated relative to "pte".
+page table entry. A pointer to the entry associated with the page is passed in
+the "pte" field of the fault_env structure. Pointers to entries for other offsets
+should be calculated relative to "pte".

 ->page_mkwrite() is called when a previously read-only pte is
 about to become writeable. The filesystem again must ensure that there are

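The Locking hunk says ->map_pages() receives, in fault_env's "pte" field, a pointer to the entry for the faulting page, and must compute pointers for the other offsets in start_pgoff..end_pgoff relative to it. The userspace sketch below models one page table as a plain array to illustrate only that pointer arithmetic. `mock_pte_t`, `pte_for_offset()` and `map_range()` are hypothetical stand-ins, not the kernel's pte_t API, and the sketch assumes all offsets fall in one VMA and one page table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-table entry; not the kernel's pte_t. */
typedef unsigned long mock_pte_t;

/*
 * Given the entry for the faulting page (fault_pte, at offset fault_pgoff),
 * return the entry for another offset. Within one VMA and one page table,
 * consecutive page offsets map to consecutive entries, so the distance is
 * just the difference in offsets.
 */
mock_pte_t *pte_for_offset(mock_pte_t *fault_pte, unsigned long fault_pgoff,
                           unsigned long pgoff)
{
    return fault_pte + ((ptrdiff_t)pgoff - (ptrdiff_t)fault_pgoff);
}

/*
 * Walk start_pgoff..end_pgoff (inclusive) and fill every entry that is
 * still empty with a marker derived from its offset. Entries that are
 * already populated are skipped. Returns how many entries were filled.
 */
size_t map_range(mock_pte_t *fault_pte, unsigned long fault_pgoff,
                 unsigned long start_pgoff, unsigned long end_pgoff)
{
    size_t mapped = 0;

    for (unsigned long off = start_pgoff; off <= end_pgoff; off++) {
        mock_pte_t *pte = pte_for_offset(fault_pte, fault_pgoff, off);
        if (*pte != 0)
            continue;               /* already populated: leave it alone */
        *pte = off + 1;             /* hypothetical "mapped" marker */
        mapped++;
    }
    return mapped;
}
```

In the real kernel the entries would be installed through do_set_pte() with the page table locked, as the hunk describes, rather than by storing integers.
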
Documentation/filesystems/dax.txt

Lines changed: 4 additions & 2 deletions
@@ -49,6 +49,7 @@ These block devices may be used for inspiration:
 - axonram: Axon DDR2 device driver
 - brd: RAM backed block device driver
 - dcssblk: s390 dcss block device driver
+- pmem: NVDIMM persistent memory driver


 Implementation Tips for Filesystem Writers
@@ -75,8 +76,9 @@ calls to get_block() (for example by a page-fault racing with a read()
 or a write()) work correctly.

 These filesystems may be used for inspiration:
-- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
-- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
+- ext2: see Documentation/filesystems/ext2.txt
+- ext4: see Documentation/filesystems/ext4.txt
+- xfs: see Documentation/filesystems/xfs.txt


 Handling Media Errors

Documentation/filesystems/proc.txt

Lines changed: 9 additions & 0 deletions
@@ -436,6 +436,7 @@ Private_Dirty: 0 kB
 Referenced: 892 kB
 Anonymous: 0 kB
 AnonHugePages: 0 kB
+ShmemPmdMapped: 0 kB
 Shared_Hugetlb: 0 kB
 Private_Hugetlb: 0 kB
 Swap: 0 kB
@@ -464,6 +465,8 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
+"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
+huge pages.
 "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
 reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.
@@ -868,6 +871,9 @@ VmallocTotal: 112216 kB
 VmallocUsed: 428 kB
 VmallocChunk: 111088 kB
 AnonHugePages: 49152 kB
+ShmemHugePages: 0 kB
+ShmemPmdMapped: 0 kB
+

 MemTotal: Total usable ram (i.e. physical ram minus a few reserved
 bits and the kernel binary code)
@@ -912,6 +918,9 @@ MemAvailable: An estimate of how much memory is available for starting new
 AnonHugePages: Non-file backed huge pages mapped into userspace page tables
 Mapped: files which have been mmaped, such as libraries
 Shmem: Total memory used by shared memory (shmem) and tmpfs
+ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
+with huge pages
+ShmemPmdMapped: Shared memory mapped into userspace with huge pages
 Slab: in-kernel data structures cache
 SReclaimable: Part of Slab, that might be reclaimed, such as caches
 SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure

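The proc.txt hunk adds ShmemHugePages and ShmemPmdMapped to /proc/meminfo, using the usual "Field: value kB" layout. The sketch below reads one such field. It takes the file contents as a string so it can be exercised without /proc. The helper name `meminfo_kb()` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of a /proc/meminfo-style field such as
 * "ShmemHugePages", or -1 if the field is absent or unparsable.
 */
long meminfo_kb(const char *text, const char *field)
{
    assert(text != NULL && field != NULL);

    size_t flen = strlen(field);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require "Field:" at the start of a line, so that "Shmem"
         * does not accidentally match "ShmemHugePages". */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long kb;
            if (sscanf(line + flen + 1, "%ld", &kb) == 1)
                return kb;          /* %ld skips the padding spaces */
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* step past the newline */
    }
    return -1;
}
```

On a live system the text would be the contents of /proc/meminfo. The per-mapping ShmemPmdMapped counter appears in /proc/PID/smaps instead, in the same "Field: value kB" form.
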
Documentation/filesystems/vfs.txt

Lines changed: 11 additions & 0 deletions
@@ -592,9 +592,14 @@ struct address_space_operations {
 int (*releasepage) (struct page *, int);
 void (*freepage)(struct page *);
 ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+/* isolate a page for migration */
+bool (*isolate_page) (struct page *, isolate_mode_t);
 /* migrate the contents of a page to the specified target */
 int (*migratepage) (struct page *, struct page *);
+/* put migration-failed page back to the right list */
+void (*putback_page) (struct page *);
 int (*launder_page) (struct page *);
+
 int (*is_partially_uptodate) (struct page *, unsigned long,
 unsigned long);
 void (*is_dirty_writeback) (struct page *, bool *, bool *);
@@ -747,13 +752,19 @@ struct address_space_operations {
 and transfer data directly between the storage and the
 application's address space.

+isolate_page: Called by the VM when isolating a movable non-lru page.
+If the page is successfully isolated, the VM marks it as PG_isolated
+via __SetPageIsolated.
+
 migrate_page: This is used to compact the physical memory usage.
 If the VM wants to relocate a page (maybe off a memory card
 that is signalling imminent failure) it will pass a new page
 and an old page to this function. migrate_page should
 transfer any private data across and update any references
 that it has to the page.

+putback_page: Called by the VM when migration of an isolated page fails.
+
 launder_page: Called before freeing a page - it writes back the dirty page. To
 prevent redirtying the page, it is kept locked during the whole
 operation.

Documentation/vm/page_migration

Lines changed: 107 additions & 1 deletion
@@ -142,5 +142,111 @@ Steps:
 20. The new page is moved to the LRU and can be scanned by the swapper
 etc again.

-Christoph Lameter, May 8, 2006.
+C. Non-LRU page migration
+-------------------------
+
+Although migration originally aimed at reducing the latency of memory accesses
+for NUMA, compaction, which wants to create high-order pages, is also a main customer.
+
+The current problem with the implementation is that it is designed to migrate only
+*LRU* pages. However, there are potential non-LRU pages which can be migrated
+in drivers, for example, zsmalloc and virtio-balloon pages.
+
+For virtio-balloon pages, some parts of the migration code path have been hooked
+up, with virtio-balloon specific functions added to intercept the migration logic.
+That is too specific to one driver, so other drivers that want to make their pages
+movable would have to add their own specific hooks in the migration path.
+
+To overcome the problem, the VM supports non-LRU page migration, which provides
+generic functions for non-LRU movable pages without driver-specific hooks in the
+migration path.
+
+If a driver wants to make its own pages movable, it should define three functions,
+which are function pointers of struct address_space_operations.
+
+1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
+
+What the VM expects from the driver's isolate_page function is to return *true*
+if the driver isolates the page successfully. On returning true, the VM marks the page
+as PG_isolated so that concurrent isolation attempts on other CPUs skip the page.
+If a driver cannot isolate the page, it should return *false*.
+
+Once a page is successfully isolated, the VM uses the page.lru fields, so the driver
+shouldn't expect the values in those fields to be preserved.
+
+2. int (*migratepage) (struct address_space *mapping,
+struct page *newpage, struct page *oldpage, enum migrate_mode);
+
+After isolation, the VM calls the driver's migratepage with the isolated page.
+The job of migratepage is to move the content of the old page to the new page
+and set up the fields of struct page newpage. Keep in mind that, if you migrated
+the oldpage successfully, you should indicate to the VM that the oldpage is no
+longer movable via __ClearPageMovable() under page_lock and return
+MIGRATEPAGE_SUCCESS. If the driver cannot migrate the page at the moment, it
+can return -EAGAIN. On -EAGAIN, the VM will retry page migration shortly,
+because the VM interprets -EAGAIN as a "temporary migration failure". On returning
+any error except -EAGAIN, the VM will give up on migrating the page without
+retrying this time.
+
+The driver shouldn't touch the page.lru field, which the VM uses, in these functions.
+
+3. void (*putback_page)(struct page *);
+
+If migration fails on an isolated page, the VM should return the isolated page
+to the driver, so the VM calls the driver's putback_page with the page whose
+migration failed. In this function, the driver should put the isolated page back
+into its own data structure.
 
+4. Non-LRU movable page flags
+
+There are two page flags for supporting non-LRU movable pages.
+
+* PG_movable
+
+The driver should use the function below to make a page movable under page_lock.
+
+void __SetPageMovable(struct page *page, struct address_space *mapping)
+
+It needs an address_space argument for registering the migration family of functions
+which will be called by the VM. Strictly speaking, PG_movable is not a real flag of
+struct page. Rather, the VM reuses the lower bits of page->mapping to represent it.
+
+#define PAGE_MAPPING_MOVABLE 0x2
+page->mapping = (void *)((unsigned long)mapping | PAGE_MAPPING_MOVABLE);
+
+so the driver shouldn't access page->mapping directly. Instead, the driver should
+use page_mapping, which masks off the low two bits of page->mapping under the
+page lock, so it can get the right struct address_space.
+
+For testing of non-LRU movable pages, the VM supports the __PageMovable function.
+However, it doesn't guarantee to identify a non-LRU movable page because the
+page->mapping field is unified with other variables in struct page.
+Also, if the driver releases the page after isolation by the VM, page->mapping
+doesn't have a stable value although it has PAGE_MAPPING_MOVABLE
+(look at __ClearPageMovable). But __PageMovable is a cheap way to tell whether
+a page is LRU or non-LRU movable once the page has been isolated, because
+LRU pages can never have PAGE_MAPPING_MOVABLE in page->mapping. It is also
+good for just peeking to test for non-LRU movable pages before more expensive
+checking with lock_page in pfn scanning to select a victim.
+
+For guaranteeing a non-LRU movable page, the VM provides the PageMovable function.
+Unlike __PageMovable, PageMovable validates page->mapping and
+mapping->a_ops->isolate_page under lock_page. The lock_page prevents sudden
+destruction of page->mapping.
+
+A driver using __SetPageMovable should clear the flag via __ClearPageMovable
+under page_lock before releasing the page.
+
+* PG_isolated
+
+To prevent concurrent isolation among several CPUs, the VM marks an isolated page
+as PG_isolated under lock_page. So if a CPU encounters a PG_isolated non-LRU
+movable page, it can skip it. The driver doesn't need to manipulate the flag
+because the VM will set/clear it automatically. Keep in mind that if the driver
+sees a PG_isolated page, it means the page has been isolated by the VM, so it
+shouldn't touch the page.lru field.
+PG_isolated is an alias of the PG_reclaim flag, so the driver shouldn't use the
+flag for its own purpose.
+
+Christoph Lameter, May 8, 2006.
+Minchan Kim, Mar 28, 2016.
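
The non-LRU migration protocol in the page_migration hunk has three steps: isolate_page, then migratepage with retries on -EAGAIN, then putback_page on failure. It can be modeled in userspace to show the control flow. This is a simplified sketch, not kernel code. `struct mock_page`, `struct mock_ops`, `vm_migrate_one()`, the toy driver and the `MAX_PASSES` retry budget are all hypothetical. The real VM retries -EAGAIN pages in later passes over a page list rather than in a tight loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MIGRATEPAGE_SUCCESS 0   /* same value as the kernel constant */
#define MAX_PASSES 3            /* hypothetical retry budget for -EAGAIN */

/* Userspace model of a driver page; not the kernel's struct page. */
struct mock_page {
    bool isolated;              /* models PG_isolated */
    int data;                   /* payload the driver must carry over */
    int busy_passes;            /* times migratepage answers -EAGAIN */
    bool broken;                /* migratepage fails permanently */
};

/* Hypothetical mirror of the three hooks a driver supplies. */
struct mock_ops {
    bool (*isolate_page)(struct mock_page *page);
    int (*migratepage)(struct mock_page *newpage, struct mock_page *oldpage);
    void (*putback_page)(struct mock_page *page);
};

/* VM side: isolate, migrate (retrying -EAGAIN), put back on failure. */
int vm_migrate_one(const struct mock_ops *ops, struct mock_page *oldpage,
                   struct mock_page *newpage)
{
    if (oldpage->isolated || !ops->isolate_page(oldpage))
        return -EBUSY;          /* already isolated, or driver refused */
    oldpage->isolated = true;   /* the VM, not the driver, sets PG_isolated */

    int rc = -EAGAIN;
    for (int pass = 0; pass < MAX_PASSES && rc == -EAGAIN; pass++)
        rc = ops->migratepage(newpage, oldpage);

    oldpage->isolated = false;  /* the VM clears PG_isolated either way */
    if (rc != MIGRATEPAGE_SUCCESS)
        ops->putback_page(oldpage);
    return rc;
}

/* ---- a toy driver implementing the hooks ---- */
static int putbacks;            /* counts putback_page calls */

static bool toy_isolate(struct mock_page *page)
{
    (void)page;
    return true;                /* always willing to give the page up */
}

static int toy_migrate(struct mock_page *newpage, struct mock_page *oldpage)
{
    if (oldpage->broken)
        return -ENOMEM;         /* any error but -EAGAIN: VM gives up */
    if (oldpage->busy_passes > 0) {
        oldpage->busy_passes--;
        return -EAGAIN;         /* temporary failure: VM retries */
    }
    newpage->data = oldpage->data;  /* move the contents */
    /* a real driver would also __ClearPageMovable() the old page here */
    return MIGRATEPAGE_SUCCESS;
}

static void toy_putback(struct mock_page *page)
{
    (void)page;
    putbacks++;                 /* a real driver relinks it into its list */
}

const struct mock_ops toy_ops = { toy_isolate, toy_migrate, toy_putback };
```

A page that stays busy past the retry budget, or that fails with any error other than -EAGAIN, ends up back with the driver through putback_page. A page whose isolation is refused never reaches migratepage.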

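The PG_movable encoding in the page_migration hunk stores a flag in the low bits of page->mapping. This works because address_space structures are aligned, so those bits of a valid pointer are zero. The following userspace sketch mirrors that trick with hypothetical `mock_` types. Its helpers imitate __SetPageMovable(), page_mapping() and __PageMovable() in spirit only and are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_MAPPING_MOVABLE ((uintptr_t)0x2)  /* value quoted in the doc */
#define PAGE_MAPPING_FLAGS   ((uintptr_t)0x3)  /* "the low two bits" */

struct mock_address_space { long id; };        /* hypothetical stand-in */

struct mock_page {
    void *mapping;              /* pointer with flag bits folded in */
};

/* Like __SetPageMovable(): store the mapping with the movable bit set. */
void set_page_movable(struct mock_page *page, struct mock_address_space *mapping)
{
    /* Only valid if the pointer's low two bits are free (aligned struct). */
    assert(((uintptr_t)mapping & PAGE_MAPPING_FLAGS) == 0);
    page->mapping = (void *)((uintptr_t)mapping | PAGE_MAPPING_MOVABLE);
}

/* Like page_mapping(): mask off the flag bits to recover the pointer. */
struct mock_address_space *page_mapping_of(const struct mock_page *page)
{
    return (struct mock_address_space *)
        ((uintptr_t)page->mapping & ~PAGE_MAPPING_FLAGS);
}

/* Like __PageMovable(): a cheap peek at the flag bits, with no locking. */
bool page_movable_hint(const struct mock_page *page)
{
    return ((uintptr_t)page->mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE;
}
```

Defining the masks as uintptr_t keeps `~PAGE_MAPPING_FLAGS` as wide as a pointer. A plain `0x3UL` would silently clear the upper pointer bits on platforms where long is narrower than a pointer.
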