Commit 3a6541e

jankara authored and tytso committed
ext4: Orphan file documentation
Add documentation about the orphan file feature.

Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
1 parent 02f310f commit 3a6541e

5 files changed

+89
-6
lines changed

Documentation/filesystems/ext4/globals.rst

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ have static metadata at fixed locations.
1111
.. include:: bitmaps.rst
1212
.. include:: mmp.rst
1313
.. include:: journal.rst
14+
.. include:: orphan.rst

Documentation/filesystems/ext4/inodes.rst

Lines changed: 5 additions & 5 deletions
@@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
498498
modification time (mtime), and deletion time (dtime). The four fields
499499
are 32-bit signed integers that represent seconds since the Unix epoch
500500
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
501-
January 2038. For inodes that are not linked from any directory but are
502-
still open (orphan inodes), the dtime field is overloaded for use with
503-
the orphan list. The superblock field ``s_last_orphan`` points to the
504-
first inode in the orphan list; dtime is then the number of the next
505-
orphaned inode, or zero if there are no more orphans.
501+
January 2038. If the filesystem does not have the orphan_file feature, inodes
502+
that are not linked from any directory but are still open (orphan inodes) have
503+
the dtime field overloaded for use with the orphan list. The superblock field
504+
``s_last_orphan`` points to the first inode in the orphan list; dtime is then
505+
the number of the next orphaned inode, or zero if there are no more orphans.
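The list described in this paragraph can be sketched as a simple traversal. This is only an illustration under stated assumptions: `walk_orphan_list` and the `dtime_of` callback are hypothetical names, and the dictionary stands in for reading each inode's dtime field from disk.

```python
def walk_orphan_list(s_last_orphan, dtime_of, max_steps=1_000_000):
    """Yield inode numbers on the legacy orphan list, head first.

    s_last_orphan: inode number stored in the superblock (0 = empty list).
    dtime_of:      hypothetical callback returning the dtime field of an inode.
    """
    seen = set()
    ino = s_last_orphan
    while ino != 0:
        # Guard against a corrupted list that loops back on itself.
        if ino in seen or len(seen) >= max_steps:
            raise ValueError(f"orphan list is corrupt near inode {ino}")
        seen.add(ino)
        yield ino
        # dtime is overloaded as the "next orphan" link; 0 ends the list.
        ino = dtime_of(ino)


# Toy in-memory stand-in for on-disk inodes: inode number -> dtime value.
dtimes = {12: 57, 57: 33, 33: 0}
orphans = list(walk_orphan_list(12, dtimes.__getitem__))
print(orphans)  # [12, 57, 33]
```

Every step depends on the previous inode, and every insertion updates the single head pointer in the superblock.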
506506

507507
If the inode structure size ``sb->s_inode_size`` is larger than 128
508508
bytes and the ``i_inode_extra`` field is large enough to encompass the

Documentation/filesystems/ext4/orphan.rst

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
Orphan file
4+
-----------
5+
6+
In Unix there can be inodes that are unlinked from the directory hierarchy but that
7+
are still alive because they are open. In case of a crash the filesystem has to
8+
clean up these inodes as otherwise they (and the blocks referenced from them)
9+
would leak. Similarly, if we truncate or extend a file, we may not be able
10+
to perform the operation in a single journalling transaction. In such a case we
11+
track the inode as an orphan so that in case of a crash extra blocks allocated to
12+
the file get truncated.
13+
14+
Traditionally ext4 tracks orphan inodes in the form of a singly linked list where
15+
the superblock contains the inode number of the last orphan inode (s\_last\_orphan
16+
field) and then each inode contains the inode number of the previously orphaned
17+
inode (we overload the i\_dtime inode field for this). However, this filesystem-wide
18+
singly linked list is a scalability bottleneck for workloads that result
19+
in heavy creation of orphan inodes. When the orphan file feature
20+
(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
21+
(referenced from the superblock through s\_orphan\_file\_inum) with several
22+
blocks. Each of these blocks has the following structure:
23+
24+
.. list-table::
25+
:widths: 8 8 24 40
26+
:header-rows: 1
27+
28+
* - Offset
29+
- Type
30+
- Name
31+
- Description
32+
* - 0x0
33+
- Array of \_\_le32 entries
34+
- Orphan inode entries
35+
- Each \_\_le32 entry is either empty (0) or contains the inode number of
36+
an orphan inode.
37+
* - blocksize - 8
38+
- \_\_le32
39+
- ob\_magic
40+
- Magic value stored in orphan block tail (0x0b10ca04)
41+
* - blocksize - 4
42+
- \_\_le32
43+
- ob\_checksum
44+
- Checksum of the orphan block.
45+
46+
When a filesystem with the orphan file feature is mounted writable, we set the
47+
RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
48+
be valid orphan entries. If we see this feature when mounting the
49+
filesystem, we read the whole orphan file and process all orphan inodes found
50+
there as usual. When cleanly unmounting the filesystem we remove the
51+
RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
52+
file and also to make the filesystem fully compatible with older kernels.
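
As a rough illustration of the block layout in the table above, the following sketch extracts the non-empty entries from raw orphan file blocks. It assumes each block is passed in as a `bytes` object of exactly one filesystem block; the helper names (`orphan_entries`, `scan_orphan_file`, `make_block`) are made up for this example. The checksum is read but deliberately not verified, because the documentation above does not say how it is computed.

```python
import struct

OB_MAGIC = 0x0B10CA04  # ob_magic value from the orphan block table


def orphan_entries(block: bytes):
    """Return the non-zero inode numbers stored in one orphan block."""
    blocksize = len(block)
    if blocksize < 8 or blocksize % 4:
        raise ValueError("block size must be a multiple of 4 and at least 8")
    # The tail holds ob_magic at blocksize - 8 and ob_checksum at blocksize - 4.
    ob_magic, _ob_checksum = struct.unpack_from("<II", block, blocksize - 8)
    if ob_magic != OB_MAGIC:
        raise ValueError(f"bad orphan block magic {ob_magic:#x}")
    # Everything before the tail is an array of little-endian 32-bit entries.
    count = (blocksize - 8) // 4
    entries = struct.unpack_from(f"<{count}I", block, 0)
    return [ino for ino in entries if ino != 0]  # 0 marks an empty slot


def scan_orphan_file(blocks):
    """Collect orphan inode numbers from every block of the orphan file."""
    found = []
    for block in blocks:
        found.extend(orphan_entries(block))
    return found


def make_block(inodes, blocksize=1024):
    """Build a synthetic block for testing (checksum left as 0)."""
    count = (blocksize - 8) // 4
    slots = list(inodes) + [0] * (count - len(inodes))
    return struct.pack(f"<{count}I", *slots) + struct.pack("<II", OB_MAGIC, 0)


blocks = [make_block([12, 0, 57]), make_block([]), make_block([33])]
print(scan_orphan_file(blocks))  # [12, 57, 33]
```

Because entries are independent slots rather than links in a chain, an orphan can be added or removed by touching a single slot.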

Documentation/filesystems/ext4/special_inodes.rst

Lines changed: 17 additions & 0 deletions
@@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
3636
* - 11
3737
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
3838

39+
Note that there are also some inodes allocated from non-reserved inode numbers
40+
for other filesystem features which are not referenced from the standard directory
41+
hierarchy. These are generally referenced from the superblock. They are:
42+
43+
.. list-table::
44+
:widths: 20 50
45+
:header-rows: 1
46+
47+
* - Superblock field
48+
- Description
49+
50+
* - s\_lpf\_ino
51+
- Inode number of lost+found directory.
52+
* - s\_prj\_quota\_inum
53+
- Inode number of quota file tracking project quotas.
54+
* - s\_orphan\_file\_inum
55+
- Inode number of file tracking orphan inodes.

Documentation/filesystems/ext4/super.rst

Lines changed: 14 additions & 1 deletion
@@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
479479
- Filename charset encoding flags.
480480
* - 0x280
481481
- \_\_le32
482-
- s\_reserved[95]
482+
- s\_orphan\_file\_inum
483+
- Orphan file inode number.
484+
* - 0x284
485+
- \_\_le32
486+
- s\_reserved[94]
483487
- Padding to the end of the block.
484488
* - 0x3FC
485489
- \_\_le32
@@ -603,6 +607,11 @@ following:
603607
the journal, JBD2 incompat feature
604608
(JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
605609
set (COMPAT\_FAST\_COMMIT).
610+
* - 0x1000
611+
- Orphan file allocated. This is the special file for more efficient
612+
tracking of unlinked but still open inodes. When there may be any
613+
entries in the file, we additionally set the corresponding rocompat feature
614+
(RO\_COMPAT\_ORPHAN\_PRESENT).
606615

607616
.. _super_incompat:
608617

@@ -713,6 +722,10 @@ the following:
713722
- Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
714723
* - 0x8000
715724
- Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
725+
* - 0x10000
726+
- Indicates the orphan file may have valid orphan entries and thus we need
727+
to clean them up when mounting the filesystem
728+
(RO\_COMPAT\_ORPHAN\_PRESENT).
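
To show how the two feature bits and the new superblock field fit together, here is a minimal sketch that inspects a raw superblock buffer. The flag values (0x1000, 0x10000) and the 0x280 offset come from the hunks above. The offsets 0x5C and 0x64 for `s_feature_compat` and `s_feature_ro_compat` are taken from the wider ext4 superblock layout rather than from this commit, so treat them as assumptions here. `orphan_file_state` is a hypothetical helper name.

```python
import struct

COMPAT_ORPHAN_FILE = 0x1000          # compat flag: orphan file allocated
RO_COMPAT_ORPHAN_PRESENT = 0x10000   # ro_compat flag: entries may be present

# Offsets within the superblock. The first two are assumptions taken from
# the general ext4 superblock layout; 0x280 is stated in this commit.
S_FEATURE_COMPAT_OFF = 0x5C
S_FEATURE_RO_COMPAT_OFF = 0x64
S_ORPHAN_FILE_INUM_OFF = 0x280


def orphan_file_state(sb: bytes):
    """Return (has_orphan_file, needs_scan, orphan_file_inum)."""
    (compat,) = struct.unpack_from("<I", sb, S_FEATURE_COMPAT_OFF)
    (ro_compat,) = struct.unpack_from("<I", sb, S_FEATURE_RO_COMPAT_OFF)
    (inum,) = struct.unpack_from("<I", sb, S_ORPHAN_FILE_INUM_OFF)
    has_file = bool(compat & COMPAT_ORPHAN_FILE)
    # The orphan file only needs scanning if it exists and the filesystem
    # was not cleanly unmounted (the ro_compat bit is still set).
    needs_scan = has_file and bool(ro_compat & RO_COMPAT_ORPHAN_PRESENT)
    return has_file, needs_scan, inum if has_file else 0


# Synthetic 1024-byte superblock: orphan file at inode 13, uncleanly unmounted.
sb = bytearray(1024)
struct.pack_into("<I", sb, S_FEATURE_COMPAT_OFF, COMPAT_ORPHAN_FILE)
struct.pack_into("<I", sb, S_FEATURE_RO_COMPAT_OFF, RO_COMPAT_ORPHAN_PRESENT)
struct.pack_into("<I", sb, S_ORPHAN_FILE_INUM_OFF, 13)
print(orphan_file_state(bytes(sb)))  # (True, True, 13)
```

The inode number 13 in the sample buffer is arbitrary.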
716729

717730
.. _super_def_hash:
718731
