Commit ec3604c

Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull writeback error handling updates from Jeff Layton:
 "This pile continues the work from last cycle on better tracking
  writeback errors. In v4.13 we added some basic errseq_t
  infrastructure and converted a few filesystems to use it.

  This set continues refining that infrastructure, adds documentation,
  and converts most of the other filesystems to use it. The main
  exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set

2 parents 066dea8 + 6d4b512 commit ec3604c

31 files changed: +241 -89 lines changed

Documentation/errseq.rst

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
+The errseq_t datatype
+=====================
+An errseq_t is a way of recording errors in one place, and allowing any
+number of "subscribers" to tell whether it has changed since a previous
+point where it was sampled.
+
+The initial use case for this is tracking errors for file
+synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
+but it may be usable in other situations.
+
+It's implemented as an unsigned 32-bit value. The low order bits are
+designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
+are used as a counter. This is done with atomics instead of locking so that
+these functions can be called from any context.
+
+Note that there is a risk of collisions if new errors are being recorded
+frequently, since we have so few bits to use as a counter.
+
+To mitigate this, the bit between the error value and counter is used as
+a flag to tell whether the value has been sampled since a new value was
+recorded. That allows us to avoid bumping the counter if no one has
+sampled it since the last time an error was recorded.
+
+Thus we end up with a value that looks something like this::
+
+	bit:  31..13            12   11..0
+	     +-----------------+----+----------------+
+	     |     counter     | SF |     errno      |
+	     +-----------------+----+----------------+
+
+The general idea is for "watchers" to sample an errseq_t value and keep
+it as a running cursor. That value can later be used to tell whether
+any new errors have occurred since that sampling was done, and atomically
+record the state at the time that it was checked. This allows us to
+record errors in one place, and then have a number of "watchers" that
+can tell whether the value has changed since they last checked it.
+
+A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
+is the special (but common) case where there has never been an error. An all
+zero value thus serves as the "epoch" if one wishes to know whether there
+has ever been an error set since it was first initialized.
+
+API usage
+=========
+Let me tell you a story about a worker drone. Now, he's a good worker
+overall, but the company is a little...management heavy. He has to
+report to 77 supervisors today, and tomorrow the "big boss" is coming in
+from out of town and he's sure to test the poor fellow too.
+
+They're all handing him work to do -- so much he can't keep track of who
+handed him what, but that's not really a big problem. The supervisors
+just want to know when he's finished all of the work they've handed him so
+far and whether he made any mistakes since they last asked.
+
+He might have made the mistake on work they didn't actually hand him,
+but he can't keep track of things at that level of detail, all he can
+remember is the most recent mistake that he made.
+
+Here's our worker_drone representation::
+
+	struct worker_drone {
+		errseq_t	wd_err; /* for recording errors */
+	};
+
+Every day, the worker_drone starts out with a blank slate::
+
+	struct worker_drone wd;
+
+	wd.wd_err = (errseq_t)0;
+
+The supervisors come in and get an initial read for the day. They
+don't care about anything that happened before their watch begins::
+
+	struct supervisor {
+		errseq_t	s_wd_err; /* private "cursor" for wd_err */
+		spinlock_t	s_wd_err_lock; /* protects s_wd_err */
+	};
+
+	struct supervisor su;
+
+	su.s_wd_err = errseq_sample(&wd.wd_err);
+	spin_lock_init(&su.s_wd_err_lock);
+
+Now they start handing him tasks to do. Every few minutes they ask him to
+finish up all of the work they've handed him so far. Then they ask him
+whether he made any mistakes on any of it::
+
+	spin_lock(&su.s_wd_err_lock);
+	err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+	spin_unlock(&su.s_wd_err_lock);
+
+Up to this point, that just keeps returning 0.
+
+Now, the owners of this company are quite miserly and have given him
+substandard equipment with which to do his job. Occasionally it
+glitches and he makes a mistake. He sighs a heavy sigh, and marks it
+down::
+
+	errseq_set(&wd.wd_err, -EIO);
+
+...and then gets back to work. The supervisors eventually poll again
+and they each get the error when they next check. Subsequent calls will
+return 0, until another error is recorded, at which point it's reported
+to each of them once.
+
+Note that the supervisors can't tell how many mistakes he made, only
+whether one was made since they last checked, and the latest value
+recorded.
+
+Occasionally the big boss comes in for a spot check and asks the worker
+to do a one-off job for him. He's not really watching the worker
+full-time like the supervisors, but he does need to know whether a
+mistake occurred while his job was processing.
+
+He can just sample the current errseq_t in the worker, and then use that
+to tell whether an error has occurred later::
+
+	errseq_t since = errseq_sample(&wd.wd_err);
+	/* submit some work and wait for it to complete */
+	err = errseq_check(&wd.wd_err, since);
+
+Since he's just going to discard "since" after that point, he doesn't
+need to advance it here. He also doesn't need any locking since it's
+not usable by anyone else.
+
+Serializing errseq_t cursor updates
+===================================
+Note that the errseq_t API does not protect the errseq_t cursor during a
+check_and_advance operation. Only the canonical error code is handled
+atomically. In a situation where more than one task might be using the
+same errseq_t cursor at the same time, it's important to serialize
+updates to that cursor.
+
+If that's not done, then it's possible for the cursor to go backward
+in which case the same error could be reported more than once.
+
+Because of this, it's often advantageous to first do an errseq_check to
+see if anything has changed, and only later do an
+errseq_check_and_advance after taking the lock. e.g.::
+
+	if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
+		/* su.s_wd_err is protected by s_wd_err_lock */
+		spin_lock(&su.s_wd_err_lock);
+		err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+		spin_unlock(&su.s_wd_err_lock);
+	}
+
+That avoids the spinlock in the common case where nothing has changed
+since the last time it was checked.
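
Reader's note on Documentation/errseq.rst: the semantics described there can be tried out in userspace. The sketch below is only a model written for illustration with C11 atomics; it is not the kernel's errseq code, and every `model_` / `MODEL_` name and the `drone_story()` helper are hypothetical. It assumes the bit layout shown in the diagram (errno in bits 11..0, the SF flag in bit 12, the counter above it) and assumes that sampling marks a non-zero value as seen.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model only; these names are illustrative, not kernel API. */
#define MODEL_MAX_ERRNO 4095u        /* errno occupies bits 11..0 */
#define MODEL_SEEN      (1u << 12)   /* the "SF" bit in the diagram */
#define MODEL_CTR_INC   (1u << 13)   /* counter lives in bits 31..13 */

typedef _Atomic uint32_t model_errseq_t;

/* Record a negative errno; bump the counter only if the old value was seen. */
static void model_set(model_errseq_t *eseq, int err)
{
	uint32_t old = atomic_load(eseq), next;

	if (err >= 0 || err < -(int)MODEL_MAX_ERRNO)
		return;
	do {
		next = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)-err;
		if (old & MODEL_SEEN)
			next += MODEL_CTR_INC;
		/* on failure, old is reloaded and the loop recomputes next */
	} while (!atomic_compare_exchange_weak(eseq, &old, next));
}

/* Take a cursor; a non-zero value is marked as seen while doing so. */
static uint32_t model_sample(model_errseq_t *eseq)
{
	if (atomic_load(eseq) == 0)
		return 0;
	return atomic_fetch_or(eseq, MODEL_SEEN) | MODEL_SEEN;
}

/* Has anything changed since the cursor was taken? Does not advance it. */
static int model_check(model_errseq_t *eseq, uint32_t since)
{
	uint32_t cur = atomic_load(eseq);

	return cur == since ? 0 : -(int)(cur & MODEL_MAX_ERRNO);
}

/* Report a change once and move the cursor; callers serialize *since. */
static int model_check_and_advance(model_errseq_t *eseq, uint32_t *since)
{
	uint32_t cur = atomic_load(eseq);

	if (cur == *since)
		return 0;
	cur = atomic_fetch_or(eseq, MODEL_SEEN) | MODEL_SEEN;
	*since = cur;
	return -(int)(cur & MODEL_MAX_ERRNO);
}

/* Replays the worker-drone story with one supervisor and the big boss. */
static void drone_story(void)
{
	model_errseq_t wd_err = 0;
	uint32_t su = model_sample(&wd_err);	/* supervisor's cursor */
	uint32_t since, before;

	assert(model_check_and_advance(&wd_err, &su) == 0);

	model_set(&wd_err, -EIO);
	assert(model_check_and_advance(&wd_err, &su) == -EIO);	/* once */
	assert(model_check_and_advance(&wd_err, &su) == 0);

	since = model_sample(&wd_err);		/* big boss spot check */
	model_set(&wd_err, -ENOSPC);		/* value was seen: counter bumps */
	before = atomic_load(&wd_err);
	model_set(&wd_err, -ENOSPC);		/* not yet seen: no bump */
	assert(atomic_load(&wd_err) == before);

	assert(model_check(&wd_err, since) == -ENOSPC);
	assert(model_check_and_advance(&wd_err, &su) == -ENOSPC);
}
```

Calling `drone_story()` from a `main()` exercises the three behaviours the document describes: each watcher sees an error exactly once, a one-off sampler can use a plain check without advancing, and the counter is left alone when the previous error was never sampled. The model deliberately leaves the cursor unprotected, so concurrent users of one cursor would still need the lock shown in the "Serializing errseq_t cursor updates" section.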

arch/powerpc/platforms/cell/spufs/file.c

Lines changed: 1 addition & 1 deletion
@@ -1749,7 +1749,7 @@ static int spufs_mfc_flush(struct file *file, fl_owner_t id)
 static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	struct inode *inode = file_inode(file);
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	int err = file_write_and_wait_range(file, start, end);
 	if (!err) {
 		inode_lock(inode);
 		err = spufs_mfc_flush(file, NULL);

drivers/staging/lustre/lustre/llite/file.c

Lines changed: 1 addition & 1 deletion
@@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	       PFID(ll_inode2fid(inode)), inode);
 	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1);
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	inode_lock(inode);
 
 	/* catch async errors that were recorded back when async writeback

drivers/video/fbdev/core/fb_defio.c

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
 {
 	struct fb_info *info = file->private_data;
 	struct inode *inode = file_inode(file);
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	int err = file_write_and_wait_range(file, start, end);
 	if (err)
 		return err;

fs/9p/vfs_file.c

Lines changed: 2 additions & 2 deletions
@@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct p9_wstat wstat;
 	int retval;
 
-	retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	retval = file_write_and_wait_range(filp, start, end);
 	if (retval)
 		return retval;
 
@@ -468,7 +468,7 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int retval;
 
-	retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	retval = file_write_and_wait_range(filp, start, end);
 	if (retval)
 		return retval;

fs/affs/file.c

Lines changed: 1 addition & 1 deletion
@@ -954,7 +954,7 @@ int affs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	struct inode *inode = filp->f_mapping->host;
 	int ret, err;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;

fs/afs/write.c

Lines changed: 1 addition & 1 deletion
@@ -714,7 +714,7 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	       vnode->fid.vid, vnode->fid.vnode, file,
 	       datasync);
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
 	inode_lock(inode);

fs/cifs/file.c

Lines changed: 2 additions & 2 deletions
@@ -2329,7 +2329,7 @@ int cifs_strict_fsync(struct file *file, loff_t start, loff_t end,
 	struct inode *inode = file_inode(file);
 	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;
 	inode_lock(inode);
@@ -2371,7 +2371,7 @@ int cifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file);
 	struct inode *inode = file->f_mapping->host;
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;
 	inode_lock(inode);

fs/ecryptfs/file.c

Lines changed: 1 addition & 1 deletion
@@ -328,7 +328,7 @@ ecryptfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	int rc;
 
-	rc = filemap_write_and_wait(file->f_mapping);
+	rc = file_write_and_wait(file);
 	if (rc)
 		return rc;

fs/exofs/file.c

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ static int exofs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;

fs/f2fs/file.c

Lines changed: 1 addition & 1 deletion
@@ -206,7 +206,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
 	/* if fdatasync is triggered, let's do in-place-update */
 	if (datasync || get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
 		set_inode_flag(inode, FI_NEED_IPU);
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	clear_inode_flag(inode, FI_NEED_IPU);
 
 	if (ret) {

fs/fuse/file.c

Lines changed: 3 additions & 3 deletions
@@ -457,18 +457,18 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
 	 * wait for all outstanding writes, before sending the FSYNC
 	 * request.
 	 */
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(file, start, end);
 	if (err)
 		goto out;
 
 	fuse_sync_writes(inode);
 
 	/*
 	 * Due to implementation of fuse writeback
-	 * filemap_write_and_wait_range() does not catch errors.
+	 * file_write_and_wait_range() does not catch errors.
 	 * We have to do this directly after fuse_sync_writes()
 	 */
-	err = filemap_check_errors(file->f_mapping);
+	err = file_check_and_advance_wb_err(file);
 	if (err)
 		goto out;

fs/gfs2/file.c

Lines changed: 4 additions & 2 deletions
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		if (ret)
 			return ret;
 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
+		if (ret)
+			return ret;
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
 	if (mapping->nrpages)
-		ret = filemap_fdatawait_range(mapping, start, end);
+		ret = file_fdatawait_range(file, start, end);
 
 	return ret ? ret : ret1;
 }

fs/hfs/inode.c

Lines changed: 1 addition & 1 deletion
@@ -656,7 +656,7 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct super_block * sb;
 	int ret, err;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;
 	inode_lock(inode);

fs/hfsplus/inode.c

Lines changed: 1 addition & 1 deletion
@@ -283,7 +283,7 @@ int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end,
 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
 	int error = 0, error2;
 
-	error = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	error = file_write_and_wait_range(file, start, end);
 	if (error)
 		return error;
 	inode_lock(inode);

fs/hostfs/hostfs_kern.c

Lines changed: 1 addition & 1 deletion
@@ -374,7 +374,7 @@ static int hostfs_fsync(struct file *file, loff_t start, loff_t end,
 	struct inode *inode = file->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;

fs/hpfs/file.c

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ int hpfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(file->f_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
 	return sync_blockdev(inode->i_sb->s_bdev);

fs/jffs2/file.c

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ int jffs2_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;

fs/jfs/file.c

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ int jfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	int rc = 0;
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;

fs/ncpfs/file.c

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 
 static int ncp_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	return filemap_write_and_wait_range(file->f_mapping, start, end);
+	return file_write_and_wait_range(file, start, end);
 }
 
 /*

fs/ntfs/dir.c

Lines changed: 1 addition & 1 deletion
@@ -1506,7 +1506,7 @@ static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
 
 	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
 
-	err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 	inode_lock(vi);

fs/ntfs/file.c

Lines changed: 1 addition & 1 deletion
@@ -1989,7 +1989,7 @@ static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 
 	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
 
-	err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 	inode_lock(vi);

fs/ocfs2/file.c

Lines changed: 1 addition & 1 deletion
@@ -196,7 +196,7 @@ static int ocfs2_sync_file(struct file *file, loff_t start, loff_t end,
 	if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb))
 		return -EROFS;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(file, start, end);
 	if (err)
 		return err;

fs/reiserfs/dir.c

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ static int reiserfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int err;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
