Skip to content

Commit 9fe55ee

Browse files
swhitehoAl Viro
authored andcommitted
Fix race when checking i_size on direct i/o read
So far I've had one ACK for this, and no other comments. So I think it is probably time to send this via some suitable tree. I'm guessing that the vfs tree would be the most appropriate route, but not sure that there is one at the moment (don't see anything recent at kernel.org) so in that case I think -mm is the "back up plan". Al, please let me know if you will take this? Steve. --------------------- Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio reads with append dio writes" thread on linux-fsdevel, this patch is my current version of the fix proposed as option (b) in that thread. Removing the i_size test from the direct i/o read path at vfs level means that filesystems now have to deal with requests which are beyond i_size themselves. These I've divided into three sets: a) Those with "no op" ->direct_IO (9p, cifs, ceph) These are obviously not going to be an issue b) Those with "home brew" ->direct_IO (nfs, fuse) I've been told that NFS should not have any problem with the larger i_size, however I've added an extra test to FUSE to duplicate the original behaviour just to be on the safe side. c) Those using __blockdev_direct_IO() These call through to ->get_block() which should deal with the EOF condition correctly. I've verified that with GFS2 and I believe that Zheng has verified it for ext4. I've also run the test on XFS and it passes both before and after this change. The part of the patch in filemap.c looks a lot larger than it really is - there are only two lines of real change. The rest is just indentation of the contained code. There remains a test of i_size though, which was added for btrfs. It doesn't cause the other filesystems a problem as the test is performed after ->direct_IO has been called. It is possible that there is a race that does matter to btrfs, however this patch doesn't change that, so its still an overall improvement. Signed-off-by: Steven Whitehouse <[email protected]> Reported-by: Zheng Liu <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dave Chinner <[email protected]> Acked-by: Miklos Szeredi <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Al Viro <[email protected]>
1 parent 2796e4c commit 9fe55ee

File tree

2 files changed

+23
-22
lines changed

2 files changed

+23
-22
lines changed

fs/fuse/file.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2710,6 +2710,9 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
27102710
inode = file->f_mapping->host;
27112711
i_size = i_size_read(inode);
27122712

2713+
if ((rw == READ) && (offset > i_size))
2714+
return 0;
2715+
27132716
/* optimization for short read */
27142717
if (async_dio && rw != WRITE && offset + count > i_size) {
27152718
if (offset >= i_size)

mm/filemap.c

Lines changed: 20 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1428,30 +1428,28 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
14281428
if (!count)
14291429
goto out; /* skip atime */
14301430
size = i_size_read(inode);
1431-
if (pos < size) {
1432-
retval = filemap_write_and_wait_range(mapping, pos,
1431+
retval = filemap_write_and_wait_range(mapping, pos,
14331432
pos + iov_length(iov, nr_segs) - 1);
1434-
if (!retval) {
1435-
retval = mapping->a_ops->direct_IO(READ, iocb,
1436-
iov, pos, nr_segs);
1437-
}
1438-
if (retval > 0) {
1439-
*ppos = pos + retval;
1440-
count -= retval;
1441-
}
1433+
if (!retval) {
1434+
retval = mapping->a_ops->direct_IO(READ, iocb,
1435+
iov, pos, nr_segs);
1436+
}
1437+
if (retval > 0) {
1438+
*ppos = pos + retval;
1439+
count -= retval;
1440+
}
14421441

1443-
/*
1444-
* Btrfs can have a short DIO read if we encounter
1445-
* compressed extents, so if there was an error, or if
1446-
* we've already read everything we wanted to, or if
1447-
* there was a short read because we hit EOF, go ahead
1448-
* and return. Otherwise fallthrough to buffered io for
1449-
* the rest of the read.
1450-
*/
1451-
if (retval < 0 || !count || *ppos >= size) {
1452-
file_accessed(filp);
1453-
goto out;
1454-
}
1442+
/*
1443+
* Btrfs can have a short DIO read if we encounter
1444+
* compressed extents, so if there was an error, or if
1445+
* we've already read everything we wanted to, or if
1446+
* there was a short read because we hit EOF, go ahead
1447+
* and return. Otherwise fallthrough to buffered io for
1448+
* the rest of the read.
1449+
*/
1450+
if (retval < 0 || !count || *ppos >= size) {
1451+
file_accessed(filp);
1452+
goto out;
14551453
}
14561454
}
14571455

0 commit comments

Comments
 (0)