Commit 288ace2
netfs: New writeback implementation
The current netfslib writeback implementation creates writeback requests of contiguous folio data and then separately tiles subrequests over the space twice, once for the server and once for the cache.  This creates a few issues:

 (1) Every time there's a discontiguity or a change between writing to only one destination or writing to both, it must create a new request.  This makes it harder to do vectored writes.

 (2) The folios don't have the writeback mark removed until the end of the request - and a request could be hundreds of megabytes.

 (3) In future, I want to support a larger cache granularity, which will require aggregation of some folios that contain unmodified data (which only need to go to the cache) and some which contain modifications (which need to be uploaded and stored to the cache) - but, currently, these are treated as discontiguous.

There's also a move to get everyone to use writeback_iter() to extract writable folios from the pagecache.  That said, currently writeback_iter() has some issues that make it less than ideal:

 (1) there's no way to cancel the iteration, even if you find a "temporary" error that means the current folio and all subsequent folios are going to fail;

 (2) there's no way to filter the folios being written back - something that will impact Ceph with its ordered snap system;

 (3) and if you get a folio you can't immediately deal with (say you need to flush the preceding writes), you are left with a folio hanging in the locked state for the duration, when really we should unlock it and relock it later.

In this new implementation, I use writeback_iter() to pump folios, progressively creating two parallel, but separate streams and cleaning up the finished folios as the subrequests complete.  Either or both streams can contain gaps, and the subrequests in each stream can be of variable size, don't need to align with each other and don't need to align with the folios.
Indeed, subrequests can cross folio boundaries, may cover several folios or a folio may be spanned by multiple subrequests, e.g.:

         +---+---+-----+-----+---+----------+
 Folios: |   |   |     |     |   |          |
         +---+---+-----+-----+---+----------+

           +------+------+     +----+----+
 Upload:   |      |      |.....|    |    |
           +------+------+     +----+----+

         +------+------+------+------+------+
 Cache:  |      |      |      |      |      |
         +------+------+------+------+------+

The progressive subrequest construction permits the algorithm to be preparing both the next upload to the server and the next write to the cache whilst the previous ones are already in progress.  Throttling can be applied to control the rate of production of subrequests - and, in any case, we probably want to write them to the server in ascending order, particularly if the file will be extended.

Content crypto can also be prepared at the same time as the subrequests and run asynchronously, with the prepped requests being stalled until the crypto catches up with them.  This might also be useful for transport crypto, but that happens at a lower layer, so probably would be harder to pull off.

The algorithm is split into three parts:

 (1) The issuer.  This walks through the data, packaging it up, encrypting it and creating subrequests.  The part of this that generates subrequests only deals with file positions and spans and so is usable for DIO/unbuffered writes as well as buffered writes.

 (2) The collector.  This asynchronously collects completed subrequests, unlocks folios, frees crypto buffers and performs any retries.  This runs in a work queue so that the issuer can return to the caller for writeback (so that the VM can have its kswapd thread back) or async writes.

 (3) The retryer.  This pauses the issuer, waits for all outstanding subrequests to complete and then goes through the failed subrequests to reissue them.
This may involve reprepping them (with cifs, the credits must be renegotiated, and a subrequest may need splitting), and doing RMW for content crypto if there's a conflicting change on the server.

[!] Note that some of the functions are prefixed with "new_" to avoid clashes with existing functions.  These will be renamed in a later patch that cuts over to the new algorithm.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
1 parent 7ba167c commit 288ace2

File tree

8 files changed: +1829 -9 lines changed

fs/netfs/Makefile

Lines changed: 3 additions & 1 deletion

@@ -11,7 +11,9 @@ netfs-y := \
 	main.o \
 	misc.o \
 	objects.o \
-	output.o
+	output.o \
+	write_collect.o \
+	write_issue.o
 
 netfs-$(CONFIG_NETFS_STATS) += stats.o
 
fs/netfs/buffered_write.c

Lines changed: 0 additions & 4 deletions

@@ -74,16 +74,12 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct netfs_inode *ctx,
 
 	if (file->f_mode & FMODE_READ)
 		goto no_write_streaming;
-	if (test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-		goto no_write_streaming;
 
 	if (netfs_is_cache_enabled(ctx)) {
 		/* We don't want to get a streaming write on a file that loses
 		 * caching service temporarily because the backing store got
 		 * culled.
 		 */
-		if (!test_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags))
-			set_bit(NETFS_ICTX_NO_WRITE_STREAMING, &ctx->flags);
 		goto no_write_streaming;
 	}
 
fs/netfs/internal.h

Lines changed: 27 additions & 0 deletions

@@ -150,6 +150,33 @@ static inline void netfs_stat_d(atomic_t *stat)
 #define netfs_stat_d(x) do {} while(0)
 #endif
 
+/*
+ * write_collect.c
+ */
+int netfs_folio_written_back(struct folio *folio);
+void netfs_write_collection_worker(struct work_struct *work);
+void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async);
+
+/*
+ * write_issue.c
+ */
+struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
+						struct file *file,
+						loff_t start,
+						enum netfs_io_origin origin);
+void netfs_reissue_write(struct netfs_io_stream *stream,
+			 struct netfs_io_subrequest *subreq);
+int netfs_advance_write(struct netfs_io_request *wreq,
+			struct netfs_io_stream *stream,
+			loff_t start, size_t len, bool to_eof);
+struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len);
+int new_netfs_advance_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+				   struct folio *folio, size_t copied, bool to_page_end,
+				   struct folio **writethrough_cache);
+int new_netfs_end_writethrough(struct netfs_io_request *wreq, struct writeback_control *wbc,
+			       struct folio *writethrough_cache);
+int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t len);
+
 /*
  * Miscellaneous functions.
  */

fs/netfs/objects.c

Lines changed: 17 additions & 0 deletions

@@ -47,6 +47,10 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	rreq->inode = inode;
 	rreq->i_size = i_size_read(inode);
 	rreq->debug_id = atomic_inc_return(&debug_ids);
+	rreq->wsize = INT_MAX;
+	spin_lock_init(&rreq->lock);
+	INIT_LIST_HEAD(&rreq->io_streams[0].subrequests);
+	INIT_LIST_HEAD(&rreq->io_streams[1].subrequests);
 	INIT_LIST_HEAD(&rreq->subrequests);
 	INIT_WORK(&rreq->work, NULL);
 	refcount_set(&rreq->ref, 1);
@@ -85,6 +89,8 @@ void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
 void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 {
 	struct netfs_io_subrequest *subreq;
+	struct netfs_io_stream *stream;
+	int s;
 
 	while (!list_empty(&rreq->subrequests)) {
 		subreq = list_first_entry(&rreq->subrequests,
@@ -93,6 +99,17 @@ void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async)
 		netfs_put_subrequest(subreq, was_async,
 				     netfs_sreq_trace_put_clear);
 	}
+
+	for (s = 0; s < ARRAY_SIZE(rreq->io_streams); s++) {
+		stream = &rreq->io_streams[s];
+		while (!list_empty(&stream->subrequests)) {
+			subreq = list_first_entry(&stream->subrequests,
+						  struct netfs_io_subrequest, rreq_link);
+			list_del(&subreq->rreq_link);
+			netfs_put_subrequest(subreq, was_async,
+					     netfs_sreq_trace_put_clear);
+		}
+	}
 }
 
 static void netfs_free_request_rcu(struct rcu_head *rcu)
