Commit b88fe2b

Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable using direct IO with localio
   - Added localio related tracepoints

  Bugfixes:
   - Sunrpc fixes for working with a very large cl_tasks list
   - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   - Fixes for handling reconnections with localio
   - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   - Fix COPY_NOTIFY xdr_buf size calculations
   - pNFS/Flexfiles fix for retrying requesting a layout segment for reads
   - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired

  Cleanups:
   - Various other nfs & nfsd localio cleanups
   - Preparatory patches for async copy improvements that are under
     development
   - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
     xprts
   - Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
  SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
  sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
  pnfs/flexfiles: retry getting layout segment for reads
  NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
  NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
  NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
  nfs: fix incorrect error handling in LOCALIO
  nfs: probe for LOCALIO when v3 client reconnects to server
  nfs: probe for LOCALIO when v4 client reconnects to server
  nfs/localio: remove redundant code and simplify LOCALIO enablement
  nfs_common: add nfs_localio trace events
  nfs_common: track all open nfsd_files per LOCALIO nfs_client
  nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
  nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
  nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
  nfsd: update percpu_ref to manage references on nfsd_net
  ...
2 parents 3673f5b + 6f56971 commit b88fe2b

36 files changed, 838 insertions(+), 317 deletions(-)

Documentation/filesystems/nfs/localio.rst

Lines changed: 52 additions & 52 deletions
@@ -218,64 +218,30 @@ NFS Client and Server Interlock
 ===============================

 LOCALIO provides the nfs_uuid_t object and associated interfaces to
-allow proper network namespace (net-ns) and NFSD object refcounting:
-
-We don't want to keep a long-term counted reference on each NFSD's
-net-ns in the client because that prevents a server container from
-completely shutting down.
-
-So we avoid taking a reference at all and rely on the per-cpu
-reference to the server (detailed below) being sufficient to keep
-the net-ns active. This involves allowing the NFSD's net-ns exit
-code to iterate all active clients and clear their ->net pointers
-(which are needed to find the per-cpu-refcount for the nfsd_serv).
-
-Details:
-
-- Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head
-that can be used to find the client. It does add the 16-byte
-uuid_t to nfs_client so it is bigger than needed (given that
-uuid_t is only used during the initial NFS client and server
-LOCALIO handshake to determine if they are local to each other).
-If that is really a problem we can find a fix.
-
-- When the nfs server confirms that the uuid_t is local, it moves
-the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net.
-
-- When each server's net-ns is shutting down - in a "pre_exit"
-handler, all these nfs_uuid_t have their ->net cleared. There is
-an rcu_synchronize() call between pre_exit() handlers and exit()
-handlers so any caller that sees nfs_uuid_t ->net as not NULL can
-safely manage the per-cpu-refcount for nfsd_serv.
-
-- The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it
-can safely dereference ->net in a private rcu_read_lock() section
-to allow safe access to the associated nfsd_net and nfsd_serv.
-
-So LOCALIO required the introduction and use of NFSD's percpu_ref to
-interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each
-nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and
+allow proper network namespace (net-ns) and NFSD object refcounting.
+
+LOCALIO required the introduction and use of NFSD's percpu nfsd_net_ref
+to interlock nfsd_shutdown_net() and nfsd_open_local_fh(), to ensure
+each net-ns is not destroyed while in use by nfsd_open_local_fh(), and
 warrants a more detailed explanation:

-nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its
+nfsd_open_local_fh() uses nfsd_net_try_get() before opening its
 nfsd_file handle and then the caller (NFS client) must drop the
-reference for the nfsd_file and associated nn->nfsd_serv using
-nfs_file_put_local() once it has completed its IO.
+reference for the nfsd_file and associated net-ns using
+nfsd_file_put_local() once it has completed its IO.

 This interlock working relies heavily on nfsd_open_local_fh() being
 afforded the ability to safely deal with the possibility that the
 NFSD's net-ns (and nfsd_net by association) may have been destroyed
-by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only
-possible given the nfs_uuid_t ->net pointer managemenet detailed
-above.
-
-All told, this elaborate interlock of the NFS client and server has been
-verified to fix an easy to hit crash that would occur if an NFSD
-instance running in a container, with a LOCALIO client mounted, is
-shutdown. Upon restart of the container and associated NFSD the client
-would go on to crash due to NULL pointer dereference that occurred due
-to the LOCALIO client's attempting to nfsd_open_local_fh(), using
-nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
+by nfsd_destroy_serv() via nfsd_shutdown_net().
+
+This interlock of the NFS client and server has been verified to fix an
+easy to hit crash that would occur if an NFSD instance running in a
+container, with a LOCALIO client mounted, is shutdown. Upon restart of
+the container and associated NFSD, the client would go on to crash due
+to NULL pointer dereference that occurred due to the LOCALIO client's
+attempting to nfsd_open_local_fh() without having a proper reference on
+NFSD's net-ns.

 NFS Client issues IO instead of Server
 ======================================
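
Reviewer note on the interlock text in the hunk above: the try-get/put pattern it describes can be illustrated in userspace. The following is a minimal sketch using C11 atomics, not the kernel's percpu_ref; struct net_ref, net_ref_try_get() and net_ref_put() are hypothetical stand-ins chosen for illustration and do not appear in this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-net-ns reference (not percpu_ref). */
struct net_ref {
	atomic_long count;	/* 0 means dead: no new references allowed */
};

static void net_ref_init(struct net_ref *ref)
{
	atomic_init(&ref->count, 1);	/* initial reference held by the server */
}

/* Take a reference only if the object is still live. */
static bool net_ref_try_get(struct net_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure 'old' is refreshed and the liveness check repeats. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; returns true if it was the last one. */
static bool net_ref_put(struct net_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	return old == 1;
}
```

In this sketch an opener that wins net_ref_try_get() keeps the object alive until its matching net_ref_put(), while an opener that arrives after the count reached zero is refused instead of touching freed state. That refusal is the property the documentation attributes to the interlock.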
@@ -306,10 +272,26 @@ is issuing IO to the underlying local filesystem that it is sharing with
 the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
 fs/nfs/localio.c:nfs_local_commit().

+With normal NFS that makes use of RPC to issue IO to the server, if an
+application uses O_DIRECT the NFS client will bypass the pagecache but
+the NFS server will not. The NFS server's use of buffered IO affords
+applications to be less precise with their alignment when issuing IO to
+the NFS client. But if all applications properly align their IO, LOCALIO
+can be configured to use end-to-end O_DIRECT semantics from the NFS
+client to the underlying local filesystem, that it is sharing with
+the NFS server, by setting the 'localio_O_DIRECT_semantics' nfs module
+parameter to Y, e.g.:
+
+echo Y > /sys/module/nfs/parameters/localio_O_DIRECT_semantics
+
+Once enabled, it will cause LOCALIO to use end-to-end O_DIRECT semantics
+(but again, this may cause IO to fail if applications do not properly
+align their IO).
+
 Security
 ========

-Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
+LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka
 AUTH_SYS) is used.

 Care is taken to ensure the same NFS security mechanisms are used
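
Reviewer note on the O_DIRECT text added in the hunk above: an application can check its own IO for alignment before relying on end-to-end O_DIRECT. The helper below is a sketch, and dio_is_aligned() is a hypothetical name that is not part of this commit. It assumes, conservatively, that the buffer address, file offset and length must all be multiples of the same block size; the real requirement depends on the underlying filesystem and device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the IO described by (buf, offset, len) is aligned to
 * 'blksz'. 'blksz' must be a non-zero power of two, e.g. 512 or 4096.
 */
static bool dio_is_aligned(const void *buf, uint64_t offset, size_t len,
			   size_t blksz)
{
	uint64_t mask;

	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);
	mask = blksz - 1;
	/* Any low bit set in address, offset or length means misalignment. */
	return (((uintptr_t)buf | offset | len) & mask) == 0;
}
```

A caller would typically allocate the buffer with posix_memalign() and round offsets and lengths to the logical block size before issuing the IO.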
@@ -324,6 +306,24 @@ client is afforded this same level of access (albeit in terms of the NFS
 protocol via SUNRPC). No other namespaces (user, mount, etc) have been
 altered or purposely extended from the server to the client.

+Module Parameters
+=================
+
+/sys/module/nfs/parameters/localio_enabled (bool)
+controls if LOCALIO is enabled, defaults to Y. If client and server are
+local but 'localio_enabled' is set to N then LOCALIO will not be used.
+
+/sys/module/nfs/parameters/localio_O_DIRECT_semantics (bool)
+controls if O_DIRECT extends down to the underlying filesystem, defaults
+to N. Application IO must be logical blocksize aligned, otherwise
+O_DIRECT will fail.
+
+/sys/module/nfsv3/parameters/nfs3_localio_probe_throttle (uint)
+controls if NFSv3 read and write IOs will trigger (re)enabling of
+LOCALIO every N (nfs3_localio_probe_throttle) IOs, defaults to 0
+(disabled). Must be power-of-2, admin keeps all the pieces if they
+misconfigure (too low a value or non-power-of-2).
+
 Testing
 =======

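Reviewer note on nfs3_localio_probe_throttle: the power-of-2 requirement is what one would expect if "every N IOs" is implemented with a bit mask rather than a modulo. The sketch below shows that technique in isolation. It is an assumption about the approach, not code taken from this commit, and is_power_of_2(), should_probe() and throttle_demo() are hypothetical userspace helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* True for 1, 2, 4, 8, ...; false for 0 and everything else. */
static bool is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical throttle check: should the IO with running count
 * 'io_count' trigger a probe? A throttle of 0 disables probing.
 */
static bool should_probe(unsigned int io_count, unsigned int throttle)
{
	if (throttle == 0)
		return false;
	return (io_count & (throttle - 1)) == 0;
}

/* Self-check covering a well-formed and a misconfigured throttle. */
static void throttle_demo(void)
{
	assert(is_power_of_2(8) && !is_power_of_2(6));
	/* throttle = 8: fires on multiples of 8 only */
	assert(should_probe(16, 8) && !should_probe(12, 8));
	/* throttle = 6: mask is 5, so count 2 fires although 2 % 6 != 0 */
	assert(should_probe(2, 6));
}
```

With a non-power-of-2 value the mask no longer selects every Nth IO, which is one way to read the documentation's warning about misconfiguration.
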
fs/nfs/Kconfig

Lines changed: 2 additions & 1 deletion
@@ -170,7 +170,8 @@ config ROOT_NFS

 config NFS_FSCACHE
 	bool "Provide NFS client caching support"
-	depends on NFS_FS=m && NETFS_SUPPORT || NFS_FS=y && NETFS_SUPPORT=y
+	depends on NFS_FS
+	select NETFS_SUPPORT
 	select FSCACHE
 	help
 	  Say Y here if you want NFS data to be cached locally on disc through

fs/nfs/callback_proc.c

Lines changed: 1 addition & 1 deletion
@@ -718,7 +718,7 @@ __be32 nfs4_callback_offload(void *data, void *dummy,

 	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
 	if (!copy)
-		return htonl(NFS4ERR_SERVERFAULT);
+		return cpu_to_be32(NFS4ERR_DELAY);

 	spin_lock(&cps->clp->cl_lock);
 	rcu_read_lock();

fs/nfs/client.c

Lines changed: 3 additions & 3 deletions
@@ -38,7 +38,7 @@
 #include <linux/sunrpc/bc_xprt.h>
 #include <linux/nsproxy.h>
 #include <linux/pid_namespace.h>
-
+#include <linux/nfslocalio.h>

 #include "nfs4_fs.h"
 #include "callback.h"
@@ -186,7 +186,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
 	seqlock_init(&clp->cl_boot_lock);
 	ktime_get_real_ts64(&clp->cl_nfssvc_boot);
 	nfs_uuid_init(&clp->cl_uuid);
-	spin_lock_init(&clp->cl_localio_lock);
+	INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);
 #endif /* CONFIG_NFS_LOCALIO */

 	clp->cl_principal = "*";
@@ -244,7 +244,7 @@ static void pnfs_init_server(struct nfs_server *server)
  */
 void nfs_free_client(struct nfs_client *clp)
 {
-	nfs_local_disable(clp);
+	nfs_localio_disable_client(clp);

 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))

fs/nfs/direct.c

Lines changed: 1 addition & 0 deletions
@@ -303,6 +303,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
 static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
 {
 	get_dreq(hdr->dreq);
+	set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
 }

 static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {

fs/nfs/flexfilelayout/flexfilelayout.c

Lines changed: 34 additions & 18 deletions
@@ -164,18 +164,17 @@ decode_name(struct xdr_stream *xdr, u32 *id)
 }

 static struct nfsd_file *
-ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
+		 struct nfs_client *clp, const struct cred *cred,
 		 struct nfs_fh *fh, fmode_t mode)
 {
-	if (mode & FMODE_WRITE) {
-		/*
-		 * Always request read and write access since this corresponds
-		 * to a rw layout.
-		 */
-		mode |= FMODE_READ;
-	}
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);

-	return nfs_local_open_fh(clp, cred, fh, mode);
+	return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
+#else
+	return NULL;
+#endif
 }

 static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
@@ -247,6 +246,7 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
 		spin_lock_init(&mirror->lock);
 		refcount_set(&mirror->ref, 1);
 		INIT_LIST_HEAD(&mirror->mirrors);
+		nfs_localio_file_init(&mirror->nfl);
 	}
 	return mirror;
 }
@@ -257,6 +257,7 @@ static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)

 	ff_layout_remove_mirror(mirror);
 	kfree(mirror->fh_versions);
+	nfs_close_local_fh(&mirror->nfl);
 	cred = rcu_access_pointer(mirror->ro_cred);
 	put_cred(cred);
 	cred = rcu_access_pointer(mirror->rw_cred);
@@ -847,6 +848,9 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
 	struct nfs4_pnfs_ds *ds;
 	u32 ds_idx;

+	if (NFS_SERVER(pgio->pg_inode)->flags &
+			(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
+		pgio->pg_maxretrans = io_maxretrans;
 retry:
 	pnfs_generic_pg_check_layout(pgio, req);
 	/* Use full layout for now */
@@ -860,6 +864,8 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
 		if (!pgio->pg_lseg)
 			goto out_nolseg;
 	}
+	/* Reset wb_nio, since getting layout segment was successful */
+	req->wb_nio = 0;

 	ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
 	if (!ds) {
@@ -876,14 +882,24 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
 	pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;

 	pgio->pg_mirror_idx = ds_idx;
-
-	if (NFS_SERVER(pgio->pg_inode)->flags &
-			(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
-		pgio->pg_maxretrans = io_maxretrans;
 	return;
 out_nolseg:
-	if (pgio->pg_error < 0)
-		return;
+	if (pgio->pg_error < 0) {
+		if (pgio->pg_error != -EAGAIN)
+			return;
+		/* Retry getting layout segment if lower layer returned -EAGAIN */
+		if (pgio->pg_maxretrans && req->wb_nio++ > pgio->pg_maxretrans) {
+			if (NFS_SERVER(pgio->pg_inode)->flags & NFS_MOUNT_SOFTERR)
+				pgio->pg_error = -ETIMEDOUT;
+			else
+				pgio->pg_error = -EIO;
+			return;
+		}
+		pgio->pg_error = 0;
+		/* Sleep for 1 second before retrying */
+		ssleep(1);
+		goto retry;
+	}
 out_mds:
 	trace_pnfs_mds_fallback_pg_init_read(pgio->pg_inode,
 			0, NFS4_MAX_UINT64, IOMODE_READ,
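
Reviewer note on the out_nolseg change in the hunk above: the control flow is a bounded retry on -EAGAIN. The sketch below models it in userspace so the bound can be checked by hand. get_layout_with_retry(), get_layout_fn, struct flaky and flaky_get() are hypothetical names, and the one-second ssleep(1) from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 on success or a negative errno. */
typedef int (*get_layout_fn)(void *ctx);

/*
 * Retry 'get' while it returns -EAGAIN. With a non-zero 'maxretrans' the
 * loop gives up using the same test as the patch (nio++ > maxretrans) and
 * reports -ETIMEDOUT for softerr mounts, -EIO otherwise. With
 * maxretrans == 0 it retries without bound, as a hard mount would.
 */
static int get_layout_with_retry(get_layout_fn get, void *ctx,
				 unsigned int maxretrans, int softerr)
{
	unsigned int nio = 0;

	assert(get != NULL);
	for (;;) {
		int err = get(ctx);

		if (err != -EAGAIN)
			return err;	/* success or a non-retryable error */
		if (maxretrans && nio++ > maxretrans)
			return softerr ? -ETIMEDOUT : -EIO;
		/* the kernel code sleeps for one second here before retrying */
	}
}

/* Example callback: fails with -EAGAIN for the first 'fail_first' calls. */
struct flaky {
	int fail_first;
	int calls;
};

static int flaky_get(void *ctx)
{
	struct flaky *f = ctx;

	return f->calls++ < f->fail_first ? -EAGAIN : 0;
}
```

Because the counter is compared before it is incremented, a limit of 2 allows four attempts in total before the error is returned.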
@@ -1820,7 +1836,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 	hdr->mds_offset = offset;

 	/* Start IO accounting for local read */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
 	if (localio) {
 		hdr->task.tk_start = ktime_get();
 		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
@@ -1896,7 +1912,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 	hdr->args.offset = offset;

 	/* Start IO accounting for local write */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
 				   FMODE_READ|FMODE_WRITE);
 	if (localio) {
 		hdr->task.tk_start = ktime_get();
@@ -1981,7 +1997,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	data->args.fh = fh;

 	/* Start IO accounting for local commit */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
 				   FMODE_READ|FMODE_WRITE);
 	if (localio) {
 		data->task.tk_start = ktime_get();

fs/nfs/flexfilelayout/flexfilelayout.h

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ struct nfs4_ff_layout_mirror {
 	nfs4_stateid stateid;
 	const struct cred __rcu *ro_cred;
 	const struct cred __rcu *rw_cred;
+	struct nfs_file_localio nfl;
 	refcount_t ref;
 	spinlock_t lock;
 	unsigned long flags;

fs/nfs/inode.c

Lines changed: 3 additions & 0 deletions
@@ -1137,6 +1137,8 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
 	ctx->lock_context.open_context = ctx;
 	INIT_LIST_HEAD(&ctx->list);
 	ctx->mdsthreshold = NULL;
+	nfs_localio_file_init(&ctx->nfl);
+
 	return ctx;
 }
 EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1168,6 +1170,7 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
 	nfs_sb_deactive(sb);
 	put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
 	kfree(ctx->mdsthreshold);
+	nfs_close_local_fh(&ctx->nfl);
 	kfree_rcu(ctx, rcu_head);
 }


fs/nfs/internal.h

Lines changed: 6 additions & 3 deletions
@@ -455,11 +455,13 @@ extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 /* localio.c */
-extern void nfs_local_disable(struct nfs_client *);
 extern void nfs_local_probe(struct nfs_client *);
+extern void nfs_local_probe_async(struct nfs_client *);
+extern void nfs_local_probe_async_work(struct work_struct *);
 extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
 					   const struct cred *,
 					   struct nfs_fh *,
+					   struct nfs_file_localio *,
 					   const fmode_t);
 extern int nfs_local_doio(struct nfs_client *,
 			  struct nfsd_file *,
@@ -471,11 +473,12 @@ extern int nfs_local_commit(struct nfsd_file *,
 extern bool nfs_server_is_local(const struct nfs_client *clp);

 #else /* CONFIG_NFS_LOCALIO */
-static inline void nfs_local_disable(struct nfs_client *clp) {}
 static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline void nfs_local_probe_async(struct nfs_client *clp) {}
 static inline struct nfsd_file *
 nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
-		  struct nfs_fh *fh, const fmode_t mode)
+		  struct nfs_fh *fh, struct nfs_file_localio *nfl,
+		  const fmode_t mode)
 {
 	return NULL;
 }
