Skip to content

Commit fdaf1bb

Browse files
Dave Chinnerdchinner
authored andcommitted
xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
We can't use the same algorithm for replacing an existing attribute when logging attributes. The existing algorithm is essentially: 1. create new attr w/ INCOMPLETE 2. atomically flip INCOMPLETE flags between old + new attribute 3. remove old attr which is marked w/ INCOMPLETE This algorithm guarantees that we see either the old or new attribute, and if we fail after the atomic flag flip, we don't have to recover the removal of the old attr because we never see INCOMPLETE attributes in lookups. For logged attributes, however, this does not work. The logged attribute intents do not track the work that has been done as the transaction rolls, and hence the only recovery mechanism we have is "run the replace operation from scratch". This is further exacerbated by the attempt to avoid needing the INCOMPLETE flag to create an atomic swap. This means we can create a second active attribute of the same name before we remove the original. If we fail at any point after the create but before the removal has completed, we end up with duplicate attributes in the attr btree and recovery only tries to replace one of them. There are several other failure modes where we can leave partially allocated remote attributes that expose stale data, partially free remote attributes that enable UAF based stale data exposure, etc. TO fix this, we need a different algorithm for replace operations when LARP is enabled. Luckily, it's not that complex if we take the right first step. That is, the first thing we log is the attri intent with the new name/value pair and mark the old attr as INCOMPLETE in the same transaction. From there, we then remove the old attr and keep relogging the new name/value in the intent, such that we always know that we have to create the new attr in recovery. Once the old attr is removed, we then run a normal ATTR_CREATE operation relogging the intent as we go. If the new attr is local, then it gets created in a single atomic transaction that also logs the final intent done. If the new attr is remote, the we set INCOMPLETE on the new attr while we allocate and set the remote value, and then we clear the INCOMPLETE flag at in the last transaction taht logs the final intent done. If we fail at any point in this algorithm, log recovery will always see the same state on disk: the new name/value in the intent, and either an INCOMPLETE attr or no attr in the attr btree. If we find an INCOMPLETE attr, we run the full replace starting with removing the INCOMPLETE attr. If we don't find it, then we simply create the new attr. Notably, recovery of a failed create that has an INCOMPLETE flag set is now the same - we start with the lookup of the INCOMPLETE attr, and if that exists then we do the full replace recovery process, otherwise we just create the new attr. Hence changing the way we do the replace operation when LARP is enabled allows us to use the same log recovery algorithm for both the ATTR_CREATE and ATTR_REPLACE operations. This is also the same algorithm we use for runtime ATTR_REPLACE operations (except for the step setting up the initial conditions). The result is that: - ATTR_CREATE uses the same algorithm regardless of whether LARP is enabled or not - ATTR_REPLACE with larp=0 is identical to the old algorithm - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm from the larp=0 code and then runs the unmodified ATTR_CREATE code. - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as it uses at runtime. Because the state machine is now quite clean, changing the algorithm is really just a case of changing the initial state and how the states link together for the ATTR_REPLACE case. Hence it's not a huge amount of code for what is a fairly substantial rework of the attr logging and recovery algorithm.... Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
1 parent e7f358d commit fdaf1bb

File tree

6 files changed

+137
-72
lines changed

6 files changed

+137
-72
lines changed

fs/xfs/libxfs/xfs_attr.c

Lines changed: 58 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -68,9 +68,12 @@ int
6868
xfs_inode_hasattr(
6969
struct xfs_inode *ip)
7070
{
71-
if (!XFS_IFORK_Q(ip) ||
72-
(ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
73-
ip->i_afp->if_nextents == 0))
71+
if (!XFS_IFORK_Q(ip))
72+
return 0;
73+
if (!ip->i_afp)
74+
return 0;
75+
if (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
76+
ip->i_afp->if_nextents == 0)
7477
return 0;
7578
return 1;
7679
}
@@ -408,23 +411,30 @@ xfs_attr_sf_addname(
408411
}
409412

410413
/*
411-
* When we bump the state to REPLACE, we may actually need to skip over the
412-
* state. When LARP mode is enabled, we don't need to run the atomic flags flip,
413-
* so we skip straight over the REPLACE state and go on to REMOVE_OLD.
414+
* Handle the state change on completion of a multi-state attr operation.
415+
*
416+
* If the XFS_DA_OP_REPLACE flag is set, this means the operation was the first
417+
* modification in a attr replace operation and we still have to do the second
418+
* state, indicated by @replace_state.
419+
*
420+
* We consume the XFS_DA_OP_REPLACE flag so that when we are called again on
421+
* completion of the second half of the attr replace operation we correctly
422+
* signal that it is done.
414423
*/
415-
static void
416-
xfs_attr_dela_state_set_replace(
424+
static enum xfs_delattr_state
425+
xfs_attr_complete_op(
417426
struct xfs_attr_item *attr,
418-
enum xfs_delattr_state replace)
427+
enum xfs_delattr_state replace_state)
419428
{
420429
struct xfs_da_args *args = attr->xattri_da_args;
430+
bool do_replace = args->op_flags & XFS_DA_OP_REPLACE;
421431

422-
ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
423-
replace == XFS_DAS_NODE_REPLACE);
424-
425-
attr->xattri_dela_state = replace;
426-
if (xfs_has_larp(args->dp->i_mount))
427-
attr->xattri_dela_state++;
432+
args->op_flags &= ~XFS_DA_OP_REPLACE;
433+
if (do_replace) {
434+
args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
435+
return replace_state;
436+
}
437+
return XFS_DAS_DONE;
428438
}
429439

430440
static int
@@ -466,10 +476,9 @@ xfs_attr_leaf_addname(
466476
*/
467477
if (args->rmtblkno)
468478
attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
469-
else if (args->op_flags & XFS_DA_OP_REPLACE)
470-
xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
471479
else
472-
attr->xattri_dela_state = XFS_DAS_DONE;
480+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
481+
XFS_DAS_LEAF_REPLACE);
473482
out:
474483
trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
475484
return error;
@@ -511,10 +520,9 @@ xfs_attr_node_addname(
511520

512521
if (args->rmtblkno)
513522
attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
514-
else if (args->op_flags & XFS_DA_OP_REPLACE)
515-
xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
516523
else
517-
attr->xattri_dela_state = XFS_DAS_DONE;
524+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
525+
XFS_DAS_NODE_REPLACE);
518526
out:
519527
trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
520528
return error;
@@ -546,18 +554,15 @@ xfs_attr_rmtval_alloc(
546554
if (error)
547555
return error;
548556

549-
/* If this is not a rename, clear the incomplete flag and we're done. */
550-
if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
557+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
558+
++attr->xattri_dela_state);
559+
/*
560+
* If we are not doing a rename, we've finished the operation but still
561+
* have to clear the incomplete flag protecting the new attr from
562+
* exposing partially initialised state if we crash during creation.
563+
*/
564+
if (attr->xattri_dela_state == XFS_DAS_DONE)
551565
error = xfs_attr3_leaf_clearflag(args);
552-
attr->xattri_dela_state = XFS_DAS_DONE;
553-
} else {
554-
/*
555-
* We are running a REPLACE operation, so we need to bump the
556-
* state to the step in that operation.
557-
*/
558-
attr->xattri_dela_state++;
559-
xfs_attr_dela_state_set_replace(attr, attr->xattri_dela_state);
560-
}
561566
out:
562567
trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
563568
return error;
@@ -714,13 +719,24 @@ xfs_attr_set_iter(
714719
return xfs_attr_node_addname(attr);
715720

716721
case XFS_DAS_SF_REMOVE:
717-
attr->xattri_dela_state = XFS_DAS_DONE;
718-
return xfs_attr_sf_removename(args);
722+
error = xfs_attr_sf_removename(args);
723+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
724+
xfs_attr_init_add_state(args));
725+
break;
719726
case XFS_DAS_LEAF_REMOVE:
720-
attr->xattri_dela_state = XFS_DAS_DONE;
721-
return xfs_attr_leaf_removename(args);
727+
error = xfs_attr_leaf_removename(args);
728+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
729+
xfs_attr_init_add_state(args));
730+
break;
722731
case XFS_DAS_NODE_REMOVE:
723732
error = xfs_attr_node_removename_setup(attr);
733+
if (error == -ENOATTR &&
734+
(args->op_flags & XFS_DA_OP_RECOVERY)) {
735+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
736+
xfs_attr_init_add_state(args));
737+
error = 0;
738+
break;
739+
}
724740
if (error)
725741
return error;
726742
attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
@@ -806,14 +822,16 @@ xfs_attr_set_iter(
806822

807823
case XFS_DAS_LEAF_REMOVE_ATTR:
808824
error = xfs_attr_leaf_remove_attr(attr);
809-
attr->xattri_dela_state = XFS_DAS_DONE;
825+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
826+
xfs_attr_init_add_state(args));
810827
break;
811828

812829
case XFS_DAS_NODE_REMOVE_ATTR:
813830
error = xfs_attr_node_remove_attr(attr);
814831
if (!error)
815832
error = xfs_attr_leaf_shrink(args);
816-
attr->xattri_dela_state = XFS_DAS_DONE;
833+
attr->xattri_dela_state = xfs_attr_complete_op(attr,
834+
xfs_attr_init_add_state(args));
817835
break;
818836
default:
819837
ASSERT(0);
@@ -1315,9 +1333,10 @@ xfs_attr_leaf_removename(
13151333
dp = args->dp;
13161334

13171335
error = xfs_attr_leaf_hasname(args, &bp);
1318-
13191336
if (error == -ENOATTR) {
13201337
xfs_trans_brelse(args->trans, bp);
1338+
if (args->op_flags & XFS_DA_OP_RECOVERY)
1339+
return 0;
13211340
return error;
13221341
} else if (error != -EEXIST)
13231342
return error;

fs/xfs/libxfs/xfs_attr.h

Lines changed: 31 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -444,26 +444,31 @@ struct xfs_attr_list_context {
444444
*/
445445
enum xfs_delattr_state {
446446
XFS_DAS_UNINIT = 0, /* No state has been set yet */
447-
XFS_DAS_SF_ADD, /* Initial shortform set iter state */
448-
XFS_DAS_LEAF_ADD, /* Initial leaf form set iter state */
449-
XFS_DAS_NODE_ADD, /* Initial node form set iter state */
450-
XFS_DAS_RMTBLK, /* Removing remote blks */
451-
XFS_DAS_RM_NAME, /* Remove attr name */
452-
XFS_DAS_RM_SHRINK, /* We are shrinking the tree */
453-
454-
XFS_DAS_SF_REMOVE, /* Initial shortform set iter state */
455-
XFS_DAS_LEAF_REMOVE, /* Initial leaf form set iter state */
456-
XFS_DAS_NODE_REMOVE, /* Initial node form set iter state */
457-
458-
/* Leaf state set/replace sequence */
447+
448+
/*
449+
* Initial sequence states. The replace setup code relies on the
450+
* ADD and REMOVE states for a specific format to be sequential so
451+
* that we can transform the initial operation to be performed
452+
* according to the xfs_has_larp() state easily.
453+
*/
454+
XFS_DAS_SF_ADD, /* Initial sf add state */
455+
XFS_DAS_SF_REMOVE, /* Initial sf replace/remove state */
456+
457+
XFS_DAS_LEAF_ADD, /* Initial leaf add state */
458+
XFS_DAS_LEAF_REMOVE, /* Initial leaf replace/remove state */
459+
460+
XFS_DAS_NODE_ADD, /* Initial node add state */
461+
XFS_DAS_NODE_REMOVE, /* Initial node replace/remove state */
462+
463+
/* Leaf state set/replace/remove sequence */
459464
XFS_DAS_LEAF_SET_RMT, /* set a remote xattr from a leaf */
460465
XFS_DAS_LEAF_ALLOC_RMT, /* We are allocating remote blocks */
461466
XFS_DAS_LEAF_REPLACE, /* Perform replace ops on a leaf */
462467
XFS_DAS_LEAF_REMOVE_OLD, /* Start removing old attr from leaf */
463468
XFS_DAS_LEAF_REMOVE_RMT, /* A rename is removing remote blocks */
464469
XFS_DAS_LEAF_REMOVE_ATTR, /* Remove the old attr from a leaf */
465470

466-
/* Node state set/replace sequence, must match leaf state above */
471+
/* Node state sequence, must match leaf state above */
467472
XFS_DAS_NODE_SET_RMT, /* set a remote xattr from a node */
468473
XFS_DAS_NODE_ALLOC_RMT, /* We are allocating remote blocks */
469474
XFS_DAS_NODE_REPLACE, /* Perform replace ops on a node */
@@ -477,11 +482,11 @@ enum xfs_delattr_state {
477482
#define XFS_DAS_STRINGS \
478483
{ XFS_DAS_UNINIT, "XFS_DAS_UNINIT" }, \
479484
{ XFS_DAS_SF_ADD, "XFS_DAS_SF_ADD" }, \
485+
{ XFS_DAS_SF_REMOVE, "XFS_DAS_SF_REMOVE" }, \
480486
{ XFS_DAS_LEAF_ADD, "XFS_DAS_LEAF_ADD" }, \
487+
{ XFS_DAS_LEAF_REMOVE, "XFS_DAS_LEAF_REMOVE" }, \
481488
{ XFS_DAS_NODE_ADD, "XFS_DAS_NODE_ADD" }, \
482-
{ XFS_DAS_RMTBLK, "XFS_DAS_RMTBLK" }, \
483-
{ XFS_DAS_RM_NAME, "XFS_DAS_RM_NAME" }, \
484-
{ XFS_DAS_RM_SHRINK, "XFS_DAS_RM_SHRINK" }, \
489+
{ XFS_DAS_NODE_REMOVE, "XFS_DAS_NODE_REMOVE" }, \
485490
{ XFS_DAS_LEAF_SET_RMT, "XFS_DAS_LEAF_SET_RMT" }, \
486491
{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
487492
{ XFS_DAS_LEAF_REPLACE, "XFS_DAS_LEAF_REPLACE" }, \
@@ -525,8 +530,7 @@ struct xfs_attr_item {
525530
enum xfs_delattr_state xattri_dela_state;
526531

527532
/*
528-
* Indicates if the attr operation is a set or a remove
529-
* XFS_ATTR_OP_FLAGS_{SET,REMOVE}
533+
* Attr operation being performed - XFS_ATTR_OP_FLAGS_*
530534
*/
531535
unsigned int xattri_op_flags;
532536

@@ -614,10 +618,19 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
614618
return XFS_DAS_NODE_REMOVE;
615619
}
616620

621+
/*
622+
* If we are logging the attributes, then we have to start with removal of the
623+
* old attribute so that there is always consistent state that we can recover
624+
* from if the system goes down part way through. We always log the new attr
625+
* value, so even when we remove the attr first we still have the information in
626+
* the log to finish the replace operation atomically.
627+
*/
617628
static inline enum xfs_delattr_state
618629
xfs_attr_init_replace_state(struct xfs_da_args *args)
619630
{
620631
args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
632+
if (xfs_has_larp(args->dp->i_mount))
633+
return xfs_attr_init_remove_state(args);
621634
return xfs_attr_init_add_state(args);
622635
}
623636

fs/xfs/libxfs/xfs_attr_leaf.c

Lines changed: 36 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -446,21 +446,33 @@ xfs_attr3_leaf_read(
446446
* Namespace helper routines
447447
*========================================================================*/
448448

449+
/*
450+
* If we are in log recovery, then we want the lookup to ignore the INCOMPLETE
451+
* flag on disk - if there's an incomplete attr then recovery needs to tear it
452+
* down. If there's no incomplete attr, then recovery needs to tear that attr
453+
* down to replace it with the attr that has been logged. In this case, the
454+
* INCOMPLETE flag will not be set in attr->attr_filter, but rather
455+
* XFS_DA_OP_RECOVERY will be set in args->op_flags.
456+
*/
449457
static bool
450458
xfs_attr_match(
451459
struct xfs_da_args *args,
452460
uint8_t namelen,
453461
unsigned char *name,
454462
int flags)
455463
{
464+
456465
if (args->namelen != namelen)
457466
return false;
458467
if (memcmp(args->name, name, namelen) != 0)
459468
return false;
460-
/*
461-
* If we are looking for incomplete entries, show only those, else only
462-
* show complete entries.
463-
*/
469+
470+
/* Recovery ignores the INCOMPLETE flag. */
471+
if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
472+
args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
473+
return true;
474+
475+
/* All remaining matches need to be filtered by INCOMPLETE state. */
464476
if (args->attr_filter !=
465477
(flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
466478
return false;
@@ -799,6 +811,14 @@ xfs_attr_sf_removename(
799811
sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data;
800812

801813
error = xfs_attr_sf_findname(args, &sfe, &base);
814+
815+
/*
816+
* If we are recovering an operation, finding nothing to
817+
* remove is not an error - it just means there was nothing
818+
* to clean up.
819+
*/
820+
if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY))
821+
return 0;
802822
if (error != -EEXIST)
803823
return error;
804824
size = xfs_attr_sf_entsize(sfe);
@@ -819,7 +839,7 @@ xfs_attr_sf_removename(
819839
totsize -= size;
820840
if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
821841
(dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
822-
!(args->op_flags & XFS_DA_OP_ADDNAME)) {
842+
!(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
823843
xfs_attr_fork_remove(dp, args->trans);
824844
} else {
825845
xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
@@ -1128,9 +1148,17 @@ xfs_attr3_leaf_to_shortform(
11281148
goto out;
11291149

11301150
if (forkoff == -1) {
1131-
ASSERT(xfs_has_attr2(dp->i_mount));
1132-
ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
1133-
xfs_attr_fork_remove(dp, args->trans);
1151+
/*
1152+
* Don't remove the attr fork if this operation is the first
1153+
* part of a attr replace operations. We're going to add a new
1154+
* attr immediately, so we need to keep the attr fork around in
1155+
* this case.
1156+
*/
1157+
if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
1158+
ASSERT(xfs_has_attr2(dp->i_mount));
1159+
ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
1160+
xfs_attr_fork_remove(dp, args->trans);
1161+
}
11341162
goto out;
11351163
}
11361164

fs/xfs/libxfs/xfs_da_btree.h

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -91,6 +91,7 @@ typedef struct xfs_da_args {
9191
#define XFS_DA_OP_CILOOKUP (1u << 4) /* lookup returns CI name if found */
9292
#define XFS_DA_OP_NOTIME (1u << 5) /* don't update inode timestamps */
9393
#define XFS_DA_OP_REMOVE (1u << 6) /* this is a remove operation */
94+
#define XFS_DA_OP_RECOVERY (1u << 7) /* Log recovery operation */
9495

9596
#define XFS_DA_OP_FLAGS \
9697
{ XFS_DA_OP_JUSTCHECK, "JUSTCHECK" }, \
@@ -99,7 +100,8 @@ typedef struct xfs_da_args {
99100
{ XFS_DA_OP_OKNOENT, "OKNOENT" }, \
100101
{ XFS_DA_OP_CILOOKUP, "CILOOKUP" }, \
101102
{ XFS_DA_OP_NOTIME, "NOTIME" }, \
102-
{ XFS_DA_OP_REMOVE, "REMOVE" }
103+
{ XFS_DA_OP_REMOVE, "REMOVE" }, \
104+
{ XFS_DA_OP_RECOVERY, "RECOVERY" }
103105

104106
/*
105107
* Storage for holding state during Btree searches and split/join ops.

fs/xfs/xfs_attr_item.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -561,16 +561,22 @@ xfs_attri_item_recover(
561561
args->namelen = attrp->alfi_name_len;
562562
args->hashval = xfs_da_hashname(args->name, args->namelen);
563563
args->attr_filter = attrp->alfi_attr_flags;
564+
args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT;
564565

565566
switch (attrp->alfi_op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
566567
case XFS_ATTR_OP_FLAGS_SET:
567568
case XFS_ATTR_OP_FLAGS_REPLACE:
568569
args->value = attrip->attri_value;
569570
args->valuelen = attrp->alfi_value_len;
570571
args->total = xfs_attr_calc_size(args, &local);
571-
attr->xattri_dela_state = xfs_attr_init_add_state(args);
572+
if (xfs_inode_hasattr(args->dp))
573+
attr->xattri_dela_state = xfs_attr_init_replace_state(args);
574+
else
575+
attr->xattri_dela_state = xfs_attr_init_add_state(args);
572576
break;
573577
case XFS_ATTR_OP_FLAGS_REMOVE:
578+
if (!xfs_inode_hasattr(args->dp))
579+
goto out;
574580
attr->xattri_dela_state = xfs_attr_init_remove_state(args);
575581
break;
576582
default:

0 commit comments

Comments
 (0)