Skip to content

[ETCM-377] Fix consistency issue between cache and persisted block number on rollbacks #800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 18, 2020

Conversation

ntallar
Copy link

@ntallar ntallar commented Nov 17, 2020

Description

Fixes None.get problems on testnet when accessing the best block, this were caused by having our block number higher that the block we had

Proposed Solution

  • Refactor removeBlock method for easier readability
  • Treat best blocks cache as the most up-to-date values
  • When saving block, move persistance of best block number only after the in-memory block number was possibly updated
  • When removing a block, update the persisted best blocks if they would be referring to blocks we no longer have

I finally decided not to remove the caching of mpt nodes from our archive state storage to not over extend this PR, we can include that later

Testing

Added unit tests should pass

@ntallar ntallar added bug Something isn't working high priority This PRs should be reviewed and merged ASAP labels Nov 17, 2020
@ntallar ntallar force-pushed the etcm-377-inconsistent-best-block-number branch from d999dc0 to f6e7cb0 Compare November 17, 2020 12:56

// Checkpoint number updates are only done if the persisted value is larger, as we won't have the associated mpt nodes if not
val latestCheckpointNumberUpdates =
if (appStateStorage.getLatestCheckpointBlockNumber() > newLatestCheckpointNumber)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I am not sure why we need to update this block in storage when we update it when we update it here:

    if (withState)
      stateStorage.onBlockRollback(block.number, bestBlockNumber) { () => persistBestBlocksData() }

when cache is full or requires flushing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary for consistency's sake

But I wanted to follow the same approach than here, it's also best to keep this value as up-to-date as possible.

Though we have some conditions before to update this values, it could have not been the case if they were false

.and(blockNumberMappingUpdates)
.and(bestBlockNumberUpdates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you are fixing best block number handling when rollbackin, do you think this ugly thing is still necessary in BlockchainImpl:

  //FIXME EC-495 this method should not be need when best block is handled properly during rollback
  def persistCachedNodes(): Unit = {
    if (stateStorage.forcePersist(RollBackFlush)) {
      persistBestBlocksData()
    }
  }

We always call it before rollbacks happen (it is ugly hack from the past, as there was some troubles with cached block number when rollbacking earlier)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it doesn't seem to be necessary and I'd gladly remove it... though I'm a bit unsure just in case I add further possibilities of introduced issues by changing that

Though if you agree with removing it (or other reviewers do) I'll remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, if we can remove it now, let's do it! =)

maybeBlock match {
case Some(block) => removeBlock(block, withState)
case None =>
log.warn(s"Attempted removing block with hash $blockHash that we don't have")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${ByteStringUtils.hash2string(blockHash)} ?

Copy link
Contributor

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for code it lgtm, did you try to run it on mordor/mainnet or some local privnet wheere you can freerly kill peers ?

@ntallar
Copy link
Author

ntallar commented Nov 18, 2020

As for code it lgtm, did you try to run it on mordor/mainnet or some local privnet wheere you can freerly kill peers ?

Given our time constraints I only synced some blocks from mordor (with regular sync) to some minimal checks, I'll re-deploy the nomad testnet after this PR gets merged to check how it works there

Copy link
Contributor

@mirkoAlic mirkoAlic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh testing in prod :D let it be then!

@ntallar ntallar merged commit dcb6d7b into develop Nov 18, 2020
@ntallar ntallar deleted the etcm-377-inconsistent-best-block-number branch November 18, 2020 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high priority This PRs should be reviewed and merged ASAP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants