-
Notifications
You must be signed in to change notification settings - Fork 75
[ETCM-377] Fix consistency issue between cache and persisted block number on rollbacks #800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…mber on rollbacks
d999dc0
to
f6e7cb0
Compare
|
||
// Checkpoint number updates are only done if the persisted value is larger, as we won't have the associated mpt nodes if not | ||
val latestCheckpointNumberUpdates = | ||
if (appStateStorage.getLatestCheckpointBlockNumber() > newLatestCheckpointNumber) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I am not sure why we need to update this block in storage when we update it when we update it here:
if (withState)
stateStorage.onBlockRollback(block.number, bestBlockNumber) { () => persistBestBlocksData() }
when cache is full or requires flushing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary for consistency's sake
But I wanted to follow the same approach than here, it's also best to keep this value as up-to-date as possible.
Though we have some conditions before to update this values, it could have not been the case if they were false
.and(blockNumberMappingUpdates) | ||
.and(bestBlockNumberUpdates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you are fixing best block number handling when rollbackin, do you think this ugly thing is still necessary in BlockchainImpl
:
//FIXME EC-495 this method should not be need when best block is handled properly during rollback
def persistCachedNodes(): Unit = {
if (stateStorage.forcePersist(RollBackFlush)) {
persistBestBlocksData()
}
}
We always call it before rollbacks happen (it is ugly hack from the past, as there was some troubles with cached block number when rollbacking earlier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm it doesn't seem to be necessary and I'd gladly remove it... though I'm a bit unsure just in case I add further possibilities of introduced issues by changing that
Though if you agree with removing it (or other reviewers do) I'll remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, if we can remove it now, let's do it! =)
maybeBlock match { | ||
case Some(block) => removeBlock(block, withState) | ||
case None => | ||
log.warn(s"Attempted removing block with hash $blockHash that we don't have") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
${ByteStringUtils.hash2string(blockHash)}
?
…-377-inconsistent-best-block-number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for code it lgtm, did you try to run it on mordor/mainnet or some local privnet wheere you can freerly kill peers ?
Given our time constraints I only synced some blocks from mordor (with regular sync) to some minimal checks, I'll re-deploy the nomad testnet after this PR gets merged to check how it works there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh testing in prod :D let it be then!
Description
Fixes None.get problems on testnet when accessing the best block, this were caused by having our block number higher that the block we had
Proposed Solution
I finally decided not to remove the caching of mpt nodes from our archive state storage to not over extend this PR, we can include that later
Testing
Added unit tests should pass