
[ETCM-102] Fast sync integration tests #672


Merged
merged 17 commits into develop from etcm-102/fast-sync-integration-test on Sep 22, 2020

Conversation

@KonradStaniec (Contributor) commented Sep 16, 2020

Description

Adds an integration test suite for fast sync, with 3 main happy-path scenarios:

  • fast sync of a blockchain with an empty MPT
  • fast sync of a blockchain with a non-empty MPT
  • fast sync of a blockchain with a target block update

I intended to add more test cases and refactor fast sync a little, but due to the bugs found along the way this PR grew too large for my taste.

Important Changes Introduced

During testing, 3 bugs were found and fixed:

  • Bug in FastSync: if the target block contains the hash of an empty trie, fast sync would get stuck asking peers for non-existing nodes (see the sketch after this list).

  • Bug in EtcPeerManager peer info handling: after the initial handshake, a peer's best block hash was not updated. This led FastSync to ask for most probably stale blocks, which in turn led to syncing towards a target block that had already been pruned.

  • Bug in EtcPeerManager providing handshaked peers to other parts of the system: if a newly connected peer was just starting, i.e. its best block was its genesis block, it would not be treated as a handshaked peer. This matters because Mantis would not even broadcast blocks to such a peer.
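A minimal sketch of the idea behind the first fix; EmptyTrieCheck, isEmptyTrieRoot, and initialNodesToRequest are hypothetical names, not the actual Mantis code. The root hash of an empty MPT is a well-known constant (keccak256 of the RLP-encoded empty string), so state download can be skipped when the target block's stateRoot equals it:

import akka.util.ByteString
import org.bouncycastle.util.encoders.Hex

object EmptyTrieCheck {
  // keccak256(rlp("")) — the well-known root hash of an empty Merkle Patricia Trie
  val EmptyTrieRootHash: ByteString =
    ByteString(Hex.decode("56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"))

  def isEmptyTrieRoot(stateRoot: ByteString): Boolean =
    stateRoot == EmptyTrieRootHash

  // Hypothetical guard: request no state nodes at all for an empty trie,
  // otherwise peers are asked for nodes that do not exist and sync gets stuck.
  def initialNodesToRequest(stateRoot: ByteString): Seq[ByteString] =
    if (isEmptyTrieRoot(stateRoot)) Seq.empty else Seq(stateRoot)
}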

@KonradStaniec force-pushed the etcm-102/fast-sync-integration-test branch from 01584d5 to 72da7a3 on September 16, 2020 09:48
appStateStorage.putEstimatedHighestBlock(maxBlockNumber)

if (maxBlockNumber > initialPeerInfo.maxBlockNumber)
  initialPeerInfo.withMaxBlockNumber(maxBlockNumber).withMaxBlockHash(maxBlockHash)
Contributor: Shouldn't we use a method that updates both the number and the hash atomically? I think there is no case where we update only one field.

Contributor Author: I wondered about that too, so if another person raises the same concern I will change it.
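For illustration, a sketch of what such an atomic update could look like (withBestBlockInfo is a hypothetical name, not the actual Mantis API):

import akka.util.ByteString

// Hypothetical sketch: a combined setter makes it impossible to update
// the number without the hash (or vice versa).
case class PeerInfo(maxBlockNumber: BigInt, bestBlockHash: ByteString) {
  def withBestBlockInfo(number: BigInt, hash: ByteString): PeerInfo =
    copy(maxBlockNumber = number, bestBlockHash = hash)
}

// At the call site shown above this would become:
//   if (maxBlockNumber > initialPeerInfo.maxBlockNumber)
//     initialPeerInfo.withBestBlockInfo(maxBlockNumber, maxBlockHash)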

object FakePeer {
  def startFakePeer(peerName: String): Task[FakePeer] = {
    for {
      peer <- Task(new FakePeer(peerName)).memoizeOnSuccess
Contributor: It is a def, so .memoizeOnSuccess won't work here.

Contributor Author: Some leftover from debugging; I will remove it.
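For context, a small illustration of why memoization has no effect inside a def (assuming Monix Task, which the snippet appears to use):

import monix.eval.Task

object MemoizeExample {
  // Inside a def, a brand-new Task (with its own memoization cell) is built
  // on every call, so memoizeOnSuccess never shares results across calls:
  def freshEachTime(name: String): Task[String] =
    Task(s"peer-$name").memoizeOnSuccess

  // memoizeOnSuccess only caches when the same Task value is reused:
  val sharedOnce: Task[String] = Task("peer-1").memoizeOnSuccess
}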

@@ -112,7 +118,7 @@ class EtcPeerManagerActor(peerManagerActor: ActorRef, peerEventBusActor: ActorRe
* @return new updated peer info
*/
private def handleSentMessage(message: Message, initialPeerWithInfo: PeerWithInfo): PeerInfo =
Contributor: Do we need message: Message?

Contributor Author: We do not, but updatePeersWithInfo expects a function of shape messageHandler: (Message, PeerWithInfo) => PeerInfo; that is why I would leave it as it is.
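A hedged sketch of the constraint being described, with simplified stand-in types (the real signatures in EtcPeerManagerActor differ):

// Simplified stubs standing in for the real Mantis types:
trait Message
case class PeerInfo(maxBlockNumber: BigInt)
case class PeerWithInfo(peerInfo: PeerInfo)

object HandlerShape {
  // updatePeersWithInfo expects exactly this shape, so even an unused
  // Message parameter has to stay in handleSentMessage's signature:
  def updatePeersWithInfo(pwi: PeerWithInfo, msg: Message, handler: (Message, PeerWithInfo) => PeerInfo): PeerInfo =
    handler(msg, pwi)

  def handleSentMessage(message: Message, initialPeerWithInfo: PeerWithInfo): PeerInfo =
    initialPeerWithInfo.peerInfo // message intentionally unused
}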

@mmrozek (Contributor) left a comment: LGTM

@ntallar left a comment: Nice catches 🥇

  sender() ! HandshakedPeers(peersWithInfo.collect {
-   case (_, PeerWithInfo(peer, peerInfo)) if peerInfo.maxBlockNumber > 0 => peer -> peerInfo
+   case (_, PeerWithInfo(peer, peerInfo)) if peerHasUdpatedBestBlock(peerInfo) => peer -> peerInfo
ntallar: Doesn't the status exchange step guarantee that the peer has our node's genesis?

And since we update bestBlockHash together with bestBlockNumber, shouldn't bestBlockHash != genesisHash always imply maxBlockNumber > 0? In what case would peerHasUdpatedBestBlock not be true for a peer?

Contributor Author: Bear in mind that we do not exchange the best block number during the handshake, so just after the handshake each peer has maxBlockNumber set to 0. Only after the handshake do we make an additional request for the peer's best block header, and we only update maxBlockNumber if we receive that header.

So just after the handshake there can be peers with bestBlockHash != genesisHash && maxBlockNumber == 0, and we do not want to provide those as fully handshaked peers. (A sketch of the resulting predicate follows.)
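A hedged sketch of the predicate this thread discusses (the real peerHasUdpatedBestBlock implementation may differ): a peer counts as fully handshaked once its best block info is settled, either because we confirmed a best block header (maxBlockNumber > 0) or because its best block is simply the genesis block:

import akka.util.ByteString

case class PeerInfo(maxBlockNumber: BigInt, bestBlockHash: ByteString) // simplified stub

object HandshakedPeersFilter {
  // genesisHash would come from the blockchain config in the real code:
  def peerHasUdpatedBestBlock(peerInfo: PeerInfo, genesisHash: ByteString): Boolean =
    peerInfo.maxBlockNumber > 0 || peerInfo.bestBlockHash == genesisHash
}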

ntallar: Oh, I see the purpose of this now.

But why don't we want to provide them as fully handshaked peers? I wasn't able to find any usage of maxBlockNumber from this response so far.

Contributor Author: My guess is that it is mostly because maxBlockNumber takes part in block broadcasting, i.e. if we provided peers with maxBlockNumber set to 0, then until they sent us some NewBlock we would broadcast all blocks to them, even really old ones.

IMO we could get rid of it if we ported known-nodes and transaction tracking from the other project.

ntallar: It sounds like a reasonable guess; should we add a comment on this case?

@ntallar left a comment: Last comments, apart from which LGTM!

@ntallar left a comment: Awaiting the results from testing it on mainnet, apart from which LGTM!

 .\\            //.
. \ \          / /.
.\  ,\     /` /,.-
 -.   \  /'/ /  .
 ` -   `-'  \  -
   '.       /.\`
      -    .-
      :`//.'
      .`.'
      .' BP 

@KonradStaniec force-pushed the etcm-102/fast-sync-integration-test branch from 2ea0d6c to 68cf614 on September 21, 2020 09:38
@KonradStaniec force-pushed the etcm-102/fast-sync-integration-test branch from 68cf614 to cd44ef1 on September 21, 2020 09:45
@KonradStaniec (Contributor Author) commented Sep 21, 2020

So @mmrozek @ntallar, some info from sync testing:

  1. After all the fixes (and tweaking some config values, for which I will make a second PR) fast sync started working! I was able to sync to the tip of mainnet twice and hold a little bit on top of it.
  2. I found another bug which made the test flaky. Basically, if we read 2 frames from the socket at once, i.e. Seq(Hello, Status), then based on the Hello message we should decompress (or not) the Status. With the current handling this was impossible, as we read all messages from the socket and only then set the compression option. This could lead to the Status message not being handled correctly and, due to that, failed handshakes (see the sketch after this list).
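A simplified sketch of the idea, not the actual Mantis frame codec (Frame, decode, and the message types here are stand-ins). Each frame's decoding must depend on the compression state established by the frames already processed, because Hello is what negotiates Snappy compression (devp2p version >= 5):

object FrameReading {
  sealed trait Message
  case class Hello(p2pVersion: Long) extends Message
  case object Status extends Message

  case class Frame(payload: Array[Byte]) // stand-in for the RLPx frame type

  def decode(frame: Frame, compressed: Boolean): Message = ??? // stand-in decoder

  // Process frames one at a time, so that a Hello read in this very batch
  // already affects how the next frame (e.g. Status) is decompressed:
  def readFrames(frames: Seq[Frame], compressionEnabled: Boolean): Seq[Message] =
    frames match {
      case Seq() => Seq.empty
      case frame +: rest =>
        val message = decode(frame, compressionEnabled)
        val nextCompression = message match {
          case Hello(p2pVersion) => p2pVersion >= 5 // Snappy negotiated via Hello
          case _                 => compressionEnabled
        }
        message +: readFrames(rest, nextCompression)
    }
}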

@KonradStaniec (Contributor Author) commented Sep 21, 2020

Also note: during debugging I found that some Parity nodes do not follow the protocol for the disconnect message, i.e. they do not compress the Disconnect message sent after the Hello message, which leads to decoding failures on our side. My approach is to just ignore it, as Parity no longer supports ETC and it forms a much smaller base of clients than Geth.

@ntallar commented Sep 21, 2020

> My approach is to just ignore it, as Parity no longer supports ETC and it forms a much smaller base of clients than Geth.

Could this be a problem for Mantis ETH support? (as we are, at least for now, keeping it)

Btw, the last bug fix LGTM!

@KonradStaniec (Contributor Author)

I do not think it would be a big problem: the only message that did not follow the protocol was Disconnect, so either way we are disconnecting from that peer, and even on ETH about 80% of clients are Geth nodes. (A sketch of the tolerant decoding follows.)
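A hedged sketch of what tolerating the uncompressed Disconnect could look like (using the xerial snappy-java API; the actual Mantis handling may differ):

import org.xerial.snappy.Snappy
import scala.util.Try

object TolerantDecoding {
  // If Snappy decompression fails on a peer that should be compressing
  // (e.g. Parity's uncompressed Disconnect right after Hello), fall back
  // to treating the payload as plain, uncompressed bytes:
  def payloadBytes(payload: Array[Byte], compressionEnabled: Boolean): Array[Byte] =
    if (!compressionEnabled) payload
    else Try(Snappy.uncompress(payload)).getOrElse(payload)
}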

@KonradStaniec merged commit 6a35748 into develop on Sep 22, 2020
@KonradStaniec deleted the etcm-102/fast-sync-integration-test branch on September 22, 2020 07:56