Skip to content

Improve the super stream reconnection #344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 18, 2024
Merged

Conversation

Gsantomaggio
Copy link
Member

@Gsantomaggio Gsantomaggio commented Jan 12, 2024

Fixes: #333
Fixes: #335

Preface

The PR is focused on making the client more stable during the server restart and better handling the metadata update event.

Most of the changes are in the super stream producer and consumer.
See the issue #333

1- What's changed in the super stream part

The RawSuperStreamProducer and RawSuperStreamConsumer expose two callbacks:

  • ConnectionClosedHandler (reason, stream)
  • MetadataHandler(update)
  • ISuperStreamProducer and ISuperStreamConsumer interfaces

ConnectionClosedHandler

The callback is raised each time a partition is closed for some reason. You can check the reason with:

1- public const string Normal = "TCP connection closed normal";
2- public const string Unexpected = "TCP connection closed unexpected";

1- The super stream partition is closed by the Close() method, so there are no problems. It can be helpful in terms of alerting or logs.

2- The super stream partition is closed in an unexpected way like kill the connection, network problems broker restart etc..

In that case, you can reconnect the partition with the ReconnectPartition method.
Like:

var config = new RawSuperStreamProducerConfig("MySuperStream")
        {
            Routing = message1 => message1.Properties.MessageId.ToString(),
            ClientProvidedName = clientName,
        };
streamProducer = await system.CreateRawSuperStreamProducer(config);
        config.ConnectionClosedHandler = async (reason, stream) =>
        {
            // here, you can decide if it makes sense to wait a bit before reconnecting the 
			// partition
			var streamInfo = await system.StreamInfo(stream);
            await streamProducer.ReconnectPartition(streamInfo);
        };

MetadataHandler

It happens when there is a change in the stream topology.
See the document for more details.

In this case, the server drops all the producers and consumers. So, you need to react to the Metadata Handler event. You can decide to reconnect the producer and consumer
using the ReconnectPartition

Producer and Consumer classes

The Producer and Consumer classes handle automatically the ConnectionClosedHandler and the MetadataHandler events for stream and super stream.

2- Fail fast

The Producer and Consumer classes can restore the connection automatically when they are up and running.
The classes will fail immediately if there are problems during the first connection.
This behaviour is now like the Java Client's.

3- Survive a cluster restart

Please follow this presentation to understand what happens during a cluster restart

You can create your own way to handle the reconnection based on Raw Classes.

Fixes #333

Signed-off-by: Gabriele Santomaggio <[email protected]>
Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

Comparison is base (7fa33dd) 91.84% compared to head (8e8dc04) 91.80%.

Files Patch % Lines
RabbitMQ.Stream.Client/Reliable/ReliableBase.cs 76.27% 11 Missing and 3 partials ⚠️
RabbitMQ.Stream.Client/ClientExceptions.cs 0.00% 5 Missing and 1 partial ⚠️
RabbitMQ.Stream.Client/Client.cs 0.00% 3 Missing and 2 partials ⚠️
RabbitMQ.Stream.Client/ConnectionsPool.cs 20.00% 3 Missing and 1 partial ⚠️
RabbitMQ.Stream.Client/Reliable/ConsumerFactory.cs 90.90% 3 Missing and 1 partial ⚠️
RabbitMQ.Stream.Client/RawSuperStreamConsumer.cs 95.65% 0 Missing and 2 partials ⚠️
RabbitMQ.Stream.Client/RawSuperStreamProducer.cs 97.18% 0 Missing and 2 partials ⚠️
...bitMQ.Stream.Client/Reliable/IReconnectStrategy.cs 33.33% 1 Missing and 1 partial ⚠️
RabbitMQ.Stream.Client/Reliable/ProducerFactory.cs 95.91% 1 Missing and 1 partial ⚠️
RabbitMQ.Stream.Client/Reliable/Producer.cs 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #344      +/-   ##
==========================================
- Coverage   91.84%   91.80%   -0.05%     
==========================================
  Files         116      116              
  Lines       11065    11264     +199     
  Branches      915      895      -20     
==========================================
+ Hits        10163    10341     +178     
- Misses        699      709      +10     
- Partials      203      214      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

* Reconnect the super stream partition with the stream info
  the stream could change the topology. The new stream info need to
  be passed to the producer or consumer

* Expose stream_info it gives info about the stream location like:
  leader and followers

Signed-off-by: Gabriele Santomaggio <[email protected]>
Signed-off-by: Gabriele Santomaggio <[email protected]>
Signed-off-by: Gabriele Santomaggio <[email protected]>
@Gsantomaggio Gsantomaggio marked this pull request as ready for review January 17, 2024 10:24
Signed-off-by: Gabriele Santomaggio <[email protected]>
Add the SuperStream interface

Signed-off-by: Gabriele Santomaggio <[email protected]>
Signed-off-by: Gabriele Santomaggio <[email protected]>
Signed-off-by: Gabriele Santomaggio <[email protected]>
@Gsantomaggio Gsantomaggio merged commit 083ff20 into main Jan 18, 2024
@Gsantomaggio Gsantomaggio deleted the super_stream_reconnect branch January 18, 2024 19:52
@Gsantomaggio Gsantomaggio added this to the 1.8.0 milestone Jan 22, 2024
@Gsantomaggio Gsantomaggio mentioned this pull request Jan 30, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[edge case] Error during reconnect if the stream in the meantime is delete
1 participant