Stream queues failing publisher confirms for multi-node cluster #3224
Replies: 8 comments 15 replies
-
with a load balancer, you should not have to set the
-
Probably 6000-6500
https://github.com/rabbitmq/osiris/blob/87447deb0361a7bf5caa47363031f2dfad6b0fe3/Makefile#L8
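If that default range needs to be pinned down for firewall rules, recent RabbitMQ versions expose the stream replication port range in `rabbitmq.conf`. A sketch; the key names and availability should be verified against the stream plugin documentation for your RabbitMQ version:

```
# Constrain the ports that osiris replica listeners may bind to
# (6000-6500 mirrors the default range from the osiris Makefile)
stream.replication.port_range.min = 6000
stream.replication.port_range.max = 6500
```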
On Tue, 27 Jul 2021 at 23:00, Johan Rhodin ***@***.***> wrote:
There is no firewall between the nodes.
Which port is the inter node communication running on for streams?
This is the listeners output from `rabbitmqctl status`:
```
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 5552, protocol: stream, purpose: stream
Interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: 0.0.0.0, port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
```
--
*Karl Nilsson*
-
Do your logs contain this warning?
https://github.com/rabbitmq/osiris/blob/master/src/osiris_replica_reader.erl#L58
This should tell you which hosts are being attempted.
On Fri, 13 Aug 2021 at 17:44, Chad Knutson ***@***.***> wrote:
Still no success in setting up mirrored stream queues in our clusters.
Today, I monitored network traffic on the node that is the leader of the
stream queue.
Before I get to that, I share more details about our cluster
configuration. Each cluster is deployed within a VPC. Inside the VPC, each
node has an internal IP (e.g. 10.56.72.1) as well as internal hostname
(e.g. node-01.in.example.com). The nodes also have external IP (e.g.,
54.1.1.1) and external hostname (e.g. node-01.example.com). We expect
that any communication between nodes will use the internal host/IP.
After I have created a stream queue on a 3 node cluster, the leader node
is selected, but online member nodes fail to populate. In Wireshark, I see
network traffic between the internal IP of the stream leader and the
external IP of a mirror.
Here is a sampling of the communication that I see. Ideally, I would be
able to filter out just the stream queue traffic, but it is not clear to me
what the filter criteria should be.
```
num  time     source         destination    protocol length info
6974 3.757509 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
6975 3.757516 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
6976 3.757517 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data, Application Data
6977 3.757520 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data, Application Data
6978 3.757521 IP-01-INTERNAL IP-02-EXTERNAL TCP 1514 37788 → 5671 [PSH, ACK] Seq=75079 Ack=1 Win=459 Len=1448 TSval=3683319730 TSecr=65526665 [TCP segment of a reassembled PDU]
6979 3.757534 IP-01-INTERNAL IP-02-EXTERNAL TCP 1514 37788 → 5671 [ACK] Seq=76527 Ack=1 Win=459 Len=1448 TSval=3683319730 TSecr=65526665 [TCP segment of a reassembled PDU]
6980 3.757541 IP-01-INTERNAL IP-02-EXTERNAL TCP 1514 37788 → 5671 [ACK] Seq=77975 Ack=1 Win=459 Len=1448 TSval=3683319730 TSecr=65526665 [TCP segment of a reassembled PDU]
6981 3.757549 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
6982 3.757553 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data, Application Data
6983 3.757554 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Application Data [TCP segment of a reassembled PDU]
6984 3.757577 IP-01-INTERNAL IP-02-EXTERNAL TCP 1514 37788 → 5671 [ACK] Seq=83767 Ack=1 Win=459 Len=1448 TSval=3683319731 TSecr=65526665 [TCP segment of a reassembled PDU]
8011 4.631005 IP-02-EXTERNAL IP-01-INTERNAL TCP 66 [TCP ACKed unseen segment] [TCP Previous segment not captured] 5671 → 37788 [ACK] Seq=38 Ack=260423 Win=15928 Len=0 TSval=65527539 TSecr=3683320532
8012 4.631029 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 [TCP ACKed unseen segment] [TCP Previous segment not captured] , Ignored Unknown Record
8013 4.631031 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Ignored Unknown Record
8014 4.631035 IP-01-INTERNAL IP-02-EXTERNAL TLSv1.2 1514 Ignored Unknown Record
```
How can we enforce internal (local IP) communication between nodes?
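For isolating just the stream replication traffic in a capture, a Wireshark display filter over the replication port range mentioned earlier in the thread may help. A sketch; 6000-6500 is the default range from the osiris Makefile, so adjust if your deployment pins a different range:

```
tcp.port >= 6000 && tcp.port <= 6500
```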
-
Here is where the replica node resolves the IP addresses on the host that
the replica reader will try, in turn, to connect to. You can run this in an
Erlang shell or via a `rabbitmqctl eval` command until it resolves something
that can be connected to.
https://github.com/rabbitmq/osiris/blob/master/src/osiris_replica.erl#L173
On Fri, 13 Aug 2021 at 20:16, Chad Knutson ***@***.***> wrote:
For further testing, I removed the external IP and hostnames from all
nodes, including the /etc/hosts file. The only way to communicate now is
within the VPC using internal IP.
However, the RabbitMQ log is unchanged [now host 127.0.1.1 refers only to
the 'internal' host name]:
```
2021-08-13 19:08:55.419699+00:00 [warn] <0.12551.2> osiris replica connection refused, host:{127,0,1,1}
```
Wireshark shows errors similar to those in the initial capture, but now all
communication is between internal IPs.
-
Node-01 must be the node the replica is on. It sends that info to the
node hosting the leader, which then tries to connect back. The port is in the
range I mentioned before and is only listening whilst the replica process is active.
On Fri, 13 Aug 2021 at 22:50, Chad Knutson ***@***.***> wrote:
I get good results from that evaluation:
```
$ rabbitmqctl eval 'inet:getaddrs("node-01", inet).'
{ok,[{127,0,1,1}]}
$ rabbitmqctl eval 'inet:getaddrs("node-02", inet).'
{ok,[{10,16,16,222}]}
$ rabbitmqctl eval 'inet:getaddrs("node-03", inet).'
{ok,[{10,16,16,98}]}
```
This raises a few questions:
Why is the code only choosing 'node-01' every time? Why won't it choose
one of the other nodes?

I don't understand why it wouldn't be able to connect to 127.0.1.1 in any
case. I can telnet to that IP from node-01:
```
$ telnet 127.0.1.1 5672
Trying 127.0.1.1...
Connected to 127.0.1.1.
```
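Note that telnet to 5672 only proves the AMQP listener is reachable; per the earlier replies, the osiris replica listener binds a port in the 6000-6500 range and only listens while a replica process is active. A small probe over that range can distinguish the two cases. This is a sketch, not part of the thread: it is bash-specific (`/dev/tcp`), and the example host name at the bottom is hypothetical.

```shell
# probe_ports HOST FIRST_PORT LAST_PORT
# Reports which TCP ports in the range accept a connection on HOST.
# "No ports open" is the expected result when no replica process is running.
probe_ports() {
  local host=$1 first=$2 last=$3 p
  for p in $(seq "$first" "$last"); do
    # bash opens a TCP connection via the /dev/tcp pseudo-device;
    # the subshell closes fd 3 again when it exits
    if (exec 3<>"/dev/tcp/$host/$p") 2>/dev/null; then
      echo "open: $p"
    fi
  done
}

# Example (hypothetical replica host):
# probe_ports node-02.in.example.com 6000 6500
```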
-
After running that, I tried restarting the leader and another node became the leader, so I had to run it again on that node.
-
Here are some other logs we are getting. When starting a follower, we get warnings in the leader log, with the corresponding lines on the follower.
-
I am happy to report that I finally have a better handle on this.

The biggest issue is that we do not use true load balancers for our clusters in general. For the one case where we use a load balancer (AWS PrivateLink), the stream client works perfectly with the --load-balancer flag.

Other cases are complicated by the node name (e.g., test-node-01) vs. the FQDN (e.g., test-node-01.example.com). When using the public internet for connections, the stream client will try to connect to the node name, which the client cannot resolve, instead of the FQDN. In such cases, I found that setting the value of 'advertised_host' to the FQDN solves the problem.

The other issue that was causing problems was a loopback hostname in our /etc/hosts file. This was causing the mirroring issue described above when a stream queue was declared in a multi-node cluster. Removing that entry solved the problem.

Thank you for the guidance provided in many of these answers. I am happy to share more details about our configuration and conclusions upon request.
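The two fixes described above can be sketched concretely. Host names and addresses below reuse the thread's examples; the `stream.advertised_host` key is from the stream plugin's `rabbitmq.conf` syntax (the Erlang-term form shown earlier in the thread is the `advanced.config` equivalent) and should be checked against your RabbitMQ version:

```
# /etc/hosts — map the node's own name to its internal VPC address instead
# of a Debian-style "127.0.1.1  node-01" loopback entry, which is what the
# replica reader was resolving and failing to connect to:
10.56.72.1   node-01 node-01.in.example.com

# rabbitmq.conf — advertise the externally resolvable FQDN to stream clients:
stream.advertised_host = test-node-01.example.com
```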
-
I have a 3 node cluster hosted in a cloud platform with RabbitMQ 3.9.0.rc.2, Erlang 24.0.2 in Ubuntu (ARM processor). I'm using the released stream-perf-test-0.1.0 client running on my laptop (osx).
For a 1-node cluster, the client works fine, but I am not getting publisher confirms for my 3-node cluster. I am running the command on my laptop, where the RabbitMQ URI is DNS load-balanced over its 3 nodes (-01, -02, -03).
```
java -jar stream-perf-test-0.1.0.jar --uris rabbitmq-stream://$USR:[email protected]:5552/vhost
```
Response:
```
Starting producer
1, published 3637 msg/s, confirmed 0 msg/s, consumed 0 msg/s, latency min/median/75th/95th/99th 0/0/0/0/0 µs, chunk size 0
```
In the RabbitMQ manager for the queue, I see that the queue leader is node -01, online are nodes -01 and -02, members are -01, -02, -03. The streams view shows locator connected to node-01, producer on node-01, and consumer on node-02. Manager reports that 10000 messages have been published and that there is a consumer for the queue.
I have configured the stream plugin so that each of the 3 nodes uses its own URI for the advertised host. E.g., for node-01:
```
{advertised_host, "rabbitserver-01.example.com"}
```
I don't see anything in the RabbitMQ logs [probably that's another discussion; I don't think that I have the new logger configured correctly].
Am I missing a configuration, or is there a bug in either the client or RabbitMQ?