RPC timeouts can cause subsequent ClassCastExceptions #290

vikinghawk · 2017-07-06T05:37:07Z

During cluster node failovers, we have seen TimeoutExceptions during connection recovery end up causing ClassCastExceptions, because the reply for the timed-out RPC request comes in while a second RPC request is waiting.

Ideally, in cases like this, where the requestor has already timed out and gone away, the incoming reply would simply be thrown away. Looking through the code, I'm not sure there is a good way to handle this, though. Checking everywhere that the command is of the expected class prevents the exception, but you still have the issue that the reply delivered is not actually for the current request.

I believe it would require tagging each request with a unique id and having the server set that request id as a correlation id on the reply, allowing the client to match the reply with the original request. Would something like that be feasible in the AMQP spec?
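To make the correlation-id idea concrete, here is a minimal sketch in plain Java. It is not AMQP and not client code: the `CorrelatedRpc` class and its methods are hypothetical names invented for illustration, and the sketch assumes the reply carries back the id that the request was tagged with.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: AMQP 0-9-1 methods carry no such request id.
class CorrelatedRpc {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Registers a waiting requestor and returns the id to tag the request with.
    long send(CompletableFuture<String> replyTarget) {
        long id = nextId.incrementAndGet();
        pending.put(id, replyTarget);
        return id;
    }

    // Called when the requestor gives up; its reply, if it ever arrives, is now unknown.
    void timeOut(long id) {
        pending.remove(id);
    }

    // Completes the matching requestor; returns false when the reply is late and was dropped.
    boolean onReply(long id, String payload) {
        CompletableFuture<String> target = pending.remove(id);
        if (target == null) {
            return false;
        }
        target.complete(payload);
        return true;
    }
}
```

With this shape, a reply for a timed-out request finds no pending entry and is discarded instead of being handed to whichever request happens to be waiting.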

michaelklishin · 2017-07-06T08:55:52Z

How can a channel implementation know if "the requestor has already timed out and gone away"?

michaelklishin · 2017-07-06T08:57:01Z

I'm afraid extending every protocol method with a new field is unrealistic at this point.

vikinghawk · 2017-07-06T13:56:42Z

> How can a channel implementation know if "the requestor has already timed out and gone away"?

The AMQChannel code already catches the TimeoutException and calls cleanRpcChannelState to get ready for the next request before rethrowing the timeout exception.

> I'm afraid extending every protocol method with a new field is unrealistic at this point.

Yeah, that's what I was afraid of. I didn't know if AMQContentHeader could have more fields added to it passively on methods such as Queue.Declare and Queue.DeclareOk.

There are still ways to make the code better, however.

We could protect against ClassCastExceptions by verifying that the command passed to RpcContinuation.handleCommand is of the expected type.

Additionally, for requests/replies such as Queue.Declare/DeclareOk, we could verify that the queue names match before assuming the reply is for the current request.
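The two checks suggested above could look roughly like the following. This is a sketch with stand-in types, not the client's real classes: `AmqpMethod`, `QueueDeclare`, `QueueDeclareOk`, `BasicConsumeOk`, and `ReplyCheck` are hypothetical names used only to show the shape of the check.

```java
// Stand-in types for illustration; the real client's method classes differ.
interface AmqpMethod {}

final class QueueDeclare implements AmqpMethod {
    final String queue;
    QueueDeclare(String queue) { this.queue = queue; }
}

final class QueueDeclareOk implements AmqpMethod {
    final String queue;
    QueueDeclareOk(String queue) { this.queue = queue; }
}

final class BasicConsumeOk implements AmqpMethod {
    final String consumerTag;
    BasicConsumeOk(String consumerTag) { this.consumerTag = consumerTag; }
}

final class ReplyCheck {
    // Returns true only if the reply plausibly belongs to the outstanding request.
    static boolean canHandleReply(AmqpMethod request, AmqpMethod reply) {
        if (request instanceof QueueDeclare) {
            // Check 1: the reply must be of the expected type.
            if (!(reply instanceof QueueDeclareOk)) {
                return false;
            }
            // Check 2: the queue names must match. An empty requested name means
            // the server generates one, so there is nothing to compare against.
            String requested = ((QueueDeclare) request).queue;
            return requested.isEmpty() || requested.equals(((QueueDeclareOk) reply).queue);
        }
        // Other request types are not modelled in this sketch.
        return true;
    }
}
```

Note that the name comparison only narrows the problem: two consecutive declarations of the same queue would still be indistinguishable.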

acogoluegnes · 2017-07-07T13:30:06Z

We can add an option to do the check. This would then go into 4.2.0.

@vikinghawk

During cluster node failovers, some TimeoutExceptions during connection recovery end up causing ClassCastExceptions because the reply for the timed-out RPC request comes in while a second RPC request is waiting. This commit adds the ConnectionFactory#channelCheckRpcReplyType flag that, if enabled, will make the channel check whether a reply is compatible with the outstanding RPC request. The flag is set to false by default but could be set to true in the future, when the check logic is proven reliable. This code was originally contributed by @vikinghawk. Fixes #290

acogoluegnes · 2017-07-07T13:42:24Z

@vikinghawk @michaelklishin A PR that adapted @vikinghawk's code for 4.2.0.

vikinghawk · 2017-07-07T14:46:34Z

+1

acogoluegnes · 2017-07-10T13:04:45Z

@vikinghawk Thanks again for your contribution! You can give it a try with the 4.2 snapshot.

Inverness · 2017-11-22T21:42:31Z

After encountering this issue myself and enabling the response type checking, I got this:

2017-11-22 15:33:33 [AMQP Connection 10.5.17.63:5672] ERROR c.s.r.u.RMQDefaultExceptionHandler - Unhandled error in RabbitMQ connection
java.lang.NullPointerException: null
	at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:185)
	at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111)
	at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:643)
	at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:47)
	at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:581)
	at java.lang.Thread.run(Thread.java:748)

The added code does not check whether _activeRpc is null before calling canHandleReply().
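For illustration, a null-safe version of such a check might be shaped like this. It is a sketch under assumptions, not the code from the client or from any particular commit: `RpcContinuationSketch` and `ActiveRpcGuard` are hypothetical names, and treating "no active RPC" as "not stale" is a choice made here for the example.

```java
// Hypothetical stand-in for the continuation that is waiting on a reply.
interface RpcContinuationSketch {
    boolean canHandleReply(Object reply);
}

final class ActiveRpcGuard {
    // Null whenever no RPC is outstanding, e.g. after a timeout cleaned up the state.
    private RpcContinuationSketch activeRpc;

    void setActiveRpc(RpcContinuationSketch rpc) {
        this.activeRpc = rpc;
    }

    // True only when an RPC is waiting AND it rejects this reply.
    // Assumption of this sketch: with no active RPC there is nothing to compare
    // against, so the reply is not flagged as stale here.
    boolean isStaleReply(Object reply) {
        RpcContinuationSketch rpc = activeRpc; // read once, then test for null
        return rpc != null && !rpc.canHandleReply(reply);
    }
}
```

The point is simply that the null test comes before the call, so a reply arriving while nothing is waiting cannot trigger the NullPointerException shown in the trace.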

References #290

acogoluegnes · 2017-11-24T14:44:30Z

@Inverness Thanks for pointing this out. 5ff6ad4 fixes this; it will go into 4.4.0 (which will be released in the next few days).

Inverness · 2017-11-28T16:55:03Z

@acogoluegnes I'm currently using 5.0.0. When will the fix be available for that branch?

michaelklishin · 2017-11-28T16:56:19Z

Reasonably soon.

vikinghawk changed the title ~~RPC TimeoutExceptions can cause subsequent ClassCastExceptions~~ RPC timeouts can cause subsequent ClassCastExceptions Jul 6, 2017

michaelklishin closed this as completed Jul 6, 2017

michaelklishin added the wontfix label Jul 6, 2017

vikinghawk mentioned this issue Jul 6, 2017

ensure rpc reply is for the current request #291

Closed

acogoluegnes reopened this Jul 7, 2017

acogoluegnes added effort-low usability and removed wontfix labels Jul 7, 2017

acogoluegnes self-assigned this Jul 7, 2017

acogoluegnes added this to the 4.2.0 milestone Jul 7, 2017

acogoluegnes mentioned this issue Jul 7, 2017

Add option to ensure RPC reply is for the current request #292

Merged

michaelklishin closed this as completed Jul 7, 2017

This was referenced Jul 29, 2017

AMQChannel.exnWrappingRpc should use a timeout #295

Closed

Undefined method `message_count' for #<AMQ::Protocol::Basic::ConsumeOk> ruby-amqp/bunny#514

Closed

acogoluegnes added a commit that referenced this issue Nov 24, 2017

Check active RPC property against null

5ff6ad4

References #290

mgrafl mentioned this issue Oct 6, 2021

ClassCastException while waiting for RPC response #708

Closed
