-
Notifications
You must be signed in to change notification settings - Fork 582
RPC timeouts can cause subsequent ClassCastExceptions #290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
How can a channel implementation know if "the requestor has already timed out and gone away"? |
I'm afraid extending every protocol method with a new field is unrealistic at this point. |
The AMQChannel code already is catching the TimeoutException and calls cleanRpcChannelState to get ready for the next request before rethrowing the timeout ex.
Ya thats what I was afraid of. I didn't know if AMQContentHeader could have more fields added to it passively on methods such as Queue.Declare and Queue.DeclareOk There are still ways to make the code better however.
|
We can add an option to do the check. This would then go into 4.2.0. |
During cluster node failovers some TimeoutExceptions during connection recovery end up causing ClassCastExceptions because the reply for the timed out RPC request comes in while a 2nd RPC request is waiting. This commit add the ConnectionFactory#channelCheckRpcReplyType flag that, if enabled, will make the channel check if a reply is compatible with the outstanding RPC request. The flag is set to false by default but could be set to true in the future, when the check logic is proven reliable. This code was originally contributed by @vikinghawk. Fixes #290
@vikinghawk @michaelklishin A PR that adapted @vikinghawk's code for 4.2.0. |
+1 |
@vikinghawk Thanks again for your contribution! You can have a try with the 4.2 snapshot. |
After encountering this issue myself and enabling the response type checking I got this:
The added code does not check if _activeRpc is null before calling canHandleReply() |
@Inverness Thanks for pointing this out, 5ff6ad4 fixes this, it will go in to 4.4.0 (which will be released in the next few days). |
@acogoluegnes I'm currently using 5.0.0. When will the fix be available for that branch? |
Reasonably soon. |
During cluster node failovers we have seen some TimeoutExceptions during connection recovery end up causing ClassCastExceptions because the reply for the timed out rpc request comes in while a 2nd rpc request is waiting.
Ideally in cases like this where the requestor has already timed out and gone away you would just throw that incoming reply away. Looking thru the code I'm not sure there is a good way to handle this though. Checking in all places that the command type is of the expected class prevents the exception, but you still have the issue that the reply delivered is not actually for the current request.
I believe it would require tagging requests with a unique id and then have the server set that request id as a correlation id on the reply allowing the client to match the reply with the original request. Would something like that be feasible in the AMQP spec?
The text was updated successfully, but these errors were encountered: