
Exception in NIO client when broker sends a large message over SSL #307

Closed

dimas opened this issue Sep 20, 2017 · 4 comments

dimas (Contributor) commented Sep 20, 2017

It looks like there is a bug in how the SslEngineByteBufferInputStream class "reassembles" the stream.
The piece of code below either passes or fails depending on where I set a breakpoint.

                int bytesRead = NioHelper.read(channel, cipherIn);
                if (bytesRead > 0) {
                    cipherIn.flip();
                } else {
                    bytesRead = NioHelper.retryRead(channel, cipherIn);
                    if(bytesRead <= 0) {
                        throw new IllegalStateException("Should be reading something from the network");
                    }
                }

If the breakpoint is at NioHelper.read (so there is some time for data from the network to become available and for bytesRead to be above zero), it works.
However, if the breakpoint is set on the next line, data from the network is not yet available and the else branch is executed. It eventually reads the data but never calls cipherIn.flip(), so unlike the "then" branch, which exits with cipherIn(position=0, limit=4629) in my test, the "else" branch exits with cipherIn(position=4629, limit=16921); I guess the buffer has not been flipped for reading. That later fails on sslEngine.unwrap(). If I manually invoke cipherIn.flip() in the debugger, the packet seems to decode OK.
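For context, the failure mode comes down to standard java.nio.ByteBuffer semantics: after a channel read the buffer is positioned for filling, and flip() must be called before a consumer such as SSLEngine.unwrap() can read the data. A minimal standalone sketch of that behaviour (the buffer size and contents are arbitrary):

    import java.nio.ByteBuffer;

    public class FlipDemo {
        public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(16);

            // Simulate what channel.read(buf) does: bytes are written into the buffer,
            // advancing its position. The buffer is now set up for further filling.
            buf.put(new byte[] {1, 2, 3, 4});
            System.out.println("after put:  position=" + buf.position() + " limit=" + buf.limit());
            // prints position=4 limit=16 -- consuming from here would read past the data

            // flip() sets limit to the current position and position to 0,
            // exposing exactly the bytes that were just read.
            buf.flip();
            System.out.println("after flip: position=" + buf.position() + " limit=" + buf.limit());
            // prints position=0 limit=4 -- this is the state a consumer expects
        }
    }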

acogoluegnes self-assigned this Sep 20, 2017
acogoluegnes added this to the 4.2.2 milestone Sep 20, 2017
acogoluegnes (Contributor)

Thread on mailing list

acogoluegnes added a commit that referenced this issue Sep 20, 2017
acogoluegnes (Contributor)

@dimas I cannot reproduce locally, but your analysis looks correct. Can you try the latest 4.2.2 SNAPSHOT? Thanks.

dimas (Contributor, Author) commented Sep 20, 2017

It seems to work.
Regarding your PR: isn't it simpler to move cipherIn.flip() outside of the if? Like this:

                int bytesRead = NioHelper.read(channel, cipherIn);
                if (bytesRead <= 0) {
                    bytesRead = NioHelper.retryRead(channel, cipherIn);
                    if(bytesRead <= 0) {
                        throw new IllegalStateException("Should be reading something from the network");
                    }
                }
                cipherIn.flip();

?
And re the test: my test involves a remote broker; I am not sure it is reproducible with localhost because the timing is completely different in that case.
Also, my message is not as large as yours: a JSON body of 4224 bytes is sent, plus some headers (probably less than 256 bytes). For better coverage it may make sense to run the test multiple times with different message sizes.
Cheers

michaelklishin (Contributor)

Adding a few more tests with varying sizes is fine. Refactoring suggestions are best presented as pull requests :)
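A rough idea of what such a size-sweep could look like against the client's public API; the queue handling, the exact sizes, and the SSL/NIO setup here are placeholders rather than the project's actual test code:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.GetResponse;

    public class VaryingSizeSslRoundTrip {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.useNio();          // exercise the NIO code path from this issue
            factory.useSslProtocol();  // trust-everything SSL, acceptable against a local test broker

            // Sweep payload sizes around typical TLS record boundaries instead of
            // relying on a single "large" message.
            int[] sizes = {128, 4224, 16 * 1024, 64 * 1024, 1024 * 1024};

            try (Connection connection = factory.newConnection()) {
                Channel channel = connection.createChannel();
                String queue = channel.queueDeclare().getQueue(); // server-named, auto-delete queue
                for (int size : sizes) {
                    channel.basicPublish("", queue, null, new byte[size]);
                    GetResponse response = null;
                    for (int attempt = 0; attempt < 50 && response == null; attempt++) {
                        response = channel.basicGet(queue, true);
                        if (response == null) {
                            Thread.sleep(100); // give the broker a moment to route the message
                        }
                    }
                    if (response == null || response.getBody().length != size) {
                        throw new AssertionError("Round trip failed for size " + size);
                    }
                }
            }
        }
    }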

michaelklishin added a commit that referenced this issue Sep 26, 2018
Recovery delay was a connection recovery feature from day 1 for a reason:

 * In practice connection recovery often won't succeed the first time
   because network failures don't always go away in milliseconds.
 * There's a natural race condition between server state changes
   (cleanup of queues and such) and the operations a recovered connection
   will perform, potentially on entities *with the same identifier* (e.g. name)

An initial delay avoids a lot of scenarios that stem from the above
race condition and that can waste a lot of time for operators, developers,
and the RabbitMQ core team.

References #307.
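As an aside, the delay being discussed corresponds to the network recovery interval that applications can tune on ConnectionFactory; a minimal sketch of that configuration (the 5-second value is just an example, not a recommendation taken from the commit):

    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class RecoveryDelayConfig {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setAutomaticRecoveryEnabled(true);  // enable automatic connection recovery
            // Delay between recovery attempts, in milliseconds. Keeping this well above
            // zero gives transient network failures and server-side cleanup of queues
            // and other entities time to settle before the client reconnects.
            factory.setNetworkRecoveryInterval(5000);
            try (Connection connection = factory.newConnection()) {
                // use the connection; it will be recovered after a network failure
            }
        }
    }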
acogoluegnes pushed four commits that referenced this issue Sep 26, 2018, each carrying the same message as above (cherry picked from commit 60eb44d).