-
Notifications
You must be signed in to change notification settings - Fork 466
Recreate channel after reconnect. Message confirmation feature #418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tream # Conflicts: # RabbitMq/BaseAmqp.php # composer.json
tests are failing |
RabbitMq/BaseAmqp.php
Outdated
* see https://github.com/php-amqplib/php-amqplib/commit/2ccc97ca5b1229f9b12ea47fbab6c16fad26df41 | ||
* see https://github.com/php-amqplib/php-amqplib/commit/c87469ecbbf38fdc18688d3216c1e253d640ba32 | ||
*/ | ||
$this->conn->reconnect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.6.3 is released
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need time to recheck why it was done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry not fixed yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oleg-andreyev 2.6.3 and even 2.7.0-rc1 doesn't help :(
Tests/RabbitMq/ProducerTest.php
Outdated
/* | ||
* This file is part of the OpCart software. | ||
* | ||
* (c) 2015, OpticsPlanet, Inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use proper header
Tests/RabbitMq/ProducerTest.php
Outdated
* file that was distributed with this source code. | ||
*/ | ||
|
||
namespace OldSound\RabbitMqBundle\Tests\RabbitMq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
composer.json
Outdated
@@ -31,7 +31,8 @@ | |||
}, | |||
"extra": { | |||
"branch-alias": { | |||
"dev-master": "1.10.x-dev" | |||
"dev-master": "1.10.x-dev", | |||
"dev-reconnect-fix-upstream": "1.13.x-dev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was done for our needs. Have to be removed.
Can you explain a bit what issue are you fixing? Use case, what exception/behaviour did you encountered, under which settings.. |
I will describe my complete investigation and describe in which cases it
may help.
|
@skafandri take a look on description |
This is interesting. Can you please verify your notes about php-amqplib 2.6.3 and add a new line at the end of |
@skafandri I have update code with accordance of php-amqplib version 2.6.3. I have added version constraint into compose.json. It will be good to add some integration tests using real RabbitMQ instance like php-amqplib have to cover reconnect and confirmation features. Also it will be good to add my exaplanation how to handle server outages and missed messages into documentation of this bundle, because it is not simple things and developers must be aware. |
@oleg-andreyev please take a look on PR |
@skafandri what we have to do to merge this PR? |
RabbitMq/BaseAmqp.php
Outdated
$this->ch = null; | ||
} catch (\Exception $e) { | ||
// ignore exception on Channel object destructor | ||
// TODO: this workaround can be removed after php-amqplib will be updated till 2.6.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could adjust composer.json
to force version of the lib too: ^2.6.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stloyd removed
*/ | ||
public function setWaitConfirmationTimeout($timeout) | ||
{ | ||
if (!is_int($timeout) || $timeout < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!$timeout || !is_int($timeout)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stloyd not sure why this condition need to be changed. Please explain.
I have added few test cases for this this method test. By changing this condition, as you propose, it lead to two test cases fail:
- When timeout is zero
- Timeout is less then zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stloyd ping
composer.json
Outdated
@@ -15,7 +15,7 @@ | |||
"symfony/config": "~2.3 || ~3.0", | |||
"symfony/yaml": "~2.3 || ~3.0", | |||
"symfony/console": "~2.3 || ~3.0", | |||
"php-amqplib/php-amqplib": "~2.6", | |||
"php-amqplib/php-amqplib": "~2.6 >=2.6.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, should be ^2.6.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stloyd fixed
@torinaki @stloyd |
@stloyd, @skafandri we have started migration from videlalvaro/php-amqplib to version 2.6.3 of php-amqplib/php-amqplib and noticed that reconnect still doesn't work My initial fix has addresed this issue in previous commit: Currenlty we are investigating with @chernecov what must be done to make it work. |
@@ -15,7 +15,7 @@ | |||
"symfony/config": "~2.3 || ~3.0", | |||
"symfony/yaml": "~2.3 || ~3.0", | |||
"symfony/console": "~2.3 || ~3.0", | |||
"php-amqplib/php-amqplib": "~2.6", | |||
"php-amqplib/php-amqplib": "^2.6.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconnect fix doesn't work even in 2.7
Note that originaly I have assumed that following PRs must be merged, but it is still not enought.
php-amqplib/php-amqplib@2ccc97c
php-amqplib/php-amqplib@c87469e
@stloyd, @skafandri can you please review our conclusions? |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This PR was closed because it has been stalled for 10 days with no activity. |
There no many changes. I can update the PR and test with latest version. |
Fixed issue with missed channels after reconnect
Long running producer job may lead to following exception:
This exception appears right after job finish.
It happen in case when during job is sending messages catch some network issues and tried to reconnect. Use use reconnect like self-healing feature instead of just start job from scratch, because multiple messages may already consumed or/and restarting job may took few hours and it is inappropriate. Current reconnect implementation do not recreate channels. It leads to issue that messages are sending to no where and are not fall into queue. Also you will get above mentioned exception right after job finish process messages, because we can't close channels on destruct that do not exists.
We can simulate this case:
AR: Job will continue to send messages, but messages will not fall into queue and will return exception at the end
ER: Queue size will contine grow
Implemented message confirmantion feature enabling for channels
Your RabbitMQ server may block connections. For example when qeueue is out of memory. If you use default producer settings it will lead that all your messages will be dropped. The main issue is that right after connection was blocked Rabbit isn't able to accept any messages, but publisher is still can publish and catch connection exception only after some time. So some messages will be dropped. Publisher job will sends messages until you will get followinge exception
According to documentation rabbit sends notification to all clients that server supports connection blocking and you must be ready:
https://www.rabbitmq.com/connection-blocked.html
To catch this situation we must use special method:
https://github.com/php-amqplib/php-amqplib/blob/2bc16b9f0859357a26e6b253c4552b61071dfdd9/PhpAmqpLib/Helper/Protocol/Protocol091.php#L152
according to php-amqplib/php-amqplib and php-amqplib/RabbitMqBundle this method never used. So blocked connection case isn't covered by bundle and library.
Unfortunatly Rabbit by default do not garantee that some of messages will be delivered if you have network problem or Rabbit have blocked connections because of hardware resources issues. The only way how to garantee it is to use transations. But transactions are unnecessarily heavyweight and decrease throughput by a factor of 250. To remedy this, a confirmation mechanism was introduced. Confirmation is more lightweight.
Unfortunalty PHP isn't async and such message confirmation will lead to some overhead. But currently it is a only way how to garantee message delivery. Confirmation garantee that not only connection exists, but also channel and exchange that will consume message.
To reproduce this issue:
ER: job must catch exception and reconnect w/o lost any messages
Please check messages count in queue
Сonclusions
To properly catch blocking connection reconnect must work properly and you must use message confirmation.
When connection is blocked it may happen that we send some messages twice and duplicates messages may appear in queue, because we can't confirm that last message before blocking exception appears isn't sent and we must send it once again.
This case may affect only small amount of messages, because in cases when Rabbit blocks connection it means that Rabbit not accepts any new connection as well so reconnect method will hung untill Rabbit outages will be solved. So we asume that only one duplicate message will be send to one customer in case when connection has been blocked.
I didn't found how this case may be handled.