-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Block signals during fpm master initialization #4471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix PHP-FPM failure in the case of concurrent reload attempts. Postpone signal delivery to the fpm master process till proper signal handlers are set. Prevent the following case: - Running master process receives `SIGUSR2` and performs `execvp()`. - Another `SIGUSR2` is arrived before signal handlers are set. - Master process dies. - Requests to the HTTP server handled by PHP-fpm can not be served any more. Block some signals using `sigprocmask()` before `execvp()` and early in the `main()` function. Unblock signals as soon as proper handlers are set. Fixes bug #74083
I have a patch for Bug #76601 but it adds more code to same places so independent pull request will conflict with this one. That is why I am expecting review of this change at first. |
As I said in the bug report, this one needs a test. I have got actually some changes in fpmi (sort of my testing version of fpm) for that so it's possible to enable automatically opcache in FPMi tests. Essentially these are the changes in the tester (might be missing some later changes maybe): and this is the test: I will need to port it to FPM and do a bit more testing. I think it looks good in general but need to think about it a bit more. In the worst case we might just merge it to 7.4 and see what happens... :) |
This particular issue #74083 has nothing common with The full story is the following. We were receiving complains concerning PHP-FPM crashes during reload and so web-sites downtime but the root cause was unclear. Concurrent reload #74083 was the most obvious problem. Before this patch was ready, I faced #76601 that is even more weird (the state could be unnoticed by watchdogs) and has no common with In #76601 thread on bugs.php.net I posted my unsuccessful attempts to write a test for the bug #76895. I could not figure out that syntax error problem is tightly bound to I have tried to add a test, but race condition tests usually require some non-negligible time to achieve high enough probability of failure and depend on performance of an executor. People usually complain that they already have enough slow tests. From my point of view this pull request should be merged after #4465 or #4460. Additional patch against #76601 is preferable but not strictly required. |
Look e.g. at fpm_events.c#L440 I suspect that such workarounds are required due to lack of proper signal blocking during some operations. This patch is the first part to fix such cases. |
Have not expected side effects of `include`.
echo "Reached interval $interval us with $step us steps\n"; | ||
$tester->expectLogNotice('Reloading in progress ...'); | ||
/* Consume mix of 'Reloading in progress ...' and 'reloading: .*' */ | ||
$tester->getLogLines(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test spends 2 seconds here due to timeout. Test will be unreliable with exact number of lines since number of actual reloads may vary. Reload may be interrupted issued only a part of logs. More careful analysis of output requires reimplementation of functions like expectLogStartNotices()
. That is why extra delay could hardly be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I will try to find some time to take a look on this so we can remove the slow test flag...
@bukka could you take care of this and the related requests please ? |
@@ -109,6 +109,10 @@ int __riscosify_control = __RISCOSIFY_STRICT_UNIX_SPECS; | |||
#include "fpm_log.h" | |||
#include "zlog.h" | |||
|
|||
#if HAVE_SIGNAL_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that HAVE_SIGNAL_H
has been removed in 7.4 5f89157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the rebase...
This pull request requires #4465 (merged to 7.4 only yet) or additional changes in children spawning that use functions introduced here. |
I tried to run the test in FPMi: bukka/fpmi@9d02a13 But getting this:
This fix should be in it but might be because I'm missing some other fixes. Will probably need to look a bit more and port other stuff as well or will just try it in FPM. |
@plmnikulin just to double check if the test is working fine for you in 7.4? |
Sorry, I am quite busy these days.
Please, provide more details on your environment (maybe backtrace from coredump could be useful) if you suspect that the problem is not completely fixed. I have not checked if you have a patch fixing SIGKILL for children in the fpmi project. |
@plmnikulin I made it work and the tests passes when patch is applied and fails when it's not which is good. I merged to PHP 7.4 only for now so it gets in at least. If it's all good after 7.4 gets some usage, we might consider back port to 7.3 |
Promised patch against Bug #76601: #4836 |
Fix PHP-FPM failure in the case of concurrent reload attempts.
Postpone signal delivery to the fpm master process till proper signal
handlers are set. Prevent the following case:
SIGUSR2
and performsexecvp()
.SIGUSR2
is arrived before signal handlers are set.any more.
Block some signals using
sigprocmask()
beforeexecvp()
and earlyin the
main()
function. Unblock signals as soon as properhandlers are set.
Fixes bug #74083