-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Three stream read optimizations #8032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Azure CI failures unrelated to this PR. The other unit test failures are ... debatable.
The PHP core attempts to disable buffering (set_option) and has an additional "eof" check to avoid an unnecessary About
after:
With my change, PHP circumvents the read buffer if it is empty and a read was requested of more than the buffer size. Using the buffer in such a case, when reading a big portion of a file with one call, results in unnecessary copying memory around (from stream to read buffer, from read buffer to return value); the buffer also limits each read to a 8 kB chunk at most. This unit test expects PHP to not do that. Opinions here? If desired, I can amend this PR to also include the necessary unit test changes. I can also try to figure out how to do these optimizations only for "plain file" streams, where the benefit is very noticable. |
do you have any numbers, of how much faster this should get with this change? |
Here you have excerpts from an strace which loads
This finishes after 4.3ms.
This is the full strace, not abbreviated, and it finishes after 0.39ms - that's a speedup factor of 10. This is, of course, a somewhat degenerate microbenchmark, and the numbers are not exactly reliable (only one iteration and I have recorded only wallclock time, not CPU time), but I think it's good enough to demonstrate my point. I think it is obvious enough that reducing 90 This was on a machine without Spectre/Meltdown mitigations. With those in place, the system call overhead will be even larger, increasing the gap between existing PHP code and my PR code. |
I did another benchmark, PHP source code:
Before this PR:
After this PR:
You can clearly see that the existing PHP code wastes most of the CPU time copying around data from the stream read buffer to the final string, and that disappears completely with my improvements. What remains is mostly copying the file data from the kernel's page cache to userspace. |
The failing stream_read_object_return.phpt exhibits an BC break. :( |
What is a BC break? Is it caused by my change or just revealed by it? |
"BC break" == "backwards compatibility break". The mentioned test now fails, because unimplemented streamWrapper methods would now be called. That might be an issue (and might be fixable), or not. In any way, we're likely better off to turn the streamWrapper class into a set of interfaces in the long run. |
719539f
to
7879353
Compare
I have added a About the Existing (old) code: php-src/main/streams/streams.c Lines 1477 to 1480 in d67698a
... and the unit test fails because it does not expect the eof method to be called.
|
I added another patch which gives another 14% microbenchmark speedup by eliminating the pointless empty |
@@ -1517,7 +1517,7 @@ PHPAPI zend_string *_php_stream_copy_to_mem(php_stream *src, size_t maxlen, int | |||
ptr = ZSTR_VAL(result); | |||
|
|||
// TODO: Propagate error? | |||
while ((ret = php_stream_read(src, ptr, max_len - len)) > 0){ | |||
while (!php_stream_eof(src) && (ret = php_stream_read(src, ptr, max_len - len)) > 0){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, php_stream_eof() also generates some system calls (poll(), recv() with MSG_PEEK flag).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if the eof
flag has not yet been set:
php-src/main/streams/streams.c
Lines 789 to 791 in 05023a2
if (!stream->eof && PHP_STREAM_OPTION_RETURN_ERR == | |
php_stream_set_option(stream, PHP_STREAM_OPTION_CHECK_LIVENESS, | |
0, NULL)) { |
But you are right, PHP's "xp_streams" implementation is rather clumsy. The way it is implemented, my change adds one system call for streams (replacing
recv()
with poll()
+recv()
); it still eliminates one extra read()
system call for regular files!The xp_streams code could be optimized easily. Anybody interested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two system calls, when it's not at the EOF, we got poll() + recv() + recv()
, and CHECK_LIVENESS call usually returns true.
php_stream is not only designed for files, it also works for sockets.
The xp_streams code could be optimized easily. Anybody interested?
How? I can not understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two system calls, when it's not at the EOF, we got
poll() + recv() + recv()
Oh yes, you are right. I thought about the "eof" case, but forgot about the "not-eof".
How? I can not understand.
By omitting the poll()
on operating systems where MSG_DONTWAIT
is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always assume that doing so might cause a BC break.
In addition, this can also be applied when the stream is in non-blocking mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always assume that doing so might cause a BC break.
How?
In addition, this can also be applied when the stream is in non-blocking mode.
You mean on operating systems which don't support MSG_DONTWAIT
? (i.e. Windows)
This particular optimization does work for some particular cases, but it seems like we don't have benchmark scripts to show its impact on other common scripts. |
My benchmark is number of system calls (which in the end, always boils down to faster execution). Of course, you are right, the number of system calls in other ways to use this API is of interest as well. How can I contribute benchmarks to demonstrate the effectiveness of my optimizations? Are there existing examples of how these shall be integrated in php-src?
I don't like complexity either, but adding some slight well-reviewed complexity at a central place like the PHP language implementation to get performance advantages is well worth it. That's why PHP has complexity monsters like a JIT these days - the horrendous complexity pays off. My optimizations add only very little complexity, with big benefits.
Maybe I should attack this from a different angle: the stream implementation for regular files should avoid the |
would it make sense / be possible to separate the optimizatoin into a path which is only taken for local files and leave the path for non-file types as is? |
@twose, then what do you think of this piece of code? php-src/main/streams/streams.c Lines 1480 to 1481 in 05023a2
This is current git master. My patch just copied what was there in the same function, in a different code path. |
Possible, yes. I could easily add another Though - as I just replied to @twose - my |
@twose, more of the same from git master: php-src/ext/dba/libflatfile/flatfile.c Lines 110 to 112 in 05023a2
php-src/ext/dba/libflatfile/flatfile.c Lines 161 to 162 in 05023a2
php-src/ext/dba/libflatfile/flatfile.c Lines 201 to 202 in 05023a2
Line 1006 in 05023a2
Line 1073 in 05023a2
Line 2242 in 05023a2
Lines 1652 to 1653 in 05023a2
Line 2345 in 05023a2
Lines 366 to 368 in 05023a2
The php_stream_eof() + php_stream_read() appears to be a very common thing in the PHP code base. And many of these examples are unnecessary bloat.
|
Yes. And, in my opinion, php_stream doesn't have a very strong focus on performance. And you're right, it does have a lot to optimize for. But, I'm also not sure if it would cause any other problems (e.g. BC break) if we changed the code. So I always tend to focus on new code instead of the code that already exists. It's too difficult for me to find the reasons for the existence of a historical code... I'm just focused on network programming but I'm not familiar enough with php_stream implementation... |
It's used everywhere both by PHP itself and by PHP applications, and its unnecessary overhead is very noticable. On my setup, a bunch of micro-improvements made a huge difference. We agree very much that |
@twose I added a patch which optimizes |
The read buffer is useless here, it only hurts performance.
This eliminates the last redundant read() system call in every file_get_contents() call.
If the read buffer is currently empty, don't buffer large reads (larger than chunk_size), because buffering would only hurt performance in such an edge case.
If we know the exact file size already, we can allocate the right buffer size and issue only one read() system call. This eliminates unnecessary buffer allocations (file size plus 8 kB), eliminates buffer reallocations via zend_string_truncate(), and eliminates the last read() system call. Benchmark: echo Hello World >/tmp/hello.txt <?php for ($i = 1; $i <= 1000000; $i++) file_get_contents('/tmp/hello.txt'); Previously: openat(AT_FDCWD, "/tmp/hello.txt", O_RDONLY) = 3 newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=12, ...}, AT_EMPTY_PATH) = 0 lseek(3, 0, SEEK_CUR) = 0 newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=12, ...}, AT_EMPTY_PATH) = 0 read(3, "Hello World\n", 8204) = 12 read(3, "", 8192) = 0 close(3) = 0 4,036.55 msec task-clock:u # 0.995 CPUs utilized 4,683,289,286 cycles:u # 1.160 GHz 7,954,269,313 instructions:u # 1.70 insn per cycle 1,165,461,174 branches:u # 288.727 M/sec 1,431,759 branch-misses:u # 0.12% of all branches Optimized: openat(AT_FDCWD, "/tmp/hello.txt", O_RDONLY) = 3 newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=12, ...}, AT_EMPTY_PATH) = 0 lseek(3, 0, SEEK_CUR) = 0 newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=12, ...}, AT_EMPTY_PATH) = 0 read(3, "Hello World\n", 12) = 12 close(3) = 0 3,473.51 msec task-clock:u # 0.997 CPUs utilized 4,132,407,666 cycles:u # 1.190 GHz 6,841,273,246 instructions:u # 1.66 insn per cycle 1,024,464,752 branches:u # 294.936 M/sec 1,478,403 branch-misses:u # 0.14% of all branches For this micro-benchmark, the improvement is 14%. This patch needs to adjust the unit test "bug54946.phpt" because the EBADF warning is no longer logged.
2c80b2e
to
bf75865
Compare
I have rebased this PR branch, but of course, the unit test failures are still there (they fail because they rely on implementation details). |
I don't necessarily think that there isn't an interest to merge this but rather that nobody with the required knowledge has had time to review it yet. |
If there's interest, contact me and I'll resubmit this PR. Meanwhile, you can check #8092 which is a related micro-optimization, but doesn't break any (fragile) unit test. |
@@ -566,6 +566,11 @@ PHP_FUNCTION(file_get_contents) | |||
RETURN_FALSE; | |||
} | |||
|
|||
/* disabling the read buffer allows doing the whole transfer | |||
in just one read() system call */ | |||
if (php_stream_is(stream, PHP_STREAM_IS_STDIO)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxKellermann I don't quite understand why this pull request got closed in it's entirety. I know there is an issue around commits 2+3, but commit 1 (file_get_contents without read buffer) is a clear, simple and non-breaking and uncontroversial improvement IMO. Maybe commit #e45f829 can be extracted into a new pull request to get merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK here you are #8547
These patches optimize large reads by skipping the buffer when appropriate. This saves lots of
read()
system calls and reduces unnecessary memory copies.