PHPLIB-1237 Implement Parallel Multi File Export Bench #1169
Conversation
We can see that forking is faster than using a single process or starting worker processes. Thanks to the benchmark, we can see that there's a balance to be found to get the best number of threads.
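As a rough illustration of the fork-based approach being compared, here is a minimal sketch of splitting work across N forked children with pcntl. All names and counts are illustrative, not taken from the PR:

```php
<?php
// Hypothetical sketch: fan work out across a tunable number of forked
// children, then wait for all of them before stopping the timer.
$files = range(0, 99);
$workers = 4; // the benchmark suggests tuning this number
$chunks = array_chunk($files, (int) ceil(count($files) / $workers));

$pids = [];
foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child: handle only its own chunk, then exit so it never
        // falls through into the parent's loop.
        foreach ($chunk as $file) {
            // ... export $file ...
        }
        exit(0);
    }
    $pids[] = $pid;
}

// Parent: reap every child before measuring elapsed time.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
```

Forking avoids worker start-up cost, but too many children contend for CPU and the connection to the server, hence the balance mentioned above.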
// We don't use MongoDB\BSON\Document::toCanonicalExtendedJSON() because
// it is slower than json_encode() on an array.
Note that json_encode does not produce the same result as toCanonicalExtendedJSON. That said, it does produce the result we wanted.
I also had to do a double-take on this, so I wrote a small benchmark:
benchToJSONViaToCanonicalExtendedJSON...I2 - Mo48.200μs (±0.00%)
benchToJSONViaToRelaxedExtendedJSON.....I2 - Mo48.591μs (±6.09%)
benchToJSONViaJsonEncode................I2 - Mo12.410μs (±3.55%)
The last benchmark actually calls json_encode($document->toPHP(['root' => 'array'])), so I'm a bit surprised that it is faster. This might provide an opportunity to revisit the JSON serialisation logic in libbson.
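The two paths being compared can be sketched as follows. This is a minimal illustration using the ext-mongodb Document API; the sample document is made up and the relative timings are those reported by the phpbench run above:

```php
<?php
use MongoDB\BSON\Document;

// Illustrative document; any BSON document works here.
$document = Document::fromPHP(['_id' => 1, 'name' => 'test', 'values' => [1, 2, 3]]);

// Canonical Extended JSON wraps scalars in type markers,
// e.g. {"$numberInt": "1"}; this is the slower path.
$extJson = $document->toCanonicalExtendedJSON();

// Converting to a PHP array first and using json_encode() produces
// plain JSON ({"_id": 1, ...}) and was ~4x faster in the benchmark.
$plainJson = json_encode($document->toPHP(['root' => 'array']));
```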
Tracked in PHPC-2299
Force-pushed from 346c198 to 033f317
// Reset to ensure that the existing libmongoc client (via the Manager) is not re-used by the child
// process. When the child process constructs a new Manager, the differing PID will result in creation
// of a new libmongoc client.
Utils::reset();
Is there existing state prior to benchFork() where a Manager would have been registered? Perhaps from beforeIteration() dropping and creating a collection?
Without the reset, I get this error:
Fatal error: Uncaught MongoDB\Driver\Exception\BulkWriteException: assertion src/mongo/util/hex.cpp:113 in /Users/jerome/Develop/mongo-php-library/benchmark/src/DriverBench/ParallelMultiFileImportBench.php:211
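To make the failure mode concrete, here is a hedged sketch of why the reset matters across a fork. Utils::reset() mirrors the helper quoted above; the URI variable is illustrative:

```php
<?php
// A libmongoc client must not be shared across a fork(): the child would
// reuse the parent's sockets and corrupt the wire protocol state.
// Utils::reset() drops the cached Manager so no client survives the fork.
Utils::reset();

$pid = pcntl_fork();
if ($pid === 0) {
    // Child: constructing a new Manager here creates a fresh libmongoc
    // client, because the child's PID differs from the parent's.
    $manager = new MongoDB\Driver\Manager('mongodb://127.0.0.1:27017');
    // ... perform bulk writes safely ...
    exit(0);
}
pcntl_waitpid($pid, $status);
```

Without the reset, the child inherits the parent's client and the server sees interleaved traffic from two processes on one connection, which is consistent with the cryptic assertion above.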
Possibly this: https://github.com/mongodb/mongo/blob/r7.0.1/src/mongo/util/hex.cpp#L113
Seems odd that the server would throw back a cryptic assertion failure as its error message, though. If it's easily reproducible this might warrant a bug report in the SERVER project.
Fix PHPLIB-1237
Follows #1166
Parallel Benchmarks specs: LDJSON multi-file export
Upgrade to AMPHP with Fibers.
Implementations: amphp/parallel with worker pool.

The scenario has 3 critical steps that I tried to optimize:
Find documents
I grouped files by chunks to perform a single query for several files and reduce server round trips.
I had to use limit/skip to paginate the results. I tried an aggregation pipeline to get the first ID of each chunk, but it takes ~600ms, which is too significant with no benefit on the query.
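The chunked find described above can be sketched like this. The collection handle and the chunk sizes are assumptions for illustration, not values from the PR:

```php
<?php
// One query fetches the documents for several output files at once,
// paginated with skip/limit instead of a per-chunk aggregation.
$docsPerFile = 5000;   // illustrative
$filesPerQuery = 10;   // illustrative
$totalFiles = 100;     // illustrative

for ($chunk = 0; $chunk < $totalFiles / $filesPerQuery; $chunk++) {
    $cursor = $collection->find([], [
        'skip' => $chunk * $filesPerQuery * $docsPerFile,
        'limit' => $filesPerQuery * $docsPerFile,
        // array typeMap: faster to json_encode() than BSON objects.
        'typeMap' => ['root' => 'array', 'document' => 'array', 'array' => 'array'],
    ]);
    // ... split the cursor into $filesPerQuery files ...
}
```

Each query covers $filesPerQuery files, so the number of server round trips drops by that factor compared to one query per file.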
Encoding documents to JSON
Initially I used the MongoDB\BSON\Document methods toCanonicalExtendedJSON and toRelaxedExtendedJSON, but these methods are slow compared to json_encode, and they don't return the expected document format. Even if the bson typeMap is faster for the query, I had to use the array typeMap to convert to JSON: json_encode($document->toPHP()) is slower than encoding an array.

Writing the file
I tested several approaches for writing the file. One fwrite per document was 1.1x slower than a file_put_contents of the whole file. I didn't see any benefit from doing async writes using stream_set_blocking. I tried AMP's WritableStream, but it was much slower. In any case, writing is not what takes the longest overall.
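The two file-writing strategies being compared can be sketched as follows; the file name is illustrative:

```php
<?php
// Faster variant: build the whole LDJSON payload in memory, write once.
$lines = [];
foreach ($documents as $document) {
    $lines[] = json_encode($document);
}
file_put_contents('chunk-000.ldjson', implode("\n", $lines) . "\n");

// Slower variant (~1.1x): one fwrite() call per document.
$fh = fopen('chunk-000.ldjson', 'w');
foreach ($documents as $document) {
    fwrite($fh, json_encode($document) . "\n");
}
fclose($fh);
```

Buffering trades memory for fewer syscalls, which is why the single file_put_contents wins for files of this size.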