Skip to content

PHPLIB-1235 Implements Single-Doc Benchmarks #1161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 18, 2023

Conversation

GromNaN
Copy link
Member

@GromNaN GromNaN commented Sep 14, 2023

Fix PHPLIB-1235

https://github.com/mongodb/specifications/blob/master/source/benchmarking/benchmarking.rst#single-doc-benchmarks

Result:

\MongoDB\Benchmark\DriverBench\SingleDocBench

    benchRunCommand.........................I2 - Mo693.331ms (±1.44%)
    benchFindOneById # default..............I2 - Mo1.097s (±6.82%)
    benchFindOneById # bson typemap.........I2 - Mo968.745ms (±0.45%)
    benchInsertOne # Small doc..............I2 - Mo2.812m (±1.93%)
    benchInsertOne # Small BSON doc.........I2 - Mo2.928m (±0.58%)
    benchInsertOne # Large doc..............I2 - Mo632.107ms (±4.04%)
    benchInsertOne # Large BSON doc.........I2 - Mo709.506ms (±4.88%)

@GromNaN GromNaN requested review from jmikola and alcaeus September 14, 2023 11:35
Copy link
Member Author

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spec, Measurement is supposed to output MB/s, I'm not sure this is something we can do with PHPBench, but I can explore reporting features.

Copy link
Member

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short note on the iterations: multiple iterations can be run to ensure that the benchmarks are consistent. For example, we could let phpbench fail the benchmark suite if any benchmark has a deviation of more than x%. We currently don't do this, but will eventually add such assertions in CI. Therefore, I would recommend not hardcoding any iteration number in the benchmarks, instead relying on what's configured in phpbench.json.dist and phpbench.json.

@GromNaN
Copy link
Member Author

GromNaN commented Sep 14, 2023

Running this new benchmark takes 35 minutes on my laptop.

time ./vendor/bin/phpbench run --filter SingleDocBench
PHPBench (1.2.14) running benchmarks... #standwithukraine
with configuration file: /Users/jerome/Develop/mongo-php-library/phpbench.json.dist
with PHP version 8.2.9, xdebug ✔, opcache ❌

\MongoDB\Benchmark\DriverBench\SingleDocBench

    benchRunCommand.........................I2 - Mo693.331ms (±1.44%)
    benchFindOneById # default..............I2 - Mo1.097s (±6.82%)
    benchFindOneById # bson typemap.........I2 - Mo968.745ms (±0.45%)
    benchInsertOne # Small doc..............I2 - Mo2.812m (±1.93%)
    benchInsertOne # Small BSON doc.........I2 - Mo2.928m (±0.58%)
    benchInsertOne # Large doc..............I2 - Mo632.107ms (±4.04%)
    benchInsertOne # Large BSON doc.........I2 - Mo709.506ms (±4.88%)

Subjects: 3, Assertions: 0, Failures: 0, Errors: 0
./vendor/bin/phpbench run --filter SingleDocBench  19.94s user 7.76s system 1% cpu 35:20.41 total

@GromNaN GromNaN requested a review from alcaeus September 14, 2023 19:29
yield 'Small doc WC 0' => [
'document' => self::readJsonFile(self::SMALL_FILE_PATH),
'repeat' => 10_000,
'options' => ['writeConcern' => new WriteConcern(0)],
Copy link
Member Author

@GromNaN GromNaN Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that if my mongo is a replica and the write operation is done with the default writeConcern: majority, the insertOne operation is is slow. If I set writeConcern: {w: 0}, this is very faster. And best of all if I use a single server it is almost instantaneous.

With replica

benchInsertOne # Small doc..............I2 - Mo3.014m (±1.60%)
benchInsertOne # Small BSON doc.........I2 - Mo3.194m (±1.51%)
benchInsertOne # Small doc WC 0.........I2 - Mo20.912s (±9.49%)
benchInsertOne # Large doc..............I2 - Mo648.394ms (±2.47%)
benchInsertOne # Large BSON doc.........I2 - Mo624.408ms (±1.23%)
benchInsertOne # Large doc WC 0.........I2 - Mo152.876ms (±1.91%)

With single server

benchInsertOne # Small doc..............I2 - Mo919.939ms (±2.23%)
benchInsertOne # Small BSON doc.........I2 - Mo931.763ms (±2.56%)
benchInsertOne # Small doc WC 0.........I2 - Mo259.724ms (±1.03%)
benchInsertOne # Large doc..............I2 - Mo310.960ms (±5.61%)
benchInsertOne # Large BSON doc.........I2 - Mo304.353ms (±2.49%)
benchInsertOne # Large doc WC 0.........I2 - Mo147.833ms (±10.32%)

In Github Action (single server)

subject # set revs its mem_peak mode rstdev
benchInsertOne # Small doc 1 3 1.566mb 1.949s ± 0.60%
benchInsertOne # Small BSON doc 1 3 1.566mb 1.962s ± 1.00%
benchInsertOne # Small doc WC 0 1 3 1.567mb 255.210ms ± 1.19%
benchInsertOne # Large doc 1 3 19.796mb 395.377ms ± 0.47%
benchInsertOne # Large BSON doc 1 3 14.410mb 393.043ms ± 0.13%
benchInsertOne # Large doc WC 0 1 3 14.410mb 247.049ms ± 1.02%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteConcern 0, aka "unacknowledged writes" is expected to be fast, as the driver sends the command to the server but never waits to hear back. This means that there are no guarantees that the data has actually been written. The default write concern on a replica set is "majority", which equals 2 in the default 3-member configuration.

Since the specification does not specify what configurations to run this on, I would suggest running benchmarks on a standalone with the default write concern only.

@alcaeus
Copy link
Member

alcaeus commented Sep 15, 2023

Running this new benchmark takes 35 minutes on my laptop.

The main cause for this seems to be the "Small doc" benchmark:

    benchInsertOne # Small doc..............I2 - Mo2.812m (±1.93%)
    benchInsertOne # Small BSON doc.........I2 - Mo2.928m (±0.58%)

I would suggest dropping the second data set, as decoding to raw BSON doesn't yield a significantly different result from the original, as network overhead is likely a big factor in the timing here. At the same time, limiting this test to a single iteration would make sense given the runtime. That would reduce the time to ~3 minutes, which is much more tolerable.

However, given these results I'd still like to check with other drivers to see how long this process takes in their drivers and whether we should expect this kind of time.

@GromNaN GromNaN merged commit a187452 into mongodb:master Sep 18, 2023
@GromNaN GromNaN deleted the PHPLIB-1235 branch September 18, 2023 09:00
Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: Apologies. I realized I had some pending comments that were never published. Nothing major, so feel free to ignore or address in a separate commit as you like.

$collection = self::getCollection();

for ($i = $params['repeat']; $i > 0; $i--) {
$collection->insertOne($params['document'], $params['options'] ?? []);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that none of the data providers specify options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it before the merge. Good catch.

$database = self::getDatabase();

for ($i = 0; $i < 10_000; $i++) {
$manager->executeCommand($database, new Command(['hello' => true]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$manager->executeCommand($database, new Command(['hello' => true]));
$manager->executeCommand($database, new Command(['hello' => 1]));

The key doesn't matter for these commands, but we typically use 1 as a value for things like hello and ping.

I also agree with @alcaeus earlier comment that you can probably move Command construction out of the benchmark if you want to remove more unrelated overhead. Either a data provider or a class property might work.

Copy link
Member Author

@GromNaN GromNaN Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec orders to use true explicitly.

For the command object creation. I think it's part of the driver task to send a command. I ça add an additional bench with excluded command instantiation from the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants