Skip to content

PYTHON-3718 Faster INT2STRING #1221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 5, 2023
Merged

PYTHON-3718 Faster INT2STRING #1221

merged 13 commits into from
Jun 5, 2023

Conversation

thalassemia
Copy link
Contributor

In _cbsonmodule.c, the INT2STRING macro is used heavily during serialization. In my workload (heavily nested documents with large array fields), this macro becomes a performance bottleneck. By swapping the macro for a more purpose-built function like this one adapted from MySQL, my insert_one calls were another 2x faster on top of the 2x speed improvement from #1219 (down to about 25% of initial runtime).

@blink1073
Copy link
Member

Hi @thalassemia, thanks for working on this as well! I am not a lawyer, but my interpretation is that we would need to attribute the code from mysql-server, which is GPL v2.0, and considered incompatible with Apache 2.0 software.

@thalassemia
Copy link
Contributor Author

Thanks for bringing that to my attention @blink1073! I've replaced the code that I took from mysql-server with my own, and (temporarily) included a script that tests edge cases (LLONG_MAX and LLONG_MIN) and benchmarks performance. For commonly seen integers, my code is just as performant as that of mysql-server, and both are more than 2x faster than the current macro.

@blink1073
Copy link
Member

Thanks again! I'd feel safer if we kept a Python test to exercise the new function.

@@ -2935,6 +3030,7 @@ static PyMethodDef _CBSONMethods[] = {
{"_element_to_dict", _cbson_element_to_dict, METH_VARARGS,
"Decode a single key, value pair."},
{"_array_of_documents_to_buffer", _cbson_array_of_documents_to_buffer, METH_VARARGS, "Convert raw array of documents to a stream of BSON documents"},
{"_test_long_long_to_str", _test_long_long_to_str, METH_VARARGS, "Test conversion of extreme and common Py_ssize_t values to str."},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full credit to @1fish2! Are the Evergreen tests failing due to an issue in the PR? Can't see the details.

@blink1073
Copy link
Member

Here's the log, note the use of setup.py build_ext -i instead of pip install -e . like we do in GitHub Actions:

 [2023/05/31 16:49:03.773] + /opt/python/3.10/bin/python3 setup.py build_ext -i
 [2023/05/31 16:49:04.056] INFO: coverage is installed, running tests with coverage...
 [2023/05/31 16:49:04.056] running clean
 [2023/05/31 16:49:04.056] running build_ext
 [2023/05/31 16:49:04.061] building 'bson._cbson' extension
 [2023/05/31 16:49:04.061] creating build
 [2023/05/31 16:49:04.061] creating build/temp.linux-x86_64-cpython-310
 [2023/05/31 16:49:04.062] creating build/temp.linux-x86_64-cpython-310/bson
 [2023/05/31 16:49:04.062] gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Ibson -I/opt/python/3.10/include/python3.10 -c bson/_cbsonmodule.c -o build/temp.linux-x86_64-cpython-310/bson/_cbsonmodule.o
 [2023/05/31 16:49:04.920] gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Ibson -I/opt/python/3.10/include/python3.10 -c bson/buffer.c -o build/temp.linux-x86_64-cpython-310/bson/buffer.o
 [2023/05/31 16:49:05.020] gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Ibson -I/opt/python/3.10/include/python3.10 -c bson/time64.c -o build/temp.linux-x86_64-cpython-310/bson/time64.o
 [2023/05/31 16:49:05.228] creating build/lib.linux-x86_64-cpython-310
 [2023/05/31 16:49:05.228] creating build/lib.linux-x86_64-cpython-310/bson
 [2023/05/31 16:49:05.229] gcc -pthread -shared -Wl,--enable-new-dtags,-rpath,/opt/python/3.10/lib build/temp.linux-x86_64-cpython-310/bson/_cbsonmodule.o build/temp.linux-x86_64-cpython-310/bson/buffer.o build/temp.linux-x86_64-cpython-310/bson/time64.o -L/opt/python/3.10/lib -o build/lib.linux-x86_64-cpython-310/bson/_cbson.cpython-310-x86_64-linux-gnu.so
 [2023/05/31 16:49:05.248] building 'pymongo._cmessage' extension
 [2023/05/31 16:49:05.248] creating build/temp.linux-x86_64-cpython-310/pymongo
 [2023/05/31 16:49:05.248] gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Ibson -I/opt/python/3.10/include/python3.10 -c bson/buffer.c -o build/temp.linux-x86_64-cpython-310/bson/buffer.o
 [2023/05/31 16:49:05.347] gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Ibson -I/opt/python/3.10/include/python3.10 -c pymongo/_cmessagemodule.c -o build/temp.linux-x86_64-cpython-310/pymongo/_cmessagemodule.o
 [2023/05/31 16:49:05.622] creating build/lib.linux-x86_64-cpython-310/pymongo
 [2023/05/31 16:49:05.622] gcc -pthread -shared -Wl,--enable-new-dtags,-rpath,/opt/python/3.10/lib build/temp.linux-x86_64-cpython-310/bson/buffer.o build/temp.linux-x86_64-cpython-310/pymongo/_cmessagemodule.o -L/opt/python/3.10/lib -o build/lib.linux-x86_64-cpython-310/pymongo/_cmessage.cpython-310-x86_64-linux-gnu.so
 [2023/05/31 16:49:05.644] copying build/lib.linux-x86_64-cpython-310/bson/_cbson.cpython-310-x86_64-linux-gnu.so -> bson
 [2023/05/31 16:49:05.645] copying build/lib.linux-x86_64-cpython-310/pymongo/_cmessage.cpython-310-x86_64-linux-gnu.so -> pymongo
 [2023/05/31 16:49:05.693] + /opt/python/3.10/bin/python3 -c 'from bson import _cbson; from pymongo import _cmessage'
 [2023/05/31 16:49:06.038] Traceback (most recent call last):
 [2023/05/31 16:49:06.038]   File "<string>", line 1, in <module>
 [2023/05/31 16:49:06.038] ImportError: /data/mci/a4e2586048652aa5165a75ee3e206dd6/src/pymongo/_cmessage.cpython-310-x86_64-linux-gnu.so: undefined symbol: long_long_to_str
 [2023/05/31 16:49:06.068] Command 'shell.exec' in function 'run tests' failed: shell script encountered problem: exit code 1.
 [2023/05/31 16:49:06.068] Task completed - FAILURE.
 [2023/05/31 16:49:06.187] Running post-task commands.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

@blink1073 blink1073 merged commit 1ba4c0b into mongodb:master Jun 5, 2023
@ShaneHarvey
Copy link
Member

Our existing perf benchmarks don't show any improvement for this change because they don't cover encoding large arrays fields. Testing manually shows great results.

Before this change:

$ python -m timeit -s 'from bson import encode;doc={"a":list(range(1000000))}' 'encode(doc)'
5 loops, best of 5: 67.2 msec per loop

After this change:

$ python -m timeit -s 'from bson import encode;doc={"a":list(range(1000000))}' 'encode(doc)'
10 loops, best of 5: 37.7 msec per loop

That's roughly a 2x improvement. Going back further and comparing this to the commit directly before PYTHON-3717:

$ python -m timeit -s 'from bson import encode;doc={"a":list(range(1000000))}' 'encode(doc)'
1 loop, best of 5: 298 msec per loop

So this change combined with PYTHON-3717 yields up to an 8x improvement in bson encoding. Incredible! Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants