-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-3718 Faster INT2STRING #1221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @thalassemia, thanks for working on this as well! I am not a lawyer, but my interpretation is that we would need to attribute the code from |
Thanks for bringing that to my attention @blink1073! I've replaced the code that I took from |
Thanks again! I'd feel safer if we kept a Python test to exercise the new function. |
@@ -2935,6 +3030,7 @@ static PyMethodDef _CBSONMethods[] = { | |||
{"_element_to_dict", _cbson_element_to_dict, METH_VARARGS, | |||
"Decode a single key, value pair."}, | |||
{"_array_of_documents_to_buffer", _cbson_array_of_documents_to_buffer, METH_VARARGS, "Convert raw array of documents to a stream of BSON documents"}, | |||
{"_test_long_long_to_str", _test_long_long_to_str, METH_VARARGS, "Test conversion of extreme and common Py_ssize_t values to str."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full credit to @1fish2! Are the Evergreen tests failing due to an issue in the PR? Can't see the details.
Here's the log, note the use of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
Our existing perf benchmarks don't show any improvement for this change because they don't cover encoding large arrays fields. Testing manually shows great results. Before this change:
After this change:
That's roughly a 2x improvement. Going back further and comparing this to the commit directly before PYTHON-3717:
So this change combined with PYTHON-3717 yields up to an 8x improvement in bson encoding. Incredible! Thanks again! |
In _cbsonmodule.c, the
INT2STRING
macro is used heavily during serialization. In my workload (heavily nested documents with large array fields), this macro becomes a performance bottleneck. By swapping the macro for a more purpose-built function like this one adapted from MySQL, myinsert_one
calls were another 2x faster on top of the 2x speed improvement from #1219 (down to about 25% of initial runtime).