
PYTHON-3717 Speed up _type_marker check in BSON #1219



Merged: 2 commits merged on May 26, 2023


thalassemia (Contributor)

In _cbsonmodule.c, the _type_marker function uses PyObject_HasAttrString(object, "_type_marker") and PyObject_GetAttrString(object, "_type_marker"). In my workloads (highly nested documents with many large array fields), these functions become severe performance bottlenecks, because every call internally creates a new Python string object via PyUnicode_FromString("_type_marker").
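To make the cost concrete, here is a plain-C model of the two call shapes. It does not use the CPython API; Key, Obj, key_new, has_attr_string and the other names are hypothetical stand-ins. key_new plays the role of PyUnicode_FromString, so the original shape builds two throwaway keys per call on a tagged object, while the cached shape builds one key up front and reuses it. In the real extension, the cached shape would roughly correspond to passing a pre-built string object to PyObject_HasAttr and PyObject_GetAttr.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Python str object: a heap-allocated name. */
typedef struct {
    char *text;
} Key;

/* Number of keys built so far; each one models a PyUnicode_FromString call. */
static long keys_built = 0;

static Key *key_new(const char *text) {
    Key *k = malloc(sizeof *k);
    if (k == NULL)
        return NULL;
    k->text = malloc(strlen(text) + 1);
    if (k->text == NULL) {
        free(k);
        return NULL;
    }
    strcpy(k->text, text);
    keys_built++;
    return k;
}

static void key_free(Key *k) {
    if (k != NULL) {
        free(k->text);
        free(k);
    }
}

/* Hypothetical object with at most one named integer attribute. */
typedef struct {
    const char *attr_name; /* NULL when the object has no attribute */
    long attr_value;
} Obj;

/* Lookup with a pre-built key (models PyObject_HasAttr). */
static int has_attr(const Obj *o, const Key *k) {
    return o->attr_name != NULL && strcmp(o->attr_name, k->text) == 0;
}

/* Lookups by C string build a throwaway key on every call
 * (models PyObject_HasAttrString / PyObject_GetAttrString). */
static int has_attr_string(const Obj *o, const char *name) {
    Key *k = key_new(name);
    int found = k != NULL && has_attr(o, k);
    key_free(k);
    return found;
}

static long get_attr_string(const Obj *o, const char *name) {
    Key *k = key_new(name);
    long value = (k != NULL && has_attr(o, k)) ? o->attr_value : 0;
    key_free(k);
    return value;
}

/* Original shape: two key constructions per call on a tagged object. */
static long type_marker_uncached(const Obj *o) {
    if (has_attr_string(o, "_type_marker"))
        return get_attr_string(o, "_type_marker");
    return 0;
}

/* Cached shape: the key is built once by the caller and reused. */
static long type_marker_cached(const Obj *o, const Key *marker) {
    assert(marker != NULL);
    return has_attr(o, marker) ? o->attr_value : 0;
}
```

Calling type_marker_uncached 1,000 times on an object that has the attribute builds 2,000 keys; the cached version builds one key in total, regardless of how many calls are made.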

Creating the "_type_marker" string once as TYPEMARKERSTR and reusing it more than doubled performance in my workloads. One caveat is that this leaks the TYPEMARKERSTR object if the cbson module is unloaded.

Also, correct me if I'm wrong, but I believe these lines are redundant because the function returns type at the end regardless.

blink1073 (Member) commented May 24, 2023

Hi @thalassemia, thanks for working on this! We should be able to add this value to the module_state, and then clear it on teardown. See here for an example of interacting with the module state.
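The module-state approach suggested here can also be sketched in plain C. The names module_state, module_exec, module_clear and is_type_marker_name are hypothetical, and the cached value is an ordinary heap string rather than a Python object. In a real CPython extension, the corresponding steps would be creating the string object (for example with PyUnicode_FromString) while the module is initialized and releasing it with Py_CLEAR from the module's m_clear/m_free slots, which removes the leak on unload described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-module state, loosely modeled on a CPython extension's
 * module_state struct. Here the cached key is just a heap string. */
struct module_state {
    char *type_marker_str;
};

/* Build the cached key once when the module is set up.
 * Returns 0 on success, -1 on allocation failure. */
static int module_exec(struct module_state *state) {
    const char *name = "_type_marker";
    state->type_marker_str = malloc(strlen(name) + 1);
    if (state->type_marker_str == NULL)
        return -1;
    strcpy(state->type_marker_str, name);
    return 0;
}

/* Release the cached key on teardown; safe to call more than once
 * because the pointer is reset to NULL (free(NULL) is a no-op). */
static void module_clear(struct module_state *state) {
    free(state->type_marker_str);
    state->type_marker_str = NULL;
}

/* Per-call code reads the key from the state instead of rebuilding it. */
static int is_type_marker_name(const struct module_state *state, const char *attr) {
    assert(state->type_marker_str != NULL);
    return strcmp(attr, state->type_marker_str) == 0;
}
```

Resetting the pointer in module_clear makes a repeated teardown harmless, which is the same reason Py_CLEAR (decrement, then set to NULL) is generally preferred over a bare Py_DECREF in clear functions.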

blink1073 (Member) left a comment:

Nice work, thank you!

blink1073 merged commit 4c0196d into mongodb:master on May 26, 2023