Description
We recently noticed that a heavy JMESPath workload was triggering a large number of garbage collection runs. We are using jmespath.compile(), and we tracked this down to the jmespath.visitor.TreeInterpreter that is created on every call to ParsedResult.search():
jmespath.py/jmespath/parser.py, line 508 in bbe7300
It appears that TreeInterpreter creates a reference cycle, which leads to the GC being triggered frequently to clean up the cycles. As far as I can tell, the problem comes from the Visitor._method_cache:
jmespath.py/jmespath/visitor.py, lines 91 to 93 in bbe7300
...which store references to methods that are bound to self in a member of self, so the instance and its cached bound methods keep each other alive until the cycle collector runs.
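To illustrate the mechanism in isolation, here is a minimal sketch that does not use jmespath at all. CachingVisitor and check_cycle are hypothetical names invented for this example; the class only mimics the pattern of caching bound methods on the instance and is not the library's actual code.

```python
import gc
import weakref


class CachingVisitor:
    """Hypothetical stand-in for a visitor that caches its own bound methods."""

    def __init__(self):
        self._method_cache = {}

    def visit_field(self, value):
        return value

    def visit(self, node_type, value):
        method = self._method_cache.get(node_type)
        if method is None:
            # getattr() returns a bound method that holds a reference to self.
            # Storing it on self closes the loop: self -> dict -> method -> self.
            method = getattr(self, "visit_" + node_type)
            self._method_cache[node_type] = method
        return method(value)


def check_cycle():
    """Return (alive_before_gc, alive_after_gc) for a discarded visitor."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running mid-check
    try:
        visitor = CachingVisitor()
        visitor.visit("field", 1)
        probe = weakref.ref(visitor)
        del visitor
        # Reference counting alone cannot free the object because of the cycle.
        alive_before = probe() is not None
        gc.collect()
        # Only the cycle collector reclaims it.
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

If the sketch matches what happens in TreeInterpreter, every discarded instance has to wait for a collector pass, which would explain the GC pressure described above.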
Possible solution
We worked around the problem by monkey patching ParsedResult so that it (1) caches a default_interpreter for use when options=None, and (2) uses it in search(). If I understand correctly, we could go further and use a global TreeInterpreter for all ParsedResult instances. The TreeInterpreter seems to be stateless apart from self._method_cache, and that implementation seems to be thread-safe (with only the risk of multiple lookups for the same method in a multithreaded case).
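The shape of the workaround can be sketched with stand-in classes. MiniInterpreter, MiniParsedResult, and count_cyclic_garbage are hypothetical names made up for this illustration, and the dict-based node format is an assumption; the real patch would target ParsedResult and TreeInterpreter instead.

```python
import gc


class MiniInterpreter:
    """Hypothetical stand-in for an interpreter with a per-instance method cache."""

    def __init__(self, options=None):
        self.options = options
        self._method_cache = {}

    def visit_field(self, node, value):
        return value.get(node["name"])

    def visit(self, node, value):
        method = self._method_cache.get(node["type"])
        if method is None:
            method = getattr(self, "visit_" + node["type"])
            # Caching the bound method on self creates a reference cycle.
            self._method_cache[node["type"]] = method
        return method(node, value)


class MiniParsedResult:
    """Hypothetical stand-in for a compiled expression."""

    # One shared interpreter, reused whenever the caller passes no options.
    _default_interpreter = MiniInterpreter()

    def __init__(self, parsed):
        self.parsed = parsed

    def search(self, value, options=None):
        if options is None:
            interpreter = self._default_interpreter
        else:
            # Custom options still get a dedicated interpreter.
            interpreter = MiniInterpreter(options)
        return interpreter.visit(self.parsed, value)


def count_cyclic_garbage(calls, options=None):
    """Run search() repeatedly and return how many unreachable objects gc finds."""
    pattern = MiniParsedResult({"type": "field", "name": "foo"})
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()  # start from a clean slate
        for _ in range(calls):
            pattern.search({"foo": "bar"}, options)
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()
```

Under these assumptions, calling count_cyclic_garbage without options should leave nothing for the collector, while passing any options object brings back one cycle per call. A single class-level interpreter corresponds to the "global" variant; the per-instance variant would store the default interpreter in __init__ instead.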
I'd be happy to contribute a PR for either version if this would be welcome.
How to reproduce
The following reproducer shows the problem:
import jmespath
import gc
gc.set_debug(gc.DEBUG_COLLECTABLE)
pattern = jmespath.compile("foo")
value = {"foo": "bar"}
for _ in range(1000000):
    pattern.search(value)
...where the output contains one million repetitions of something like:
gc: collectable <TreeInterpreter 0x10f634fa0>
gc: collectable <dict 0x10f63e780>
gc: collectable <Options 0x10f634520>
gc: collectable <Functions 0x10f6345b0>
gc: collectable <method 0x10f63ee80>
gc: collectable <dict 0x10f63eb00>