
Replace SimpleThrottle self.history default value with deque #8074


Conversation

JasperSui

Description

To reduce the time complexity of adding an element at the front of the list (self.history) from O(N) to O(1), we should use collections.deque with .appendleft() instead of list.insert(0, x) — roughly a 10x speedup in the micro-benchmark below.

Demo:

In [1]: lst = [0]*1000
In [2]: timeit -n1000 lst.insert(0, 1)
1000 loops, best of 3: 794 ns per loop

In [3]: from collections import deque
In [4]: deq = deque([0]*1000)
In [5]: timeit -n1000 deq.appendleft(1)
1000 loops, best of 3: 73 ns per loop
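The IPython session above can be reproduced as a plain script (absolute numbers will vary by machine; the relative gap is what matters):

```python
import timeit
from collections import deque

# Prepending to a 1000-element list is O(N): every element shifts right.
list_time = timeit.timeit("lst.insert(0, 1)", setup="lst = [0] * 1000", number=1000)

# Prepending to a 1000-element deque is O(1): no shifting.
deque_time = timeit.timeit(
    "deq.appendleft(1)",
    setup="from collections import deque; deq = deque([0] * 1000)",
    number=1000,
)

print(f"list.insert(0, x): {list_time:.6f}s  deque.appendleft(x): {deque_time:.6f}s")

# Both operations are equivalent in effect: the new element lands at the front.
lst = [0] * 3
lst.insert(0, 1)
deq = deque([0] * 3)
deq.appendleft(1)
assert lst[0] == deq[0] == 1
```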

@@ -120,7 +121,7 @@ def allow_request(self, request, view):
         if self.key is None:
             return True

-        self.history = self.cache.get(self.key, [])
+        self.history = self.cache.get(self.key, deque())
akx (Contributor)

Will constructing a deque() only to be thrown away when hitting the cache be a problem?

@JasperSui JasperSui (Author) Jul 22, 2021

Hi @akx,

> Will constructing a deque() only to be thrown away when hitting the cache be a problem?

No, initialization works the same way as in the empty-list version. As you can see, there is no place other than line 124 where self.history is initialized. The only thing we modify is the data structure used to maintain the queue of timestamps (self.history): collections.deque is the best fit for this access pattern, where new timestamps are added at the front and expired ones are popped from the back.

Since this changes the data structure used by SimpleRateThrottle, probably the most commonly used throttle class, we must call it out in the changelog to avoid upgrade breakage.

akx (Contributor)

My point was mostly whether it is, e.g., slower to initialize a deque than an empty list. Should this explicitly be

Suggested change
-        self.history = self.cache.get(self.key, deque())
+        self.history = self.cache.get(self.key)
+        if self.history is None:
+            self.history = deque()

?
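The suggested pattern can be sketched in isolation — the `FakeCache` class here is an illustrative stand-in for Django's cache API (where `get()` with no default returns `None` on a miss), not DRF's actual implementation:

```python
from collections import deque


class FakeCache:
    """Illustrative stand-in for a Django cache backend: get() returns None on a miss."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


cache = FakeCache()

# Explicit form: the deque is only constructed on a cache miss,
# never built just to be thrown away as an unused default argument.
history = cache.get("throttle_user_1")
if history is None:
    history = deque()
history.appendleft(1626900000.0)  # hypothetical request timestamp
cache.set("throttle_user_1", history)
```

On a cache hit, `cache.get()` returns the stored deque directly and no throwaway `deque()` is ever allocated, which is the point of the suggestion.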

JasperSui (Author)

I just checked cpython/Objects/listobject.c, cpython/Modules/_collectionsmodule.c, and django-redis/cache.py, since django/core/cache/backends/base.py expects backends to implement .get() themselves.

The initialization performance difference between list and deque is close, I think, but it seems we would construct a deque() on every allow_request() call whether or not the key exists in the cache, so I think an explicit version like yours is better than the original.

JasperSui (Author)

A simple benchmark:

In [1]: timeit.timeit("[]", number=10000)
Out[1]: 0.00045701999999891996

In [2]: timeit.timeit("deque()", setup="from collections import deque", number=10000)
Out[2]: 0.000674222999975882

@@ -120,7 +121,10 @@ def allow_request(self, request, view):
         if self.key is None:
             return True

-        self.history = self.cache.get(self.key, [])
+        self.history = self.cache.get(self.key)
+        if self.history is None:
+            self.history = deque()
Contributor

Does there need to be some sort of check that the value returned from the cache is really a deque instance? If the cache was populated by a previous version of DRF, self.history will still be a list instance, right?
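One way to address this concern — a sketch, not part of the PR; the helper name `load_history` is hypothetical — is to normalize whatever comes back from the cache:

```python
from collections import deque


def load_history(cached_value):
    """Normalize a cached throttle history to a deque.

    Handles three cases: a cache miss (None), a list persisted by an
    older version of the code, and a deque written after this change.
    """
    if cached_value is None:
        return deque()
    if not isinstance(cached_value, deque):
        # e.g. a list written by a previous DRF version still in the cache
        return deque(cached_value)
    return cached_value


# All three cases yield a deque:
assert load_history(None) == deque()
assert load_history([3.0, 2.0, 1.0]) == deque([3.0, 2.0, 1.0])
assert load_history(deque([5.0])) == deque([5.0])
```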

@@ -136,7 +140,7 @@ def throttle_success(self):
         """
         Inserts the current request's timestamp along with the key
         into the cache.
         """
-        self.history.insert(0, self.now)
+        self.history.appendleft(self.now)
         self.cache.set(self.key, self.history, self.duration)
Contributor

Some cache backends for Django (e.g. Redis ones) support JSON serialization of cache data (instead of pickle). These might convert a deque to a list (or raise a TypeError on insertion).
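The TypeError half of this concern can be demonstrated with the standard library's json module, which does not know how to serialize a deque:

```python
import json
from collections import deque

# Hypothetical throttle history: newest timestamp first.
history = deque([1626900000.0, 1626899990.0])

# json.dumps raises TypeError for a deque...
try:
    json.dumps(history)
    serialized = None  # unreachable: deque is not JSON serializable
except TypeError:
    # ...so a JSON-backed cache must either reject it or coerce it to a
    # list first — and a list is exactly what would come back on read.
    serialized = json.dumps(list(history))

print(serialized)
```

This is why the round-trip through a JSON-serializing backend silently turns the deque back into a list, feeding the compatibility question raised above.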

@tomchristie
Member

I'm pretty sure we've seen a similar pull request, that we merged, and then had to revert.
Even tiny changes like this carry a risk factor and are likely to end up breaking someone's codebase, somewhere.
At this point in the project's lifespan it's not worth the trade-off.

Ah... found it: https://github.com/encode/django-rest-framework/blob/f83620dcc9e87f81ddc846c56f2ad87c2e548f8d/docs/community/release-notes.md#3124

@tomchristie tomchristie closed this Aug 3, 2021
@tomchristie
Member

This: #7849

Which we needed to revert: #7870

@tomchristie
Member

The most relevant improvement here would be anything that we can do to help guide contributors away from code-changing pull requests at this point, unless absolutely necessary.

@JasperSui
Author

> I'm pretty sure we've seen a similar pull request, that we merged, and then had to revert.
> Even tiny changes like this carry a risk factor and are likely to end up breaking someone's codebase, somewhere.
> At this point in the project's lifespan it's not worth the trade-off.
>
> Ah... found it: https://github.com/encode/django-rest-framework/blob/f83620dcc9e87f81ddc846c56f2ad87c2e548f8d/docs/community/release-notes.md#3124

Yes, it seems that even tiny changes can hurt someone unexpectedly; I totally agree with your point. So maybe we can implement the deque version in a way that is harmless to others.

Thank you for your time!
