[aiohttp] - use lcg as random generator #9942

Draft
Reskov wants to merge 3 commits into master

Reskov
Contributor

@Reskov Reskov commented Jun 10, 2025

We currently make a large number of randint calls, particularly within the update method. For example, each call to update may trigger about 2 * (number of queries) random-number calls.

The existing Python implementation uses the Mersenne Twister, which is a high-quality RNG with 53-bit precision and a very long period (2**19937−1). However, for our simplified use case, a classic LCG (Linear Congruential Generator) is more than sufficient and significantly faster, especially given the number of calls we make.
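For readers unfamiliar with LCGs, here is a minimal pure-Python sketch of the idea. It is not the code from this PR: the function name `lcg_unique_ids`, the constants, and the rejection step are all assumptions chosen for illustration. It relies on the fact that a full-period LCG modulo 2**14 visits every state exactly once per cycle, so discarding states outside the 1..10000 id range yields unique ids without any bookkeeping.

```python
from typing import List

# Hypothetical parameters for illustration only (not taken from the PR).
# With a power-of-two modulus, an odd increment and a multiplier where
# a % 4 == 1, the generator has full period (Hull-Dobell theorem).
_M = 1 << 14        # 16384, smallest power of two >= 10000
_A = 1664525        # multiplier, 1664525 % 4 == 1
_C = 1013904223     # increment, odd
_MAX_ID = 10000     # ids are drawn from 1..10000


def lcg_unique_ids(count: int, seed: int) -> List[int]:
    """Return `count` distinct ids in 1..10000 from a full-period LCG."""
    if not 0 <= count <= _MAX_ID:
        raise ValueError("count must be between 0 and 10000")
    state = seed % _M
    ids: List[int] = []
    while len(ids) < count:
        # One LCG step: state <- (a * state + c) mod m
        state = (_A * state + _C) % _M
        # States never repeat within one period, so accepted ids are unique;
        # states outside the id range are simply skipped.
        if state < _MAX_ID:
            ids.append(state + 1)
    return ids


if __name__ == "__main__":
    print(lcg_unique_ids(20, seed=42))
```

The low-order bits of a power-of-two LCG are weak, which is acceptable here only because the ids do not need to be statistically strong.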

Benchmark code

import timeit
from random import sample

# Cython LCG helper from this PR
from app.random_utils import random_unique_ids as random_unique_ids_cython

# NOTE: random_unique_ids (the pure-Python LCG variant) is not defined in this
# snippet; it has to be defined or imported before method2 can run.

# Number of ids generated per call
num_queries = 20

# The three approaches being compared
def method1():
    return sample(range(1, 10001), num_queries)

def method2():
    return random_unique_ids(num_queries)

def method3():
    return random_unique_ids_cython(num_queries)

# Run benchmark
iterations = 100000
time1 = timeit.timeit(method1, number=iterations)
time2 = timeit.timeit(method2, number=iterations)
time3 = timeit.timeit(method3, number=iterations)

print(f"sample: {time1:.6f} seconds")
print(f"lcg pure python: {time2:.6f} seconds")
print(f"lcg cython: {time3:.6f} seconds")
print(f"Difference: {abs(time1-time3):.6f} seconds")
print(f"lcg cython is faster by {(max(time1,time3)/min(time1,time3)-1)*100:.2f}%")

Results

sample: 0.749489 seconds
lcg pure python: 0.379528 seconds
lcg cython: 0.039777 seconds
Difference: 0.709712 seconds
lcg cython is faster by 1784.24%
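To put the timings above on one scale, the small sketch below recomputes each variant's speedup relative to `sample`. The numbers are copied from the results; the `speedup` helper is only an illustrative name.

```python
# Timings copied from the results above (seconds per 100000 iterations).
timings = {
    "sample": 0.749489,
    "lcg pure python": 0.379528,
    "lcg cython": 0.039777,
}


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return baseline / candidate


baseline = timings["sample"]
speedups = {name: speedup(baseline, t) for name, t in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x vs sample ({(factor - 1) * 100:.2f}% faster)")
```

By this measure the pure-Python LCG is roughly 2x faster than `sample` and the Cython version roughly 19x faster, which matches the 1784.24% figure printed by the benchmark.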

Updates

Before

 Queries: 10 for update
 wrk -H 'Host: tfb-server' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 30 -c 32 --timeout 8 -t 6 "http://tfb-server:8080/updates/10"
---------------------------------------------------------
Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.63ms    1.88ms  76.45ms   96.61%
    Req/Sec     3.41k   844.18     9.66k    72.68%
  Latency Distribution
     50%    1.40ms
     75%    1.90ms
     90%    2.48ms
     99%    6.93ms
  611203 requests in 30.10s, 270.92MB read
Requests/sec:  20305.71
Transfer/sec:      9.00MB

After

Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.38ms  848.87us  31.48ms   84.04%
    Req/Sec     3.74k   664.81    11.00k    69.42%
  Latency Distribution
     50%    1.17ms
     75%    1.78ms
     90%    2.33ms
     99%    3.77ms
  669708 requests in 30.10s, 296.85MB read
Requests/sec:  22249.46
Transfer/sec:      9.86MB
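The two wrk runs can be compared with a few lines of arithmetic. The sketch below copies the figures from the output above and assumes both runs used the same wrk command line, which the After log does not show; the dictionary keys are illustrative names.

```python
# Figures copied from the wrk output above.
before = {"requests_per_sec": 20305.71, "avg_latency_ms": 1.63, "p99_latency_ms": 6.93}
after = {"requests_per_sec": 22249.46, "avg_latency_ms": 1.38, "p99_latency_ms": 3.77}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


changes = {key: percent_change(before[key], after[key]) for key in before}

for key, change in changes.items():
    print(f"{key}: {before[key]} -> {after[key]} ({change:+.2f}%)")
```

That works out to roughly 9.6% more requests per second and a p99 latency about 46% lower, a much smaller gain than the microbenchmark suggests because random number generation is only one part of each request.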

@Reskov Reskov marked this pull request as draft June 10, 2025 03:44
@Reskov
Contributor Author

Reskov commented Jun 10, 2025

@Dreamsorcerer please take a look 🙏

Contributor

This kind of feels like cheating, this seems beyond benchmarking a realistic framework.

This feels more like a potential contribution to cpython to provide faster built-in random functions, thus benefiting everyone.

Contributor Author

@Reskov Reskov Jun 10, 2025

> This kind of feels like cheating, this seems beyond benchmarking a realistic framework.

Agreed, but the same applies to many other things already done in these benchmarks, such as module-level caching or executing queries without reading the actual results.

> This looks more like a potential contribution to cpython to provide faster built-in random functions, which would benefit everyone.

This is certainly not intended as a contribution to Python itself, since LCGs are inferior to the existing random number generators. They have significantly shorter periods, and their speed advantage comes only from their small state size and the simplicity of the algorithm. In our case the short period and the predictability of the sequence don’t matter, since we’re not aiming for cryptographic or truly random values, but as a general-purpose generator an LCG is still a poor choice.

We are going back to the roots here 😀 Even Python 1.0 had an improved version of an LCG:
https://github.com/nagayev/old-python/blob/a93727bb9eb40818ecaafb50e1e942ab75b3b6d3/Lib/whrandom.py#L73
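The period limitation mentioned in this comment can be illustrated with a tiny sketch. The `lcg_period` helper and the toy parameters are assumptions for illustration, not code from this PR: an LCG's period can never exceed its modulus, and a poor choice of multiplier shortens it further.

```python
def lcg_period(a: int, c: int, m: int, seed: int = 0) -> int:
    """Length of the cycle that state <- (a * state + c) % m falls into."""
    first_seen = {}
    state = seed % m
    step = 0
    # Walk the sequence until a state repeats; the distance between the
    # two visits of that state is the cycle length.
    while state not in first_seen:
        first_seen[state] = step
        state = (a * state + c) % m
        step += 1
    return step - first_seen[state]


if __name__ == "__main__":
    # Toy modulus of 16 so the cycles are easy to follow by hand.
    print(lcg_period(a=5, c=1, m=16))  # full period: equals the modulus
    print(lcg_period(a=3, c=1, m=16))  # weaker multiplier: shorter cycle
```

Even a well-tuned 32-bit or 64-bit LCG is capped at 2**32 or 2**64 states, which is tiny next to the Mersenne Twister's 2**19937−1.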

@@ -4,7 +4,9 @@ ADD ./ /aiohttp

 WORKDIR /aiohttp

-RUN pip3 install -r /aiohttp/requirements-cpython.txt
+RUN pip3 install -r /aiohttp/requirements-cpython.txt && \
+pip3 install cython==3.1.2 setuptools==80.9.0 && \
Contributor

@Dreamsorcerer Dreamsorcerer Jun 10, 2025

If we do use this, we at least want the dependencies in the requirements files, so they can be updated by Dependabot.
