Dspy fanout cache #8062

Merged · 15 commits merged into stanfordnlp:main from dspy-fanout-cache on Apr 24, 2025

Conversation

@chenmoneygithub (Collaborator) commented Apr 13, 2025

In order to support high-concurrency workflows, we are replacing diskcache.Cache with diskcache.FanoutCache (not supported by the litellm cache yet). The following changes are included in this PR:

  • Unified the in-memory cache (an LRU cache) and the on-disk cache (FanoutCache) in the same class
  • By default, DSPy no longer relies on the litellm cache but on its own native cache; we want to reduce the dependency on litellm for flexibility. Users can still use the litellm cache by calling dspy.configure_cache(enable_litellm_cache=True)
  • The native DSPy cache can export the in-memory cache for future usage; the caveat is that the export uses pickle. A rough sketch of this two-tier layout is shown below.
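
The following is a minimal sketch of that layout, not the actual DSPy implementation: an in-memory OrderedDict used as an LRU in front of a diskcache.FanoutCache, plus a pickle-based export of the memory tier. The class and method names (TwoTierCache, save_memory_cache), the shard count, and the size limit are illustrative assumptions.

import pickle
from collections import OrderedDict

from diskcache import FanoutCache

class TwoTierCache:
    """Hypothetical sketch: in-memory LRU in front of a FanoutCache."""

    def __init__(self, directory, memory_max_entries=100_000, shards=16):
        self.memory = OrderedDict()          # insertion-ordered dict used as an LRU
        self.memory_max_entries = memory_max_entries
        self.disk = FanoutCache(directory, shards=shards)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)     # mark as most recently used
            return self.memory[key]
        value = self.disk.get(key)           # returns None on a miss
        if value is not None:
            self._put_memory(key, value)     # promote disk hits to memory
        return value

    def put(self, key, value):
        self._put_memory(key, value)
        self.disk.set(key, value)

    def _put_memory(self, key, value):
        # NOTE: a real implementation would guard this with a threading.Lock.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_max_entries:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def save_memory_cache(self, path):
        # The pickle caveat from the PR: every cached value must be picklable.
        with open(path, "wb") as f:
            pickle.dump(dict(self.memory), f)

Under a layout like this, a configuration call such as dspy.configure_cache(...) only needs to toggle which tiers participate in lookups.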

Performance testing with the code below:

import time

import dspy

class CustomModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str) -> dspy.Prediction:
        return self.cot(question=question)

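# Configuration for the litellm-cache baseline shown below; the DSPy
# fanout-cache runs presumably flip these flags (memory/disk on, litellm off).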
dspy.configure_cache(enable_memory_cache=False, enable_disk_cache=False, enable_litellm_cache=True)

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini", cache=True), track_usage=True)

cot = CustomModule()

parallelizer = dspy.Parallel(provide_traceback=True)

duplicated_questions = [(cot, {"question": "What is main character of Slam Dunk?"}) for _ in range(100)]

start_time = time.time()
results = parallelizer(duplicated_questions)

end_time = time.time()

print(f"Time taken for dspy parallel: {end_time - start_time} seconds")

Litellm cache:

(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 643.63it/s]
Time taken for dspy parallel: 0.20780491828918457 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 760.89it/s]
Time taken for dspy parallel: 0.2055070400238037 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 775.48it/s]
Time taken for dspy parallel: 0.19990301132202148 seconds

DSPy cache (fanout cache):

(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1645.60it/s]
Time taken for dspy parallel: 0.10368084907531738 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 2430.73it/s]
Time taken for dspy parallel: 0.09839200973510742 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1784.57it/s]
Time taken for dspy parallel: 0.10172796249389648 seconds

The fanout cache is consistently faster, roughly 2x on this benchmark (~0.10 s vs ~0.20 s).
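
The likely reason (my reading of diskcache's design, not something measured in this PR) is that FanoutCache shards keys across several independent SQLite-backed caches, so concurrent writers rarely contend on the same database lock. Below is a standalone sketch for probing that effect directly; the thread count, shard count, and temp directories are illustrative.

import concurrent.futures
import tempfile
import time

import diskcache

def hammer(cache, n_threads=16, n_ops=500):
    # Each thread writes and reads its own keys, so work spreads across shards.
    def work(tid):
        for i in range(n_ops):
            cache.set(f"k-{tid}-{i}", i)
            cache.get(f"k-{tid}-{i}")
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))
    return time.time() - start

plain = diskcache.Cache(tempfile.mkdtemp())
fanout = diskcache.FanoutCache(tempfile.mkdtemp(), shards=8)
print(f"Cache:       {hammer(plain):.3f}s")
print(f"FanoutCache: {hammer(fanout):.3f}s")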

@chenmoneygithub chenmoneygithub marked this pull request as draft April 13, 2025 00:10
@chenmoneygithub chenmoneygithub force-pushed the dspy-fanout-cache branch 2 times, most recently from 897924b to 1b000fb on April 14, 2025 21:11
@chenmoneygithub chenmoneygithub marked this pull request as ready for review April 14, 2025 21:21
@chenmoneygithub chenmoneygithub requested a review from okhat April 14, 2025 21:21
@chenmoneygithub chenmoneygithub changed the title [WIP] Dspy fanout cache Dspy fanout cache Apr 14, 2025
@chenmoneygithub chenmoneygithub force-pushed the dspy-fanout-cache branch 2 times, most recently from ddc714e to bcfedb3 on April 15, 2025 20:53
@okhat okhat self-assigned this Apr 16, 2025
@okhat okhat merged commit ff46299 into stanfordnlp:main Apr 24, 2025
4 checks passed
@chenmoneygithub chenmoneygithub deleted the dspy-fanout-cache branch April 25, 2025 21:48