Dspy fanout cache #8062

Merged · 15 commits merged into stanfordnlp:main from dspy-fanout-cache on Apr 24, 2025

Conversation

@chenmoneygithub (Collaborator) commented Apr 13, 2025

In order to support high-concurrency workflows, we are replacing diskcache.Cache with diskcache.FanoutCache (not supported by the litellm cache yet). The following changes are included in this PR:

  • Unified the in-memory cache (an LRU cache) and the on-disk cache (FanoutCache) in the same class
  • By default, DSPy no longer relies on the litellm cache but on its own native cache; we want to reduce the dependency on litellm for flexibility. Users can still use the litellm cache by calling dspy.configure_cache(enable_litellm_cache=True)
  • The native DSPy cache can export the in-memory cache for future usage; the caveat is that the export uses pickle. A rough sketch of this two-tier layout is shown below.
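
The following is a minimal sketch of that layout, not the actual DSPy implementation: an in-memory OrderedDict used as an LRU in front of a diskcache.FanoutCache, plus a pickle-based export of the memory tier. The class and method names (TwoTierCache, save_memory_cache), the shard count, and the size limit are illustrative assumptions.

import pickle
from collections import OrderedDict

from diskcache import FanoutCache

class TwoTierCache:
    """Hypothetical sketch: in-memory LRU in front of a FanoutCache."""

    def __init__(self, directory, memory_max_entries=100_000, shards=16):
        self.memory = OrderedDict()          # insertion-ordered dict used as an LRU
        self.memory_max_entries = memory_max_entries
        self.disk = FanoutCache(directory, shards=shards)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)     # mark as most recently used
            return self.memory[key]
        value = self.disk.get(key)           # returns None on a miss
        if value is not None:
            self._put_memory(key, value)     # promote disk hits to memory
        return value

    def put(self, key, value):
        self._put_memory(key, value)
        self.disk.set(key, value)

    def _put_memory(self, key, value):
        # NOTE: a real implementation would guard this with a threading.Lock.
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_max_entries:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def save_memory_cache(self, path):
        # The pickle caveat from the PR: every cached value must be picklable.
        with open(path, "wb") as f:
            pickle.dump(dict(self.memory), f)

Under a layout like this, a configuration call such as dspy.configure_cache(...) only needs to toggle which tiers participate in lookups.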

Performance testing with the code below:

import time

import dspy

class CustomModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str) -> dspy.Prediction:
        return self.cot(question=question)

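# Configuration for the litellm-cache baseline shown below; the DSPy
# fanout-cache runs presumably flip these flags (memory/disk on, litellm off).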
dspy.configure_cache(enable_memory_cache=False, enable_disk_cache=False, enable_litellm_cache=True)

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini", cache=True), track_usage=True)

cot = CustomModule()

parallelizer = dspy.Parallel(provide_traceback=True)

duplicated_questions = [(cot, {"question": "What is main character of Slam Dunk?"}) for _ in range(100)]

start_time = time.time()
results = parallelizer(duplicated_questions)

end_time = time.time()

print(f"Time taken for dspy parallel: {end_time - start_time} seconds")

Litellm cache:

(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 643.63it/s]
Time taken for dspy parallel: 0.20780491828918457 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 760.89it/s]
Time taken for dspy parallel: 0.2055070400238037 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 775.48it/s]
Time taken for dspy parallel: 0.19990301132202148 seconds

DSPy cache (fanout cache):

(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1645.60it/s]
Time taken for dspy parallel: 0.10368084907531738 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 2430.73it/s]
Time taken for dspy parallel: 0.09839200973510742 seconds
(dspy) (base) *[dspy-fanout-cache][~/Documents/mlflow_team/dspy]$ python3 script_tmp/parallel_cache_tmp.py
Processed 100 / 100 examples: 100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1784.57it/s]
Time taken for dspy parallel: 0.10172796249389648 seconds

The fanout cache is consistently faster, roughly 2x on this benchmark (~0.10 s vs ~0.20 s).
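
The likely reason (my reading of diskcache's design, not something measured in this PR) is that FanoutCache shards keys across several independent SQLite-backed caches, so concurrent writers rarely contend on the same database lock. Below is a standalone sketch for probing that effect directly; the thread count, shard count, and temp directories are illustrative.

import concurrent.futures
import tempfile
import time

import diskcache

def hammer(cache, n_threads=16, n_ops=500):
    # Each thread writes and reads its own keys, so work spreads across shards.
    def work(tid):
        for i in range(n_ops):
            cache.set(f"k-{tid}-{i}", i)
            cache.get(f"k-{tid}-{i}")
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))
    return time.time() - start

plain = diskcache.Cache(tempfile.mkdtemp())
fanout = diskcache.FanoutCache(tempfile.mkdtemp(), shards=8)
print(f"Cache:       {hammer(plain):.3f}s")
print(f"FanoutCache: {hammer(fanout):.3f}s")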

@chenmoneygithub chenmoneygithub marked this pull request as draft April 13, 2025 00:10
@chenmoneygithub chenmoneygithub force-pushed the dspy-fanout-cache branch 2 times, most recently from 897924b to 1b000fb on April 14, 2025 21:11
@chenmoneygithub chenmoneygithub marked this pull request as ready for review April 14, 2025 21:21
@chenmoneygithub chenmoneygithub requested a review from okhat April 14, 2025 21:21
@chenmoneygithub chenmoneygithub changed the title [WIP] Dspy fanout cache Dspy fanout cache Apr 14, 2025
@chenmoneygithub chenmoneygithub force-pushed the dspy-fanout-cache branch 2 times, most recently from ddc714e to bcfedb3 on April 15, 2025 20:53
@okhat okhat self-assigned this Apr 16, 2025
@okhat okhat merged commit ff46299 into stanfordnlp:main Apr 24, 2025
4 checks passed
@chenmoneygithub chenmoneygithub deleted the dspy-fanout-cache branch April 25, 2025 21:48