improve the eval perf with kv cache #3732

cccclai · 2024-05-24T17:57:58Z

Summary:
The original implementation was slow because evaluation kept bouncing between devices (cpu -> gpu -> cpu -> gpu -> ...). This change processes the sequence as a batch so the compute stays on the GPU.
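For readers without the diff open, here is a minimal sketch of the idea, not the code from this PR. It assumes the slow path called the model once per token and the fast path calls it once for the whole sequence. `ToyCachedModel`, `eval_token_by_token`, and `eval_batched` are hypothetical names, and the "cache" is a plain list standing in for a real KV cache. The call counter stands in for the host/device round trips.

```python
from typing import List


class ToyCachedModel:
    """Hypothetical stand-in for a decoder with a KV cache (not the real model).

    The "logit" at position i is the sum of the cached tokens up to i, so it
    depends on the whole prefix, the way a causal model's output does.
    """

    def __init__(self, max_seq_len: int) -> None:
        self.cache: List[int] = [0] * max_seq_len
        self.calls = 0  # one call stands for one host -> device -> host round trip

    def forward(self, tokens: List[int], positions: List[int]) -> List[int]:
        self.calls += 1
        logits = []
        for tok, pos in zip(tokens, positions):
            self.cache[pos] = tok  # write this position into the cache
            logits.append(sum(self.cache[: pos + 1]))  # read the prefix back
        return logits


def eval_token_by_token(model: ToyCachedModel, tokens: List[int]) -> List[int]:
    """Slow path: one model call (one round trip) per token."""
    logits: List[int] = []
    for pos, tok in enumerate(tokens):
        logits.extend(model.forward([tok], [pos]))
    return logits


def eval_batched(model: ToyCachedModel, tokens: List[int]) -> List[int]:
    """Fast path: a single model call covering every position."""
    return model.forward(tokens, list(range(len(tokens))))


tokens = [3, 1, 4, 1, 5]
slow_model = ToyCachedModel(max_seq_len=8)
fast_model = ToyCachedModel(max_seq_len=8)
slow_logits = eval_token_by_token(slow_model, tokens)
fast_logits = eval_batched(fast_model, tokens)
print(slow_logits == fast_logits, slow_model.calls, fast_model.calls)
```

Both paths produce the same per-position outputs. Only the number of calls differs, and that is where the time goes when each call crosses the device boundary.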

When evaluating the stories model, before the change:

2024-05-23:23:42:25,115 INFO     [evaluator.py:362] Running loglikelihood_rolling requests
100%|██████████| 5/5 [02:37<00:00, 31.50s/it]
wikitext: {'word_perplexity,none': 10589.525426446424, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 6.111053701258041, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.6114211588515417, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

After the change:

2024-05-23:23:36:50,339 INFO     [evaluator.py:362] Running loglikelihood_rolling requests
100%|██████████| 5/5 [00:03<00:00,  1.55it/s]
wikitext: {'word_perplexity,none': 10589.52618994558, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 6.111053787314264, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.611421179167659, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
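A quick sanity check on the two runs above, written as a small script that uses only values copied from the logs. It assumes the usual relationship `bits_per_byte = log2(byte_perplexity)`. The elapsed times are tqdm's whole-second readings, so the speedup is approximate. The helper names are illustrative.

```python
import math

# Values copied from the "before" and "after" log excerpts above.
before = {
    "elapsed": "02:37",
    "word_perplexity": 10589.525426446424,
    "byte_perplexity": 6.111053701258041,
    "bits_per_byte": 2.6114211588515417,
}
after = {
    "elapsed": "00:03",
    "word_perplexity": 10589.52618994558,
    "byte_perplexity": 6.111053787314264,
    "bits_per_byte": 2.611421179167659,
}


def elapsed_seconds(mm_ss: str) -> int:
    """Convert a tqdm-style MM:SS elapsed time into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_diff(a: float, b: float) -> float:
    """Relative difference between two values, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


# Approximate speedup from the rounded progress-bar times.
speedup = elapsed_seconds(before["elapsed"]) / elapsed_seconds(after["elapsed"])

# How far the reported word perplexity moved between the two runs.
word_ppl_drift = relative_diff(before["word_perplexity"], after["word_perplexity"])

# Gap between log2(byte_perplexity) and the reported bits_per_byte in each run.
bpb_gap_before = abs(math.log2(before["byte_perplexity"]) - before["bits_per_byte"])
bpb_gap_after = abs(math.log2(after["byte_perplexity"]) - after["bits_per_byte"])

print(f"approximate speedup: {speedup:.0f}x")
print(f"relative word perplexity drift: {word_ppl_drift:.2e}")
print(f"bits_per_byte gaps: {bpb_gap_before:.2e}, {bpb_gap_after:.2e}")
```

The relative drift in word perplexity is below one part in a million, consistent with the batched path computing the same quantity in a different order rather than changing the evaluation.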

Differential Revision: D57764318

pytorch-bot · 2024-05-24T17:58:02Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3732

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6d92e53 with merge base 1343224:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-05-24T17:58:07Z

This pull request was exported from Phabricator. Differential Revision: D57764318


facebook-github-bot · 2024-05-24T17:58:39Z

This pull request was exported from Phabricator. Differential Revision: D57764318

facebook-github-bot · 2024-05-24T19:54:51Z

This pull request has been merged in f42942a.

facebook-github-bot added the CLA Signed label May 24, 2024

facebook-github-bot added the fb-exported label May 24, 2024

cccclai force-pushed the export-D57764318 branch from 69b31c7 to 6d92e53 May 24, 2024 17:58

lucylq approved these changes May 24, 2024

facebook-github-bot closed this in f42942a May 24, 2024

facebook-github-bot added the Merged label May 24, 2024

cccclai mentioned this pull request Jun 3, 2024

add 16a4w_hqq quant mode #3752

Closed

helunwencser mentioned this pull request Aug 14, 2024

fix eager_eval with kv cache and improve pybind eval speed #4720

Merged
