Summary:
The original implementation was too slow: the frequent cpu->gpu->cpu->gpu round trips made it inefficient. Change it to batch-process the sequence so the compute stays on the GPU.
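As a rough illustration of the difference (a minimal sketch, not the actual diff; shown with NumPy on CPU, where with CUDA tensors every per-step scalar read would be a gpu->cpu sync), the per-position loop and the batched pass compute the same negative log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 6, 10  # toy sequence length and vocab size
logits = rng.normal(size=(T, V))      # model outputs per position (hypothetical)
targets = rng.integers(0, V, size=T)  # next-token ids (hypothetical)

# Slow pattern: one position at a time, pulling each scalar back to Python.
# With CUDA tensors, each such read forces a device-to-host transfer.
nll_loop = 0.0
for t in range(T):
    row = logits[t]
    log_z = np.log(np.exp(row - row.max()).sum()) + row.max()
    nll_loop += log_z - row[targets[t]]

# Batched pattern: one vectorized pass over the whole sequence,
# so on GPU the intermediate results never leave the device.
m = logits.max(axis=1, keepdims=True)
log_z = np.log(np.exp(logits - m).sum(axis=1)) + m[:, 0]
nll_batch = (log_z - logits[np.arange(T), targets]).sum()

assert np.isclose(nll_loop, nll_batch)
```

Both paths agree numerically; the speedup comes purely from eliminating the per-step host/device traffic, which is consistent with the unchanged perplexity numbers below.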
Evaluating the stories model, before the change:
```
2024-05-23:23:42:25,115 INFO [evaluator.py:362] Running loglikelihood_rolling requests
100%|██████████| 5/5 [02:37<00:00, 31.50s/it]
wikitext: {'word_perplexity,none': 10589.525426446424, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 6.111053701258041, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.6114211588515417, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
After the change:
```
2024-05-23:23:36:50,339 INFO [evaluator.py:362] Running loglikelihood_rolling requests
100%|██████████| 5/5 [00:03<00:00, 1.55it/s]
wikitext: {'word_perplexity,none': 10589.52618994558, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 6.111053787314264, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 2.611421179167659, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
Differential Revision: D57764318