use --use_sdpa_with_kv_cache for 1B/3B bf16 #5861
We should pass --use_sdpa_with_kv_cache when exporting the 1B/3B models as bf16, because the KV cache is always fp32. Without it, performance regresses for the 1B/3B models in bf16 format. Differential Revision: [D63871048](https://our.internmc.facebook.com/intern/diff/D63871048/)
This pull request was exported from Phabricator. Differential Revision: D63871048
This pull request has been merged in 2726bdb.