[Executorch][Llama] Decouple input sequence length from kv cache context length #7927
Conversation
Update on "[Executorch][Llama] Decouple input sequence length from kv cache context length"

Decouple max sequence length, for shape dynamism in torch.export, from sequence length used for kv cache sizing.

Differential Revision: [D68448334](https://our.internmc.facebook.com/intern/diff/D68448334/)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7927
Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 1070600 with merge base bdd3d9c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request resolved: #7927. Decouple max sequence length, for shape dynamism in torch.export, from sequence length used for kv cache sizing. Differential Revision: [D68448334](https://our.internmc.facebook.com/intern/diff/D68448334/). ghstack-source-id: 262854267

This pull request was exported from Phabricator. Differential Revision: D68448334
Merged ecb6050 into gh/kimishpatel/152/base
[Executorch][Llama] Decouple input sequence length from kv cache context length (#8047). Pull Request resolved: #7927. Decouple max sequence length, for shape dynamism in torch.export, from sequence length used for kv cache sizing. ghstack-source-id: 263653316. Differential Revision: [D68448334](https://our.internmc.facebook.com/intern/diff/D68448334/). Co-authored-by: Kimish Patel <[email protected]>
[Executorch][perf-ci] Fix perf CI

Summary: Previous PR #7927 decoupled max_seq_length from the kv cache, which broke the perf CI workflow. Fix that.

Test Plan: Trigger it manually and check:
apple perf: https://github.com/pytorch/executorch/actions/runs/13267110949
android perf: https://github.com/pytorch/executorch/actions/runs/13267110908

cc guangy10 huydhn kirklandsign shoumikhin
Stack from ghstack (oldest at bottom):
Decouple max sequence length, for shape dynamism in torch.export, from sequence
length used for kv cache sizing.
Differential Revision: D68448334
cc @mergennachin @cccclai @helunwencser @dvorjackz
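The change above separates two limits that were previously tied together: the maximum sequence length a single forward call may take (the bound used for shape dynamism in torch.export) and the context length that sizes the kv cache buffers. A minimal pure-Python sketch of that separation follows; the class and parameter names are illustrative only and are not the ExecuTorch API.

```python
class StaticKVCache:
    """Toy kv cache: storage is sized by context_len, while a single
    update may contain at most max_seq_len new positions. The two
    limits are independent, which is the point of the decoupling."""

    def __init__(self, context_len: int, max_seq_len: int):
        assert max_seq_len <= context_len, "inputs must fit in the cache"
        self.context_len = context_len    # sizes the cache buffers
        self.max_seq_len = max_seq_len    # bounds one forward call
        self.keys = [None] * context_len  # stand-in for the k/v tensors
        self.pos = 0                      # next write position

    def update(self, new_keys):
        # The per-call input length is bounded by max_seq_len, not by
        # context_len: export can use a small dynamic-shape upper bound
        # even though the cache itself is much larger.
        assert len(new_keys) <= self.max_seq_len
        assert self.pos + len(new_keys) <= self.context_len
        for k in new_keys:
            self.keys[self.pos] = k
            self.pos += 1
        return self.pos

# A cache holding 2048 positions can still be exported with a per-call
# dynamic sequence length capped at 512.
cache = StaticKVCache(context_len=2048, max_seq_len=512)
cache.update(list(range(128)))  # prefill with a 128-token prompt
cache.update([0])               # then decode one token at a time
print(cache.pos)                # total positions written so far
```

In the actual export flow this corresponds, roughly, to giving torch.export a dynamic sequence dimension bounded by the max input length while allocating the static kv cache from the (possibly larger) context length; the sketch only illustrates that the two numbers no longer have to be equal.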