[ET-VK] Handle scalar tensor and mutable buffer inputs in Vulkan delegate runtime #5930
…gate runtime

## Context

* Handle scalar tensor inputs by adding them to the graph as symbolic ints
* Add support for symint inputs in the Vulkan delegate
* Add type checking for Vulkan delegate inputs and outputs

This is needed for Transformer models, which receive an `input_pos` integer scalar tensor as an input. `input_pos` is used in KV cache updates and determines the sizes of the cache slices.

### Why are scalar tensors added as symint?

Adding scalar tensors as symints makes more sense than adding them as real tensors, since symints are commonly used to inform tensor shapes. Adding scalar tensors as symints makes them easily accessible by the CPU at graph encoding and resizing time, as well as easily accessible by the GPU within compute shaders.

Differential Revision: [D63979312](https://our.internmc.facebook.com/intern/diff/D63979312/)

[ghstack-poisoned]
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5930

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure as of commit e34f3f5 with merge base aad548c.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D63979312
…gate runtime

Pull Request resolved: #5930

## Context

* Handle scalar tensor inputs by adding them to the graph as symbolic ints
* Add support for symint inputs in the Vulkan delegate
* Add type checking for Vulkan delegate inputs and outputs

This is needed for Transformer models, which receive an `input_pos` integer scalar tensor as an input. `input_pos` is used in KV cache updates and determines the sizes of the cache slices.

Additionally, mutable buffer inputs/outputs, which appear as `TensorRef` to the Vulkan graph, are now handled by skipping them when copying outputs. More details are in the code comments.

### Why are scalar tensors added as symint?

Adding scalar tensors as symints makes more sense than adding them as real tensors, since symints are commonly used to inform tensor shapes. Adding scalar tensors as symints makes them easily accessible by the CPU at graph encoding and resizing time, as well as easily accessible by the GPU within compute shaders.

Differential Revision: [D63979312](https://our.internmc.facebook.com/intern/diff/D63979312/)

ghstack-source-id: 246588007
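To illustrate the mutable buffer handling described above, here is a minimal Python sketch of an output-copy loop that skips `TensorRef` values. It is not the delegate's actual C++ code: the `Tensor` and `TensorRef` classes, the `copy_outputs` function, and the convention of pairing each graph output with a destination list are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Tensor:
    """Hypothetical stand-in for a regular graph output tensor."""
    data: List[float]


@dataclass
class TensorRef:
    """Hypothetical stand-in for a mutable buffer as seen by the graph."""
    name: str


def copy_outputs(
    graph_outputs: List[Union[Tensor, TensorRef]],
    dst_buffers: List[Optional[List[float]]],
) -> int:
    """Copy regular tensor outputs to their destinations.

    Mutable buffers (TensorRef) are skipped rather than copied.
    Any other value type is rejected, mirroring the idea of type
    checking delegate outputs. Returns the number of copies made.
    """
    copied = 0
    for value, dst in zip(graph_outputs, dst_buffers):
        if isinstance(value, TensorRef):
            # Assumption: nothing needs to be copied back for a mutable buffer.
            continue
        if not isinstance(value, Tensor):
            raise TypeError(f"unsupported output type: {type(value).__name__}")
        dst[:] = value.data
        copied += 1
    return copied
```

In this sketch a graph with one mutable buffer output and one regular output performs exactly one copy, and the destination slot paired with the `TensorRef` is never touched.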
This pull request has been merged in 62a13c1.
Stack from ghstack (oldest at bottom):
higher_order_auto_functionalized
#5884

## Context

* Handle scalar tensor inputs by adding them to the graph as symbolic ints
* Add support for symint inputs in the Vulkan delegate
* Add type checking for Vulkan delegate inputs and outputs

This is needed for Transformer models, which receive an `input_pos` integer scalar tensor as an input. `input_pos` is used in KV cache updates and determines the sizes of the cache slices.

### Why are scalar tensors added as symint?

Adding scalar tensors as symints makes more sense than adding them as real tensors, since symints are commonly used to inform tensor shapes. Adding scalar tensors as symints makes them easily accessible by the CPU at graph encoding and resizing time, as well as easily accessible by the GPU within compute shaders.

Differential Revision: D63979312
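
The idea of treating a scalar tensor input as a symint can be sketched in a few lines of Python. This is a toy model rather than the Vulkan delegate's implementation: `SymInt`, `ToyGraph`, `set_scalar_input`, and `cache_slice_len` are hypothetical names, and the rule that the cache slice length equals `input_pos` plus the number of new tokens is an assumption chosen only to show a shape depending on the symint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SymInt:
    """Hypothetical symbolic integer stored in the graph."""
    value: int = 0


@dataclass
class ToyGraph:
    """Hypothetical graph with one symint input and a fixed cache capacity."""
    max_seq_len: int
    input_pos: SymInt = field(default_factory=SymInt)

    def set_scalar_input(self, tensor_data: List[int], dtype: str) -> None:
        """Read a one-element integer tensor into the symint.

        The dtype and element-count checks stand in for the input
        type checking mentioned in the description.
        """
        if dtype not in ("int32", "int64"):
            raise TypeError(f"expected an integer scalar tensor, got {dtype}")
        if len(tensor_data) != 1:
            raise ValueError("expected exactly one element")
        self.input_pos.value = int(tensor_data[0])

    def cache_slice_len(self, new_tokens: int) -> int:
        """Compute a cache slice length from the symint at resize time.

        Assumption: the slice covers positions [0, input_pos + new_tokens).
        """
        end = self.input_pos.value + new_tokens
        if end > self.max_seq_len:
            raise ValueError("slice exceeds cache capacity")
        return end
```

Because the position lives in the graph as a plain integer, the resize logic can read it directly on the CPU without going through tensor storage, which is the motivation the description gives for preferring symints over real tensors.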