
[ET-VK] Handle scalar tensor and mutable buffer inputs in Vulkan delegate runtime #5930


Closed
SS-JIA wants to merge 4 commits

@SS-JIA SS-JIA commented Oct 7, 2024

Stack from ghstack (oldest at bottom):

Context

  • Handle scalar tensor inputs by adding them to the graph as symbolic ints
  • Add support for symint inputs in the Vulkan delegate
  • Add type checking for Vulkan delegate inputs and outputs

This is needed for Transformer models, which receive an integer scalar tensor, input_pos, as an input. input_pos is used in KV cache updates and determines the sizes of the cache slices.

Why are scalar tensors added as symint?

Adding scalar tensors as symints makes more sense than adding them as real tensors, since symints are commonly used to inform tensor shapes. Adding scalar tensors as symints makes them easily accessible to the CPU at graph encoding and resizing time, and to the GPU within compute shaders.
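The input handling described above can be sketched roughly as follows. This is a simplified Python model of the idea, not the delegate's actual C++ runtime code: `ScalarTensor`, `DenseTensor`, `SymInt`, `GraphTensor`, and `bind_inputs` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalarTensor:
    """Caller-side input holding exactly one integer (e.g. input_pos)."""
    data: List[int]


@dataclass
class DenseTensor:
    """Caller-side input: an ordinary tensor."""
    data: List[float]


@dataclass
class SymInt:
    """Graph-side value (hypothetical): an integer visible to CPU and shaders."""
    value: int = 0


@dataclass
class GraphTensor:
    """Graph-side value (hypothetical): tensor storage modelled as a list."""
    staging: List[float]


def bind_inputs(graph_inputs, args):
    """Type-check and copy caller args into graph inputs.

    Returns True if any symint value changed, which a runtime could use
    to decide whether tensor sizes need to be recomputed.
    """
    if len(graph_inputs) != len(args):
        raise TypeError("input count mismatch")
    symint_changed = False
    for slot, arg in zip(graph_inputs, args):
        if isinstance(slot, GraphTensor):
            if not isinstance(arg, DenseTensor):
                raise TypeError("expected a tensor argument")
            slot.staging = list(arg.data)
        elif isinstance(slot, SymInt):
            # Assumption: a symint input is always fed by a one-element tensor.
            if not isinstance(arg, ScalarTensor) or len(arg.data) != 1:
                raise TypeError("expected a scalar tensor argument")
            if arg.data[0] != slot.value:
                slot.value = arg.data[0]
                symint_changed = True
        else:
            raise TypeError("unsupported graph input type")
    return symint_changed
```

In this model, the returned flag stands in for the point that symints inform tensor shapes: a changed input_pos would be the signal to resize dependent tensors before execution.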

Differential Revision: D63979312


pytorch-bot bot commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5930

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit e34f3f5 with merge base aad548c (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/periodic and module: vulkan labels Oct 7, 2024
@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 7, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D63979312

SS-JIA added a commit that referenced this pull request Oct 7, 2024
ghstack-source-id: 246578362
Pull Request resolved: #5930
SS-JIA added a commit that referenced this pull request Oct 7, 2024
…gate runtime

Pull Request resolved: #5930

## Context

* Handle scalar tensor inputs by adding them to the graph as symbolic ints
* Add support for symint inputs in the Vulkan delegate
* Add type checking for Vulkan delegate inputs and outputs

This is needed for Transformer models, which receive an integer scalar tensor, `input_pos`, as an input. `input_pos` is used in KV cache updates and determines the sizes of the cache slices.

Additionally, mutable buffer inputs/outputs, which appear as `TensorRef` to the Vulkan graph, are now handled by ignoring them when copying outputs. More details are in the comments.
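A rough sketch of the output-copy behaviour described above, written as a simplified Python model rather than the delegate's actual C++ code. `GraphTensor`, `TensorRef` (as modelled here), and `copy_outputs` are hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GraphTensor:
    """Graph-side tensor (hypothetical); storage modelled as a list."""
    staging: List[float]


@dataclass
class TensorRef:
    """Hypothetical stand-in for how a mutable buffer appears to the graph."""
    name: str


def copy_outputs(graph_outputs, out_buffers):
    """Copy tensor outputs back to the caller; return how many were copied."""
    copied = 0
    for value, dst in zip(graph_outputs, out_buffers):
        if isinstance(value, GraphTensor):
            dst[:] = value.staging  # overwrite the caller's buffer in place
            copied += 1
        elif isinstance(value, TensorRef):
            # Mutable buffer: ignored when copying outputs, per the description.
            continue
        else:
            raise TypeError("unsupported graph output type")
    return copied
```

For example, calling `copy_outputs([GraphTensor([1.5, 2.5]), TensorRef("k_cache")], [logits, cache])` fills `logits` and leaves `cache` untouched.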

### Why are scalar tensors added as symint?

Adding scalar tensors as symints makes more sense than adding them as real tensors, since symints are commonly used to inform tensor shapes. Adding scalar tensors as symints makes them easily accessible to the CPU at graph encoding and resizing time, and to the GPU within compute shaders.

Differential Revision: [D63979312](https://our.internmc.facebook.com/intern/diff/D63979312/)
ghstack-source-id: 246588007

SS-JIA added a commit that referenced this pull request Oct 7, 2024
ghstack-source-id: 246627218

SS-JIA added a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 246752221


@facebook-github-bot

This pull request has been merged in 62a13c1.

@SS-JIA SS-JIA deleted the gh/SS-JIA/108/head branch January 24, 2025 19:45
Labels: ciflow/periodic, CLA Signed, fb-exported, Merged, module: vulkan

3 participants