[ET-VK] Integrate axis mapping into staging <-> image transfer shaders #5093

SS-JIA · 2024-09-04T22:04:11Z

Stack from ghstack (oldest at bottom):

-> [ET-VK] Integrate axis mapping into staging <-> image transfer shaders #5093

Context

Building on the previous diff, this diff integrates axis mapping into staging <-> image transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.
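To make the role of axis-aware indexing helpers concrete, here is a minimal Python sketch of how a WHCN tensor index might be converted to a texture position and back. It is an illustration under stated assumptions, not the shader code from this diff: the names `to_texture_pos`, `to_tensor_idx`, and `nchw_buffer_index` are hypothetical, texel packing is ignored (one element per texel), and the axis map is assumed to hold the texture axis for width, height, and channels in its first three entries and the WHCN dim along which batches are concatenated in its last entry.

```python
def to_texture_pos(idx_whcn, sizes_whcn, axis_map):
    """Map a (w, h, c, n) tensor index to an (x, y, z) texture position."""
    pos = [0, 0, 0]
    # Place width, height, and channels on the texture axes chosen by the map.
    for dim in range(3):
        pos[axis_map[dim]] = idx_whcn[dim]
    # Batches are concatenated along the texture axis that hosts `batch_dim`.
    batch_dim = axis_map[3]
    pos[axis_map[batch_dim]] += idx_whcn[3] * sizes_whcn[batch_dim]
    return tuple(pos)


def to_tensor_idx(pos, sizes_whcn, axis_map):
    """Inverse of to_texture_pos: recover (w, h, c, n) from (x, y, z)."""
    idx = [pos[axis_map[0]], pos[axis_map[1]], pos[axis_map[2]], 0]
    batch_dim = axis_map[3]
    idx[3] = idx[batch_dim] // sizes_whcn[batch_dim]
    idx[batch_dim] %= sizes_whcn[batch_dim]
    return tuple(idx)


def nchw_buffer_index(idx_whcn, sizes_whcn):
    """Linear index of the element in a contiguous NCHW staging buffer."""
    w, h, c, n = idx_whcn
    W, H, C, _ = sizes_whcn
    return ((n * C + c) * H + h) * W + w


sizes = (4, 3, 2, 2)          # W, H, C, N
default_map = (0, 1, 2, 2)    # width->X, height->Y, channels->Z, batches on channels
swapped_map = (2, 1, 0, 2)    # width->Z, height->Y, channels->X, batches on channels
idx = (1, 2, 1, 1)

pos_default = to_texture_pos(idx, sizes, default_map)
pos_swapped = to_texture_pos(idx, sizes, swapped_map)
```

In this sketch, a staging-to-image copy would read the element at `nchw_buffer_index(idx, sizes)` and write it at `to_texture_pos(idx, sizes, axis_map)`; the image-to-staging direction would use `to_tensor_idx` to go the other way.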

Differential Revision: D62210117

## Context

Add a simple test to track the sizes of various important objects in the Vulkan compute graph API over time. The test uses some loose thresholds to alert when an object has grown unexpectedly large.

Differential Revision: [D62144400](https://our.internmc.facebook.com/intern/diff/D62144400/)

[ghstack-poisoned]

## Context

Introduce the `SymInt` class, which allows representation of symbolic integers in a Vulkan graph. Please see the documentation comments of the `SymInt` class for more details regarding why the `Int` type is not sufficient for symbolic integers.

Differential Revision: [D62144399](https://our.internmc.facebook.com/intern/diff/D62144399/)

[ghstack-poisoned]

## Context

Normally, tensor memory is planned during the export stage; tensors that do not overlap in lifetimes may share a memory allocation. Memory planning, however, requires knowledge of the lifetimes of the tensors.

Some complex operators may not be able to perform all the necessary computations in one shader, or the implementation of the operator may require that some temporary tensors be created during the execution of the op. Since these temporary tensors are not visible to the memory planning algorithm, they will not be memory planned.

This diff introduces the `TmpTensorVRef` object, which facilitates memory sharing between temporary tensors. The design principle is that the lifetime of temporary tensors is restricted to the execution of the op within which they are created; thus, that knowledge can be used to implement memory planning. Please see the documentation comments of `TmpTensorVRef` for more details.

Differential Revision: [D62144398](https://our.internmc.facebook.com/intern/diff/D62144398/)

[ghstack-poisoned]
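The scoped-lifetime idea can be sketched in a few lines of Python. This is only an illustration of recycling temporary memory when its owner goes out of scope; `TmpPool` and `tmp_tensor` are hypothetical names, plain `bytearray` objects stand in for GPU allocations, and nothing here reflects the actual `TmpTensorVRef` implementation.

```python
from contextlib import contextmanager


class TmpPool:
    """Keeps released temporary buffers so later ops can reuse them."""

    def __init__(self):
        self._free = []
        self.allocations = 0  # number of fresh allocations made so far

    def acquire(self, nbytes):
        # Reuse the first free buffer that is large enough, if any.
        for i, buf in enumerate(self._free):
            if len(buf) >= nbytes:
                return self._free.pop(i)
        self.allocations += 1
        return bytearray(nbytes)

    def release(self, buf):
        self._free.append(buf)


@contextmanager
def tmp_tensor(pool, nbytes):
    """Borrow a temporary buffer for the duration of one op."""
    buf = pool.acquire(nbytes)
    try:
        yield buf
    finally:
        # The temporary's lifetime ends with the op, so its memory is recycled.
        pool.release(buf)


pool = TmpPool()

# Two ops run one after another; the second reuses the first one's memory.
with tmp_tensor(pool, 64) as first:
    first_id = id(first)
with tmp_tensor(pool, 32) as second:
    second_id = id(second)

sequential_allocations = pool.allocations

# Two temporaries alive at the same time cannot share memory.
with tmp_tensor(pool, 64) as a, tmp_tensor(pool, 64) as b:
    overlapping_distinct = a is not b
```

Because a temporary never outlives the op that created it, releasing on scope exit is enough to let non-overlapping temporaries share one allocation.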

## Context

Currently, in shaders we have to explicitly declare the binding slot that each layout binding will bind to, i.e.

```
${layout_declare_tensor(0, "w", "t_out", DTYPE, STORAGE)}
${layout_declare_buffer(1, "r", "nchw_in", DTYPE)}
${layout_declare_ubo(2, "ivec4", "sizes")}
```

However, this can get a little tedious when making many layout declarations. This diff improves the situation by adding the `B` variable, which automatically increments the binding slot whenever a layout binding is declared. Now we can write

```
${layout_declare_tensor(B, "w", "t_out", DTYPE, STORAGE)}
${layout_declare_buffer(B, "r", "nchw_in", DTYPE)}
${layout_declare_ubo(B, "ivec4", "sizes")}
```

I may make a follow-up diff to change all layout declarations to use `B` across all shaders in the codebase later on.

Differential Revision: [D62210119](https://our.internmc.facebook.com/intern/diff/D62210119/)

[ghstack-poisoned]
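One way such an auto-incrementing slot could work is sketched below in Python. The names `SlotCounter` and `declare_binding` are hypothetical and the emitted strings are placeholders rather than real GLSL; this is not the codegen used by the shader templates above, only a demonstration of the counter idea.

```python
class SlotCounter:
    """Hands out consecutive binding slots, starting at 0."""

    def __init__(self):
        self._next = 0

    def take(self):
        slot = self._next
        self._next += 1
        return slot


def declare_binding(slot, name):
    """Return a placeholder declaration for `name` at the given slot.

    `slot` may be an explicit integer or a SlotCounter; a counter is
    advanced each time it is used.
    """
    index = slot.take() if isinstance(slot, SlotCounter) else slot
    return f"binding={index} name={name}"


B = SlotCounter()
declarations = [
    declare_binding(B, "t_out"),
    declare_binding(B, "nchw_in"),
    declare_binding(B, "sizes"),
]
explicit = declare_binding(7, "extra")  # explicit slots still work
```

Accepting both an integer and a counter keeps existing explicit declarations valid while new declarations can omit the bookkeeping.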

…tensors

## Context

This diff introduces the `axis_mapping` field for `vTensors`, which can be used to implement no-copy permutes.

The axis mapping is somewhat analogous to dim order, but for texture-backed tensors. It is normalized to 4 dimensions, similar to padded sizes. The first 3 elements indicate which of the (X, Y, Z) image texture axes the width, height, and channels dims of the tensor map to. The final element indicates the WHCN index of the tensor dimension along which batches will be concatenated.

The benefit of introducing axis mapping is twofold:

1. Permutes can be performed without any data copying by re-using a texture but updating the axis mapping.
2. The memory layout of texture-backed tensors can be more flexible, and can be optimized for performance or memory footprint by using unconventional axis mappings.

Regarding the second point, we have found that adding length to a texture's Z axis is more costly than adding length to the texture's X or Y axes. Similarly, we have found that reading along the Z axis yields slightly lower throughput than reading along the X or Y axes. By introducing axis mapping, we can map the largest dimension to a texture's X axis instead of mapping it to the most intuitive texture axis (i.e. channels to Z axis). This can save a lot of texture memory and potentially improve compute shader latency as well.

However, the prerequisite for using axis mapping heavily is that the overhead introduced in calculating tensor indices and texture positions does not significantly increase compute shader latency. The impact of this will be investigated and shown in the following diffs.

Note that this diff only introduces the `axis_mapping` field;

Differential Revision: [D62210118](https://our.internmc.facebook.com/intern/diff/D62210118/)

[ghstack-poisoned]
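The two benefits can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the `vTensor` implementation: `texture_extents` and `permute_no_copy` are hypothetical helpers, texel packing is ignored (one element per texel), and the permutation only reorders the width, height, and channels dims.

```python
def texture_extents(sizes_whcn, axis_map):
    """Compute (X, Y, Z) texture extents for WHCN sizes under an axis map."""
    extents = [1, 1, 1]
    for dim in range(3):
        extents[axis_map[dim]] = sizes_whcn[dim]
    # Batches are concatenated along the axis hosting dim `axis_map[3]`.
    batch_dim = axis_map[3]
    extents[axis_map[batch_dim]] *= sizes_whcn[3]
    return tuple(extents)


def permute_no_copy(sizes_whcn, axis_map, perm):
    """Permute the W/H/C dims by rewriting metadata only.

    New dim i takes the role of old dim perm[i]; the texture is untouched.
    """
    new_sizes = tuple(sizes_whcn[perm[i]] for i in range(3)) + (sizes_whcn[3],)
    new_map = tuple(axis_map[perm[i]] for i in range(3)) + (perm.index(axis_map[3]),)
    return new_sizes, new_map


sizes = (8, 4, 64, 1)        # W, H, C, N
default_map = (0, 1, 2, 2)   # width->X, height->Y, channels->Z

# Benefit 2: put the largest dim (channels) on X instead of Z.
channels_on_x = texture_extents(sizes, (2, 1, 0, 2))

# Benefit 1: swap width and channels without moving any data.
permuted_sizes, permuted_map = permute_no_copy(sizes, default_map, (2, 1, 0))
same_texture = texture_extents(permuted_sizes, permuted_map) == texture_extents(
    sizes, default_map
)
```

The permuted tensor reports different sizes but resolves to the same texture extents, which is why the existing texture can be reused as-is.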

## Context

Building on the previous diff, this diff integrates axis mapping into staging <-> buffer transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.

Differential Revision: [D62210117](https://our.internmc.facebook.com/intern/diff/D62210117/)

[ghstack-poisoned]

pytorch-bot · 2024-09-04T22:04:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5093

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit a2ae8dd with merge base 9739609:

NEW FAILURE - The following job has failed:

pull / unittest-arm (buck2) / linux-job (gh)
RuntimeError: Command docker exec -t c00e039055b177751c2e040a47d50c5db89591559dfd9dd50767ff2cb6c602b4 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

## Context

Building on the previous diff, this diff integrates axis mapping into staging <-> buffer transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.

Differential Revision: [D62210117](https://our.internmc.facebook.com/intern/diff/D62210117/)

ghstack-source-id: 241066644

Pull Request resolved: #5093

facebook-github-bot · 2024-09-04T22:04:43Z

This pull request was exported from Phabricator. Differential Revision: D62210117

facebook-github-bot · 2024-09-05T21:32:19Z

This pull request was exported from Phabricator. Differential Revision: D62210117

Pull Request resolved: #5093

## Context

Building on the previous diff, this diff integrates axis mapping into staging <-> image transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.

Differential Revision: [D62210117](https://our.internmc.facebook.com/intern/diff/D62210117/)

ghstack-source-id: 241249802

facebook-github-bot · 2024-09-06T00:05:01Z

This pull request was exported from Phabricator. Differential Revision: D62210117

Pull Request resolved: #5093

## Context

Building on the previous diff, this diff integrates axis mapping into staging <-> image transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.

ghstack-source-id: 241282078

Differential Revision: [D62210117](https://our.internmc.facebook.com/intern/diff/D62210117/)

Pull Request resolved: #5093

## Context

Building on the previous diff, this diff integrates axis mapping into staging <-> image transfer shaders. Alternative versions of indexing utility functions are introduced to account for axis mapping.

The impact of axis mapping on the latency of the transfer shaders is examined in the next diff.

ghstack-source-id: 241354024

Differential Revision: [D62210117](https://our.internmc.facebook.com/intern/diff/D62210117/)

facebook-github-bot · 2024-09-06T14:55:36Z

This pull request was exported from Phabricator. Differential Revision: D62210117

SS-JIA added 12 commits September 3, 2024 13:11

facebook-github-bot added the CLA Signed label Sep 4, 2024

This was referenced Sep 4, 2024

[ET-VK] Add TmpTensorVRef struct to recycle temporary tensor memory #5041

Merged

[ET-VK][BE][ez] Enable automatic layout slot index incrementing #5091

Merged

[ET-VK] Introduce axis mapping for no-copy permute of texture-backed tensors #5092

Merged

facebook-github-bot added the fb-exported label Sep 4, 2024

SS-JIA changed the base branch from gh/SS-JIA/70/base to gh/SS-JIA/69/head September 4, 2024 22:05

jorgep31415 approved these changes Sep 5, 2024

SS-JIA added 2 commits September 5, 2024 14:32

SS-JIA added 2 commits September 5, 2024 17:04

SS-JIA changed the title ~~[ET-VK] Integrate axis mapping into staging <-> buffer transfer shaders~~ [ET-VK] Integrate axis mapping into staging <-> image transfer shaders Sep 6, 2024

Base automatically changed from gh/SS-JIA/69/head to main September 6, 2024 03:28

SS-JIA added 2 commits September 6, 2024 07:55

facebook-github-bot merged commit 41ec7fa into main Sep 6, 2024
36 of 38 checks passed

facebook-github-bot deleted the gh/SS-JIA/70/head branch September 6, 2024 16:22
