
[ET-VK] Introduce virtual_clone API to support view of view use cases + fix synchronization hazard with view tensors #5753


Closed
wants to merge 3 commits into from

Conversation

SS-JIA
Contributor

@SS-JIA SS-JIA commented Sep 30, 2024

Stack from ghstack (oldest at bottom):

Context

This diff fixes some hazards (not necessarily bugs) with view tensors.

`virtual_clone` API

Consider the following sequence of calls, which may be common in the view-of-view use case.

```
t1 = graph.add_tensor(...);
// t2 will have the same metadata as t1
t2 = graph.add_tensor_view(t1);
// t3 will also have the same metadata as t2 at this point.
t3 = graph.add_tensor_view(t2);

// t2 metadata will be updated correctly.
t2 = add_transpose_view_node(t1, 0, 1, t2);
// Unfortunately, this node assumes that t3 starts out with the same metadata
// as t2. That is no longer true, so t3 will have incorrect metadata after
// this node runs.
t3 = add_transpose_view_node(t2, 1, 2, t3);
```

To address this, the `virtual_clone` API is introduced, which allows view nodes to first set the output's metadata equal to the input's before applying their own modifications.
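
As an illustration of the intent, here is a minimal sketch of how a view node could use `virtual_clone`. The function shape and the `virtual_transpose` helper are assumptions based on the description above, not the exact ExecuTorch API.

```
// Hypothetical sketch of a transpose view node, following the call pattern
// in the example above. ComputeGraph, ValueRef, virtual_clone, and
// virtual_transpose are assumed names; real signatures may differ.
ValueRef add_transpose_view_node(
    ComputeGraph& graph,
    const ValueRef in,
    const int64_t dim0,
    const int64_t dim1,
    const ValueRef out) {
  // Reset the output's metadata to match the input's, so the transpose
  // below starts from a known state even when the output was created as
  // a view of an already-modified view (the view-of-view case above).
  graph.virtual_clone(out, in);
  // Apply the transpose to the output's metadata only; the underlying
  // vkImage/vkBuffer is still shared with the input.
  graph.virtual_transpose(out, dim0, dim1);
  return out;
}
```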

WAW synchronization hazards

`vTensorStorage` maintains a `last_access` state which facilitates inserting the correct memory barriers for the underlying `vkImage` or `vkBuffer`. However, when we create a tensor view, `last_access` is not shared between `vTensor` instances that use the same resource.

As a result, writing into a `vTensor` will not update the `last_access` of its views, and vice versa. Subsequent accesses of another tensor that references the same resource will therefore result in a synchronization hazard.

This diff fixes the hazard in a somewhat crude way: if the `vTensor` is a copy, or has copies, conservatively assume that the underlying resource has been written to before the current access, so that appropriate memory barriers are inserted. This approach was chosen because adding a map to track the last access of tensors that share a resource seemed like overkill, given that the assumption of a prior write should hold most of the time.
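
Conceptually, the conservative check could look like the following sketch. The member names (`is_copy_`, `has_copies_`, `last_access_`) and the standalone helper are assumptions for illustration; the actual `vTensorStorage` bookkeeping differs in its details.

```
#include <vulkan/vulkan.h>

// Sketch: decide whether a memory barrier is needed before the current
// access, given that last_access_ may be stale for tensors that share
// their underlying vkImage/vkBuffer with other views.
struct LastAccess {
  VkPipelineStageFlags stage = 0;
  VkAccessFlags access = 0;
};

bool needs_barrier(
    const LastAccess& last_access_,
    bool is_copy_,     // this vTensor is a view of another tensor's resource
    bool has_copies_,  // other vTensors are views of this tensor's resource
    VkAccessFlags cur_access) {
  VkAccessFlags prev_access = last_access_.access;
  // Conservative assumption from the description above: if the resource is
  // shared, pretend the previous access was a write, since another view may
  // have written to it without updating this tensor's last_access_.
  if (is_copy_ || has_copies_) {
    prev_access |= VK_ACCESS_MEMORY_WRITE_BIT;
  }
  const bool prev_write = (prev_access & VK_ACCESS_MEMORY_WRITE_BIT) != 0;
  const bool cur_write = (cur_access & VK_ACCESS_MEMORY_WRITE_BIT) != 0;
  // A barrier is needed for RAW, WAW, and WAR hazards, i.e. whenever the
  // previous or the current access writes; read-after-read is safe.
  return prev_write || cur_write;
}
```

The cost of the pessimism is at most an extra barrier on a shared resource, which is preferable to a missed barrier and the resulting hazard.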

Differential Revision: D63642092


pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5753

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 23704a6 with merge base 0d96f75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 30, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D63642092


@facebook-github-bot
Contributor

This pull request has been merged in a5a76f7.

@SS-JIA SS-JIA deleted the gh/SS-JIA/96/head branch January 24, 2025 19:44