
[ET-VK] Introduce virtual_clone API to support view of view use cases + fix synchronization hazard with view tensors #5753


Closed
wants to merge 3 commits into from

Conversation

SS-JIA
Contributor

@SS-JIA SS-JIA commented Sep 30, 2024

Stack from ghstack (oldest at bottom):

Context

This diff fixes some hazards (not necessarily bugs) with view tensors.

`virtual_clone` API

Consider the following sequence of calls, which may be common in the view-of-view use case.

```
t1 = graph.add_tensor(...);
// t2 will have the same metadata as t1
t2 = graph.add_tensor_view(t1);
// t3 will also have the same metadata as t2 at this point.
t3 = graph.add_tensor_view(t2);

// t2 metadata will be updated correctly.
t2 = add_transpose_view_node(t1, 0, 1, t2);
// Unfortunately, this node assumes that t3 starts out with the same metadata
// as t2. That is no longer true, so t3 will have incorrect metadata after
// this node runs.
t3 = add_transpose_view_node(t2, 1, 2, t3);
```

To address this, the `virtual_clone` API is introduced, which allows view nodes to first set the output's metadata equal to the input's before applying their own modifications.
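
As an illustration of the intent, here is a minimal sketch of how a view node could use `virtual_clone`. The function shape and the `virtual_transpose` helper are assumptions based on the description above, not the exact ExecuTorch API.

```
// Hypothetical sketch of a transpose view node, following the call pattern
// in the example above. ComputeGraph, ValueRef, virtual_clone, and
// virtual_transpose are assumed names; real signatures may differ.
ValueRef add_transpose_view_node(
    ComputeGraph& graph,
    const ValueRef in,
    const int64_t dim0,
    const int64_t dim1,
    const ValueRef out) {
  // Reset the output's metadata to match the input's, so the transpose
  // below starts from a known state even when the output was created as
  // a view of an already-modified view (the view-of-view case above).
  graph.virtual_clone(out, in);
  // Apply the transpose to the output's metadata only; the underlying
  // vkImage/vkBuffer is still shared with the input.
  graph.virtual_transpose(out, dim0, dim1);
  return out;
}
```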

WAW synchronization hazards

`vTensorStorage` maintains a `last_access` state which facilitates inserting the correct memory barriers for the underlying `vkImage` or `vkBuffer`. However, when we create a tensor view, `last_access` is not shared between `vTensor` instances that use the same resource.

As a result, writing into a `vTensor` will not update the `last_access` of its views, and vice versa. Subsequent accesses of another tensor that references the same resource will therefore result in a synchronization hazard.

This diff fixes the hazard in a somewhat crude way: if the `vTensor` is a copy, or has copies, conservatively assume that the underlying resource has been written to before the current access, so that appropriate memory barriers are inserted. This approach was chosen because adding a map to track the last access of tensors that share a resource seemed like overkill, given that the assumption of a prior write should hold most of the time.
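
Conceptually, the conservative check could look like the following sketch. The member names (`is_copy_`, `has_copies_`, `last_access_`) and the standalone helper are assumptions for illustration; the actual `vTensorStorage` bookkeeping differs in its details.

```
#include <vulkan/vulkan.h>

// Sketch: decide whether a memory barrier is needed before the current
// access, given that last_access_ may be stale for tensors that share
// their underlying vkImage/vkBuffer with other views.
struct LastAccess {
  VkPipelineStageFlags stage = 0;
  VkAccessFlags access = 0;
};

bool needs_barrier(
    const LastAccess& last_access_,
    bool is_copy_,     // this vTensor is a view of another tensor's resource
    bool has_copies_,  // other vTensors are views of this tensor's resource
    VkAccessFlags cur_access) {
  VkAccessFlags prev_access = last_access_.access;
  // Conservative assumption from the description above: if the resource is
  // shared, pretend the previous access was a write, since another view may
  // have written to it without updating this tensor's last_access_.
  if (is_copy_ || has_copies_) {
    prev_access |= VK_ACCESS_MEMORY_WRITE_BIT;
  }
  const bool prev_write = (prev_access & VK_ACCESS_MEMORY_WRITE_BIT) != 0;
  const bool cur_write = (cur_access & VK_ACCESS_MEMORY_WRITE_BIT) != 0;
  // A barrier is needed for RAW, WAW, and WAR hazards, i.e. whenever the
  // previous or the current access writes; read-after-read is safe.
  return prev_write || cur_write;
}
```

The cost of the pessimism is at most an extra barrier on a shared resource, which is preferable to a missed barrier and the resulting hazard.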

Differential Revision: D63642092


pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5753

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 23704a6 with merge base 0d96f75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 30, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D63642092


@facebook-github-bot
Contributor

This pull request has been merged in a5a76f7.

@SS-JIA SS-JIA deleted the gh/SS-JIA/96/head branch January 24, 2025 19:44