Replace view_copy with view (1/3) #2461

Closed
wants to merge 1 commit into from

metascroy
Contributor

Summary:
Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib

This stack replaces view_copy nodes with memory.view nodes.

In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node. This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node). Note that this pass combined with dead-code elimination removes redundant view copies, because a redundant view copy will have no users after this pass.
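As a rough illustration of the normalization idea, here is a minimal sketch on a toy graph. The `Node` class and the helper functions `normalize_view_bases` and `eliminate_dead_views` are hypothetical stand-ins, not the ExecuTorch or torch.fx APIs; the sketch only assumes that a view node's first input is its base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical toy graph node; not the torch.fx Node class."""
    name: str
    inputs: List["Node"] = field(default_factory=list)
    is_view: bool = False


def normalize_view_bases(nodes: List[Node]) -> None:
    """Point every view node's base at the nearest upstream non-view node."""
    for node in nodes:
        if not node.is_view:
            continue
        base = node.inputs[0]
        while base.is_view:
            base = base.inputs[0]
        node.inputs[0] = base


def eliminate_dead_views(nodes: List[Node], outputs: List[Node]) -> List[Node]:
    """Drop view nodes that no other node and no graph output uses."""
    used = {id(inp) for n in nodes for inp in n.inputs}
    used |= {id(out) for out in outputs}
    return [n for n in nodes if not n.is_view or id(n) in used]


# op -> view_copy1 -> view_copy2 -> add
op = Node("op")
v1 = Node("view_copy1", [op], is_view=True)
v2 = Node("view_copy2", [v1], is_view=True)
add = Node("add", [v2])
graph = [op, v1, v2, add]

normalize_view_bases(graph)
kept = eliminate_dead_views(graph, outputs=[add])
print([n.name for n in kept])
```

After normalization, view_copy1 has no users in this toy graph, so the elimination step drops it while view_copy2 reads directly from op.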

In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes. A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission. A memory.view node has a special TensorSpec of type _MemoryViewSpec. This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec. Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec.
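The behaviour described for the spec can be sketched with a small read-only proxy. This is not the actual _MemoryViewSpec implementation; `BaseSpecSketch` and `ViewSpecSketch` are hypothetical classes, and the field names `dtype` and `lifetime` are assumed only for illustration.

```python
class BaseSpecSketch:
    """Hypothetical mutable spec for a non-view tensor."""

    def __init__(self, shape, dtype, lifetime):
        self.shape = tuple(shape)
        self.dtype = dtype
        self.lifetime = lifetime


class ViewSpecSketch:
    """Hypothetical immutable spec: owns its shape, forwards everything else."""

    def __init__(self, base, shape):
        # Bypass our own __setattr__, which rejects all writes.
        object.__setattr__(self, "_base", base)
        object.__setattr__(self, "shape", tuple(shape))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for non-size fields.
        return getattr(object.__getattribute__(self, "_base"), name)

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot set {name!r} on an immutable view spec")


base = BaseSpecSketch(shape=(2, 3), dtype="float32", lifetime=[0, 1])
view = ViewSpecSketch(base, shape=(6,))

base.lifetime = [0, 5]          # update the base spec ...
print(view.shape, view.lifetime)  # ... and the view spec reflects it

try:
    view.lifetime = [0, 9]
    write_rejected = False
except AttributeError:
    write_rejected = True
```

The design choice shown here is that the view keeps only size-related state of its own, so there is a single source of truth for everything else.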

Not all view_copy nodes are converted to memory.view nodes. Only static nodes that are memory planned are converted. Not all static nodes are memory planned in ExecuTorch. For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned. Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted. We can expand this list over time.

In the third diff (D54827438), I implement the actual view_copy elimination. In the ExecutorchBackendConfig, there is a new option remove_static_view_copy. If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today).
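The selection of passes can be pictured as follows. This is a sketch only: `ConfigSketch` is a hypothetical stand-in for ExecutorchBackendConfig, and the passes are represented by their names as strings rather than real pass objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for the backend config; passes are just names."""
    remove_static_view_copy: bool = False
    to_out_var_pass: str = "to_out_var_pass"
    memory_planning_pass: str = "memory_planning_pass"


def memory_planning_passes(config: ConfigSketch) -> List[str]:
    """Return the ordered pass list for the given option value."""
    passes: List[str] = []
    if config.remove_static_view_copy:
        passes += ["NormalizeViewCopyBasePass", "ReplaceViewCopyWithMemoryViewPass"]
    passes += [config.to_out_var_pass, config.memory_planning_pass]
    return passes


with_removal = memory_planning_passes(ConfigSketch(remove_static_view_copy=True))
without_removal = memory_planning_passes(ConfigSketch())
print(with_removal)
print(without_removal)
```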

Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass.

The first two steps are just the first and second diffs described above.

In config.to_out_var_pass, the memory.view nodes are skipped.

In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base. Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it. Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected. (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node. This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.)
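A toy version of this redirection is sketched below. The `Spec` and `Node` classes, `spec_for_planning`, and `update_lifetimes` are hypothetical and much simpler than the real memory planning code; the sketch assumes a lifetime is a [first_use, last_use] pair of node indices and that each node touches itself and its inputs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spec:
    """Hypothetical spec holding a [first_use, last_use] lifetime."""
    lifetime: List[Optional[int]] = field(default_factory=lambda: [None, None])


@dataclass(eq=False)
class Node:
    """Hypothetical node; a memory.view stand-in has a base and no own spec."""
    name: str
    inputs: List["Node"] = field(default_factory=list)
    spec: Optional[Spec] = None
    base: Optional["Node"] = None


def spec_for_planning(node: Node) -> Spec:
    """For a view node, hand back the base's spec instead of its own."""
    return node.base.spec if node.base is not None else node.spec


def update_lifetimes(nodes: List[Node]) -> None:
    """Extend lifetimes so they cover every index where a tensor is touched."""
    for idx, node in enumerate(nodes):
        for touched in [node, *node.inputs]:
            spec = spec_for_planning(touched)
            start, end = spec.lifetime
            spec.lifetime = [
                idx if start is None else min(start, idx),
                idx if end is None else max(end, idx),
            ]


op = Node("op", spec=Spec())
view = Node("memory.view", inputs=[op], base=op)
add = Node("add", inputs=[view], spec=Spec())

update_lifetimes([op, view, add])
print(op.spec.lifetime, add.spec.lifetime)
```

Because add reads the view at index 2, the base's lifetime is stretched to cover index 2 even though op itself is last read directly at index 1.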

Finally, during emission the memory.view is emitted as an evalue.

There are two more diffs on the stack: D54866523 and D54866539. The first of these replaces the old RemoveRedundantViewCopy pass with NormalizeViewCopyBasePass followed by dead-code elimination. The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when it is safe to do so, to take advantage of the view_copy elimination.

Reviewed By: larryliu0820

Differential Revision: D54816555

pytorch-bot bot commented Mar 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2461

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d952b0f with merge base d612c23:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Mar 15, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D54816555

metascroy added a commit to metascroy/executorch that referenced this pull request Mar 15, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 15, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 15, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 17, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 17, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 17, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 17, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 17, 2024
Summary:

Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib

This stack replaces view_copy nodes with memory.view nodes.

In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node.  This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node).  Note that this pass combined with dead-code elimination removes redundant view copies.  This is because a redundant view copy will have no users have this pass.

In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes.  A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission.  A memory.view node has a special TensorSpec of type _MemoryViewSpec.  This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec.  Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec.

Not all view_copy nodes are converted to memory.view nodes.  Only static nodes that are memory planned are converted.  Not all static nodes are memory planned in ExecuTorch.  For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned.  Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted.  We can expand this list over time.

In the third diff (D54827438), I implement the actual view_copy elimination.  In the ExecutorchBackendConfig, there is a new option remove_static_view_copy.  If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today).
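
The selection of passes described above can be sketched as follows.  `BackendConfigSketch` and `memory_planning_passes` are hypothetical names, and the passes are represented by strings only to keep the example self-contained; this is not the actual ExecutorchBackendConfig.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackendConfigSketch:
    """Toy config; only the fields mentioned in the description."""
    remove_static_view_copy: bool
    to_out_var_pass: str = "to_out_var_pass"
    memory_planning_pass: str = "memory_planning_pass"


def memory_planning_passes(config: BackendConfigSketch) -> List[str]:
    """Return the ordered pass list for the given config."""
    passes: List[str] = []
    if config.remove_static_view_copy:
        # The two view_copy passes run before to_out_var and memory planning.
        passes += ["NormalizeViewCopyBasePass", "ReplaceViewCopyWithMemoryViewPass"]
    passes += [config.to_out_var_pass, config.memory_planning_pass]
    return passes


with_removal = memory_planning_passes(BackendConfigSketch(remove_static_view_copy=True))
without_removal = memory_planning_passes(BackendConfigSketch(remove_static_view_copy=False))
```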

Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass.

The first two steps are just the first and second diffs described above.

In config.to_out_var_pass, the memory.view nodes are skipped.

In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base.  Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it.  Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected.  (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node.  This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.)
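
The spec redirection can be sketched like this.  All names (`Spec`, `ViewSpec`, `PlannedNode`, `spec_for_planning`, `update_lifetime`) are hypothetical and the lifetime is modeled as a simple `[first_use, last_use]` pair; this is an illustration of the idea, not the memory planning code itself.

```python
class Spec:
    """Toy mutable spec with a [first_use, last_use] lifetime."""

    def __init__(self):
        self.lifetime = [None, None]


class ViewSpec:
    """Toy read-only spec whose lifetime is read from the base spec."""

    def __init__(self, base_spec):
        self._base_spec = base_spec

    @property
    def lifetime(self):  # no setter, so assignment raises AttributeError
        return self._base_spec.lifetime


class PlannedNode:
    def __init__(self, name, spec, base=None):
        self.name = name
        self.spec = spec
        self.base = base  # set only for memory.view-like nodes


def spec_for_planning(node):
    """Return the base's spec for a view node, else the node's own spec."""
    return node.base.spec if node.base is not None else node.spec


def update_lifetime(node, index):
    """Extend the lifetime of the spec that planning should mutate."""
    spec = spec_for_planning(node)
    start, end = spec.lifetime
    spec.lifetime = [
        index if start is None else min(start, index),
        index if end is None else max(end, index),
    ]


op = PlannedNode("op", Spec())
view = PlannedNode("view", ViewSpec(op.spec), base=op)
update_lifetime(op, 3)
update_lifetime(view, 7)  # extends the lifetime of op's spec
```

After these calls the base's lifetime covers both nodes, and the view's spec reports the same lifetime.  Had `update_lifetime` used `view.spec` directly, the assignment would fail because `ViewSpec.lifetime` has no setter.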

Finally, during emission the memory.view is emitted as an evalue.

There are two more diffs on the stack: D54866523 and D54866539.  The first replaces the old RemoveRedundantViewCopy pass with NormalizeViewCopyBasePass followed by dead-code elimination.  The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when it is safe to do so, to take advantage of the view_copy elimination.
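
As a rough illustration of why squeeze and unsqueeze can be expressed as views, the sketch below computes the target shape that an equivalent view would use for a static input shape.  The function names are hypothetical, the notion of "safe" used here (a fully static shape) is an assumption, and slice is left out because its conditions are not described in this summary.

```python
from typing import Sequence, Tuple


def unsqueeze_as_view_shape(shape: Sequence[int], dim: int) -> Tuple[int, ...]:
    """Shape a view would need to reproduce unsqueeze(dim) on a static shape."""
    rank = len(shape)
    if not -(rank + 1) <= dim <= rank:
        raise IndexError(f"dim {dim} out of range for rank {rank}")
    if dim < 0:
        dim += rank + 1
    return tuple(shape[:dim]) + (1,) + tuple(shape[dim:])


def squeeze_as_view_shape(shape: Sequence[int], dim: int) -> Tuple[int, ...]:
    """Shape a view would need to reproduce squeeze(dim) on a static shape."""
    rank = len(shape)
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} out of range for rank {rank}")
    dim %= rank
    if shape[dim] != 1:
        return tuple(shape)  # squeezing a non-unit dim leaves the shape unchanged
    return tuple(shape[:dim]) + tuple(shape[dim + 1:])
```

Both functions preserve the number of elements, which is what lets the op be rewritten as a view over the same data.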

Reviewed By: larryliu0820

Differential Revision: D54816555
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 18, 2024
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D54816555

metascroy added a commit to metascroy/executorch that referenced this pull request Mar 20, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 20, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 21, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 21, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 21, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 22, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 31, 2024
Summary:
Pull Request resolved: pytorch#2461

Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib

This stack replaces view_copy nodes with memory.view nodes.

In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node.  This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node).  Note that this pass combined with dead-code elimination removes redundant view copies.  This is because a redundant view copy will have no users have this pass.

In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes.  A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission.  A memory.view node has a special TensorSpec of type _MemoryViewSpec.  This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec.  Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec.

Not all view_copy nodes are converted to memory.view nodes.  Only static nodes that are memory planned are converted.  Not all static nodes are memory planned in ExecuTorch.  For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned.  Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted.  We can expand this list over time.

In the third diff (D54827438), I implement the actual view_copy elimination.  In the ExecutorchBackendConfig, there is a new option remove_static_view_copy.  If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today).

Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass.

The first two steps are the just the first and second diff described above.

In config.to_out_var_pass, the memory.view nodes are skipped.

In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base.  Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it.  Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected.  (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node.  This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.)

Finally, during emission the memory.view is emitted as an evalue.

There are two more diffs on the stack: D54866523 and D54866539.  The first replaces the old RemoveRedundantViewCopy pass with NormalizeViewCopyBasePass followed by dead-code elimination.  The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when it is safe to do so, so that they can take advantage of the view_copy elimination.
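
The reason normalization plus dead-code elimination can stand in for a dedicated redundant-view removal pass can be seen on a toy graph. The sketch below is an assumption-laden illustration, not the real passes: `Node`, `normalize_view_bases`, and `eliminate_dead_nodes` are hypothetical, and each node has at most one input.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    is_view: bool = False
    base: Optional[str] = None  # name of the single input node, if any


def normalize_view_bases(nodes: Dict[str, Node]) -> None:
    """Point every view node at the nearest upstream non-view node."""
    for node in nodes.values():
        if not node.is_view:
            continue
        base = nodes[node.base]
        while base.is_view:
            base = nodes[base.base]
        node.base = base.name


def eliminate_dead_nodes(nodes: Dict[str, Node], outputs: List[str]) -> Dict[str, Node]:
    """Keep only nodes reachable from the outputs by following inputs."""
    live = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        if nodes[name].base is not None:
            stack.append(nodes[name].base)
    return {name: node for name, node in nodes.items() if name in live}


# op -> view_copy1 -> view_copy2, with view_copy2 as the only graph output.
graph = {
    "op": Node("op"),
    "view_copy1": Node("view_copy1", is_view=True, base="op"),
    "view_copy2": Node("view_copy2", is_view=True, base="view_copy1"),
}
normalize_view_bases(graph)
graph = eliminate_dead_nodes(graph, outputs=["view_copy2"])
```

After normalization view_copy2 reads directly from op, so view_copy1 has no users left and is dropped by the elimination step.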

Reviewed By: larryliu0820

Differential Revision: D54816555
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 31, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 31, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D54816555

metascroy added a commit to metascroy/executorch that referenced this pull request Mar 31, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Mar 31, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Apr 1, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Apr 1, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Apr 2, 2024
metascroy added a commit to metascroy/executorch that referenced this pull request Apr 2, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D54816555

metascroy added a commit to metascroy/executorch that referenced this pull request Apr 2, 2024
@facebook-github-bot

This pull request has been merged in 93fa3d6.

kirklandsign pushed a commit to kirklandsign/executorch that referenced this pull request Apr 4, 2024
Summary:
Pull Request resolved: pytorch#2461

Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib

This stack replaces view_copy nodes with memory.view nodes.

In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node.  This means if we have something like op -> view_copy1 -> view_copy2, then after normalization, both view copies will point to op in their base (assuming op is not a view node).  Note that this pass combined with dead-code elimination removes redundant view copies.  This is because a redundant view copy will have no users after this pass.
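
As a rough illustration of the normalization idea, here is a minimal sketch on a toy graph.  The Node class, connect helper, and both functions are hypothetical stand-ins written for this example; they are not torch.fx or the actual NormalizeViewCopyBasePass, and they assume the node list is in topological order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality, like real graph nodes
class Node:
    """Toy graph node (hypothetical; not torch.fx.Node)."""
    name: str
    op: str                          # e.g. "add" or "view_copy"
    base: Optional["Node"] = None    # the tensor a view_copy reads from
    users: List["Node"] = field(default_factory=list)


def connect(base: Node, view: Node) -> None:
    """Make `view` read from `base` and record it as a user."""
    view.base = base
    base.users.append(view)


def normalize_view_copy_base(nodes: List[Node]) -> None:
    """Repoint every view_copy at its nearest upstream non-view node."""
    for node in nodes:  # assumed to be in topological order
        if node.op != "view_copy" or node.base is None:
            continue
        root = node.base
        while root.op == "view_copy" and root.base is not None:
            root = root.base
        if root is not node.base:
            node.base.users.remove(node)
            connect(root, node)


def eliminate_dead_view_copies(nodes: List[Node], outputs: List[Node]) -> List[Node]:
    """Drop view_copy nodes that have no users and are not graph outputs."""
    live = []
    for node in reversed(nodes):  # reverse order so dead chains unravel
        is_output = any(node is out for out in outputs)
        if node.op == "view_copy" and not node.users and not is_output:
            if node.base is not None:
                node.base.users.remove(node)
            continue
        live.append(node)
    live.reverse()
    return live


# op -> view_copy1 -> view_copy2, with view_copy2 as the graph output.
op = Node("op", "add")
view_copy1 = Node("view_copy1", "view_copy")
view_copy2 = Node("view_copy2", "view_copy")
connect(op, view_copy1)
connect(view_copy1, view_copy2)

graph = [op, view_copy1, view_copy2]
normalize_view_copy_base(graph)
live_nodes = eliminate_dead_view_copies(graph, outputs=[view_copy2])
```

After normalization, view_copy2 reads directly from op, so view_copy1 has no users and the dead-code step removes it.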

In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes.  A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission.  A memory.view node has a special TensorSpec of type _MemoryViewSpec.  This spec is immutable and dynamically looks up non-size related fields from its base's TensorSpec.  Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, this update is reflected in the memory.view node's _MemoryViewSpec.
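
The immutable, base-forwarding behavior can be sketched roughly as follows.  ToyTensorSpec and ToyMemoryViewSpec are hypothetical classes written for illustration, and the field names (shape, lifetime, mem_offset) are assumptions; the real TensorSpec and _MemoryViewSpec may differ.

```python
class ToyTensorSpec:
    """Toy mutable spec (hypothetical fields)."""

    def __init__(self, shape, lifetime, mem_offset=None):
        self.shape = shape
        self.lifetime = lifetime
        self.mem_offset = mem_offset


class ToyMemoryViewSpec:
    """Read-only spec: size fields are its own, all others come from the base."""

    def __init__(self, base, shape):
        # Bypass our own __setattr__, which rejects every write.
        object.__setattr__(self, "_base", base)
        object.__setattr__(self, "_own", {"shape": shape})

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self._own:
            return self._own[name]
        # Looked up on every access, so base updates are always visible.
        return getattr(self._base, name)

    def __setattr__(self, name, value):
        raise AttributeError(f"ToyMemoryViewSpec is immutable; cannot set {name!r}")


base_spec = ToyTensorSpec(shape=[2, 3], lifetime=[0, 1])
view_spec = ToyMemoryViewSpec(base_spec, shape=[6])

base_spec.lifetime = [0, 4]  # update the base...
seen_lifetime = view_spec.lifetime  # ...and the view reflects it

try:
    view_spec.lifetime = [0, 9]
    write_rejected = False
except AttributeError:
    write_rejected = True
```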

Not all view_copy nodes are converted to memory.view nodes.  Only static nodes that are memory planned are converted.  Not all static nodes are memory planned in ExecuTorch.  For example, there is an option to turn off memory planning for input nodes, and outputs from some higher order ops like cond are not memory planned.  Information about which nodes are memory planned is not readily available, and I did not try to cover all cases of nodes that can be converted.  We can expand this list over time.

In the third diff (D54827438), I implement the actual view_copy elimination.  In the ExecutorchBackendConfig, there is a new option remove_static_view_copy.  If remove_static_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if remove_static_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (state today).
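
The selection of passes described above can be sketched like this.  ToyBackendConfig is a hypothetical stand-in for ExecutorchBackendConfig, and the passes are represented by plain string labels rather than real pass objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBackendConfig:
    """Hypothetical stand-in for ExecutorchBackendConfig; passes are labels only."""
    remove_static_view_copy: bool
    to_out_var_pass: str = "config.to_out_var_pass"
    memory_planning_pass: str = "config.memory_planning_pass"


def memory_planning_passes(config: ToyBackendConfig) -> List[str]:
    """Return the ordered pass list for the given option value."""
    passes: List[str] = []
    if config.remove_static_view_copy:
        # The two view_copy passes run before the existing ones.
        passes += ["NormalizeViewCopyBasePass()", "ReplaceViewCopyWithMemoryViewPass()"]
    passes += [config.to_out_var_pass, config.memory_planning_pass]
    return passes


enabled = memory_planning_passes(ToyBackendConfig(remove_static_view_copy=True))
disabled = memory_planning_passes(ToyBackendConfig(remove_static_view_copy=False))
```

In this sketch the option only prepends the two new passes; the existing pair runs last in both cases.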

Let's look at the flow when remove_static_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass.

The first two steps are just the first and second diffs described above.

In config.to_out_var_pass, the memory.view nodes are skipped.

In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base.  Returning the spec for the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it.  Moreover, the memory.view node's special _MemoryViewSpec sees this update reflected.  (Note that an exception would be thrown if we kept the usual flow and returned the spec for the memory.view node.  This is because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.)
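
A toy version of this lifetime update, under stated assumptions: nodes are plain strings listed in execution order, a lifetime is a [first, last] pair of indices, and ToySpec, spec_for_planning, and update_lifetimes are hypothetical helpers, not the real memory planning code.

```python
from typing import Dict, List, Optional


class ToySpec:
    """Toy mutable spec holding a [first_use, last_use] lifetime."""

    def __init__(self) -> None:
        self.lifetime: List[Optional[int]] = [None, None]


def spec_for_planning(node: str, specs: Dict[str, ToySpec],
                      view_base: Dict[str, str]) -> ToySpec:
    """For a view node return the spec of its base; otherwise the node's own spec."""
    base = view_base.get(node)
    return specs[base] if base is not None else specs[node]


def extend(spec: ToySpec, idx: int) -> None:
    """Grow the lifetime so that it covers execution index `idx`."""
    start, end = spec.lifetime
    spec.lifetime = [idx if start is None else min(start, idx),
                     idx if end is None else max(end, idx)]


def update_lifetimes(order: List[str], inputs: Dict[str, List[str]],
                     specs: Dict[str, ToySpec], view_base: Dict[str, str]) -> None:
    for idx, node in enumerate(order):
        # A node touches its own output and every one of its inputs.
        for touched in [node, *inputs[node]]:
            extend(spec_for_planning(touched, specs, view_base), idx)


order = ["op", "view", "relu"]
inputs = {"op": [], "view": ["op"], "relu": ["view"]}
view_base = {"view": "op"}                    # "view" is a memory.view of "op"
specs = {"op": ToySpec(), "relu": ToySpec()}  # the view has no mutable spec of its own

update_lifetimes(order, inputs, specs, view_base)
```

Because relu reads the view at index 2, the lifetime of op is extended to cover index 2, even though op itself is only read directly at index 1.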

Finally, during emission the memory.view is emitted as an evalue.

There are two more diffs on the stack: D54866523 and D54866539.  The first of these replaces the old RemoveRedundantViewCopy pass with NormalizeViewCopyBasePass + dead code elimination.  The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when it is safe to do so, to take advantage of the view_copy elimination.

bypass-github-export-checks

Reviewed By: JacobSzwejbka, larryliu0820, cbilgin

Differential Revision: D54816555

fbshipit-source-id: 11566d62175d604f3ad2898af5f00270ae5847ce