
Commit 8dcea23

metascroy authored and facebook-github-bot committed
Replace view copy with view (3/3)
Summary:
Design: https://docs.google.com/document/d/1l9x925EOrE8mHFJdRCC59nBJXyqBdnoeK-EgNQScXD0/edit#heading=h.kocb2mvchnib

This stack replaces view_copy nodes with memory.view nodes.

In the first diff (D54816555), I write a pass to normalize view_copy nodes by making their base point to the upstream non-view node. This means that if we have something like op -> view_copy1 -> view_copy2, then after normalization both view copies will point to op in their base (assuming op is not a view node). Note that this pass, combined with dead-code elimination, removes redundant view copies, because a redundant view copy will have no users after this pass.

In the second diff (D54827305), I write a pass to convert view_copy nodes to memory.view nodes. A memory.view is similar to torch.ops.aten.view.default, but it is its own function so that we can handle it specially during memory planning and emission. A memory.view node has a special TensorSpec of type _MemoryViewSpec. This spec is immutable and dynamically looks up non-size-related fields from its base's TensorSpec. Because it is immutable, fields on a _MemoryViewSpec cannot be set, but if a field is updated on the base spec, the update is reflected in the memory.view node's _MemoryViewSpec. (An illustrative sketch of this lookup behavior follows the commit metadata below.)

Not all view_copy nodes are converted to memory.view nodes; only static nodes that are memory planned are converted, and not all static nodes are memory planned in ExecuTorch. For example, there is an option to turn off memory planning for input nodes, and outputs from some higher-order ops like cond are not memory planned. Which nodes are memory planned is not easily available, and I did not try to cover all cases of nodes that can be converted. We can expand this list over time.

In the third diff (D54827438), I implement the actual view_copy elimination. The ExecutorchBackendConfig gains a new option, try_remove_view_copy. If try_remove_view_copy = True, the memory planning passes are [NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass]; if try_remove_view_copy = False, the memory planning passes are [config.to_out_var_pass, config.memory_planning_pass] (the state today).

Let's look at the flow when try_remove_view_copy = True: NormalizeViewCopyBasePass(), ReplaceViewCopyWithMemoryViewPass(), config.to_out_var_pass, config.memory_planning_pass. The first two steps are just the first and second diffs described above. In config.to_out_var_pass, the memory.view nodes are skipped. In config.memory_planning_pass, when a spec is requested for a memory.view node (e.g., to update the lifetime), we return the spec of its base. Returning the spec of the base means that whenever we see a memory.view node, we actually update the lifetime of the base to cover it, and the memory.view node's special _MemoryViewSpec sees this update reflected. (Note that an exception would be thrown if we kept the usual flow and returned the spec of the memory.view node itself, because the special _MemoryViewSpec is immutable and would not allow the memory_planning_pass to update its lifetime.) Finally, during emission the memory.view is emitted as an evalue.

There are two more diffs on the stack, D54866523 and D54866539. The first replaces the old RemoveRedundantViewCopy pass with NormalizeViewCopyBasePass plus dead-code elimination. The second converts view-like ops (squeeze, unsqueeze, slice) to view ops when it is safe to do so, to take advantage of the view_copy elimination.
Differential Revision: D54827438
1 parent 1d6105d commit 8dcea23
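
To make the _MemoryViewSpec behavior described in the summary concrete, here is a minimal illustrative sketch (not the actual ExecuTorch class; the name _ToyMemoryViewSpec and its fields are invented for illustration) of an immutable spec that reads non-size fields from its base spec at access time:

class _ToyMemoryViewSpec:
    # Non-size fields are always read from the base spec at access time.
    _MIRRORED_FIELDS = ("const", "storage", "mem_id", "mem_offset", "lifetime")

    def __init__(self, base_spec, shape):
        object.__setattr__(self, "_base", base_spec)
        object.__setattr__(self, "shape", shape)  # size-related field owned by the view

    def __getattr__(self, name):
        # Because the lookup happens on every access, a later update to the base
        # (e.g. a lifetime extended by memory planning) is reflected here.
        if name in _ToyMemoryViewSpec._MIRRORED_FIELDS:
            return getattr(self._base, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Immutable: memory planning must update the base spec, never the view spec.
        raise AttributeError("view specs are immutable; update the base spec instead")

With such a spec, extending base_spec.lifetime is immediately visible when reading lifetime on the view spec, which is the mechanism the memory planner relies on when it extends the base's lifetime to cover a memory.view node.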

File tree

9 files changed, +292 -3 lines changed


exir/capture/_config.py

Lines changed: 5 additions & 0 deletions
@@ -75,3 +75,8 @@ class ExecutorchBackendConfig:
     # be a power of 2. If not provided, uses the value in the schema file.
     delegate_alignment: Optional[int] = None
     sym_shape_eval_pass: PassType = HintBasedSymShapeEvalPass()
+
+    # If set to true, view_copy operations will be removed from the graph when safe
+    # Rather than be emitted as operators, they will be emitted as evalues that share
+    # the same underlying storage as their base
+    try_remove_view_copy: bool = True
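
For reference, this flag is toggled through ExecutorchBackendConfig when lowering to ExecuTorch. A minimal usage sketch, mirroring the test added below (the exported program ep is a placeholder):

from executorch.exir import to_edge
from executorch.exir.capture._config import ExecutorchBackendConfig
from executorch.exir.passes import MemoryPlanningPass

# `ep` is assumed to be a program produced by torch.export.export(...)
etpm = to_edge(ep).to_executorch(
    config=ExecutorchBackendConfig(
        try_remove_view_copy=True,  # default; set to False to keep view_copy operators
        memory_planning_pass=MemoryPlanningPass("greedy", alloc_graph_input=False),
    ),
)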

exir/emit/_emitter.py

Lines changed: 8 additions & 0 deletions
@@ -1198,6 +1198,14 @@ def call_function(
             assert len(args) == 1
             return self._emit_spec(self.node.meta["spec"])

+        elif target == memory.view:
+            assert len(args) == 2
+
+            # A memory.view's base should already be emitted, so the
+            # memory.view's spec should dynamically reference its base's
+            # final state
+            return self._emit_spec(self.node.meta["spec"])
+
         elif target == memory.free:
             assert len(args) == 1
             # pyre-ignore

exir/memory_planning.py

Lines changed: 20 additions & 1 deletion
@@ -261,6 +261,19 @@ def verify_graph_input_output(self) -> None:
             graph_output_allocated == self.alloc_graph_output
         ), f"Misallocate graph output {graph_output_allocated} v.s. {self.alloc_graph_output}"

+    def verify_memory_view_are_memory_planned(self) -> None:
+        """
+        memory.view nodes should only exist if their base is memory planned.
+        """
+        for node in self.graph_module.graph.nodes:
+            if node.op == "call_function" and node.target == memory.view:
+                assert (
+                    node.meta["spec"].const or node.meta["spec"].mem_id is not None
+                ), "memory.view node is not const and has no mem_id."
+                assert (
+                    node.meta["spec"].const or node.meta["spec"].mem_offset is not None
+                ), "memory.view node is not const has no mem_offset."
+

 def register_algo(fn: Callable[..., List[int]]) -> Callable[..., List[int]]:
     algo_name = fn.__name__
@@ -535,7 +548,13 @@ def get_node_tensor_specs(
     has no tensor specs.
     """
     # get tensor specs
-    specs = node.meta.get("spec")
+    if node.target == memory.view:
+        base = node.args[0]
+        assert isinstance(base, torch.fx.Node)
+        specs = base.meta.get("spec")
+    else:
+        specs = node.meta.get("spec")
+
     if isinstance(specs, TensorSpec):
         specs = [specs]
     if not isinstance(specs, (list, tuple)):
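
As a hedged illustration of this change's effect (memory_view_node is a hypothetical memory.view node, not a name from the diff): asking for the tensor specs of a memory.view node hands back its base's spec object, so any lifetime update the planner makes lands on the base and is mirrored by the view's immutable _MemoryViewSpec.

# Illustration only
specs = get_node_tensor_specs(memory_view_node)
base = memory_view_node.args[0]
assert specs[0] is base.meta["spec"]  # the base's spec is returned, not a separate view spec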

exir/passes/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -251,6 +251,7 @@ def callWithLoggerEnabled(self, graph_module: torch.fx.GraphModule) -> None:
    # we won't see it in the input graph to the to_out_variant pass, unless
    # it's retraced after running to_out_variant with the first trace.
    memory.alloc,
+   memory.view,
    executorch_call_delegate,
    torch.ops.aten.copy_.default,
 }

exir/passes/memory_planning_pass.py

Lines changed: 1 addition & 0 deletions
@@ -128,4 +128,5 @@ def run(
             f"The {self.memory_planning_algo} algorithm reuses storage for {num_reuse_pairs} pair of tensors"
         )
         verifier.verify_graph_input_output()
+        verifier.verify_memory_view_are_memory_planned()
         return PassResult(graph_module, True)

exir/program/TARGETS

Lines changed: 2 additions & 0 deletions
@@ -32,8 +32,10 @@ python_library(
         "//executorch/exir/emit:lib",
         "//executorch/exir/passes:insert_write_back_for_buffers_pass",
         "//executorch/exir/passes:lib",
+        "//executorch/exir/passes:normalize_view_copy_base_pass",
         "//executorch/exir/passes:remove_graph_asserts_pass",
         "//executorch/exir/passes:remove_mixed_type_operators",
+        "//executorch/exir/passes:replace_view_copy_with_memory_view_pass",
         "//executorch/exir/passes:spec_prop_pass",
         "//executorch/exir/verification:verifier",
     ],

exir/program/_program.py

Lines changed: 31 additions & 2 deletions
@@ -31,8 +31,14 @@
 from executorch.exir.passes.insert_write_back_for_buffers_pass import (
     insert_write_back_for_buffers_pass,
 )
+from executorch.exir.passes.normalize_view_copy_base_pass import (
+    NormalizeViewCopyBasePass,
+)
 from executorch.exir.passes.remove_graph_asserts_pass import RemoveGraphAssertsPass
 from executorch.exir.passes.remove_mixed_type_operators import RemoveMixedTypeOperators
+from executorch.exir.passes.replace_view_copy_with_memory_view_pass import (
+    ReplaceViewCopyWithMemoryViewPass,
+)
 from executorch.exir.passes.spec_prop_pass import SpecPropPass
 from executorch.exir.print_program import pretty_print, print_program
 from executorch.exir.schema import Program
@@ -580,6 +586,23 @@ def _to_edge(ep, config: EdgeCompileConfig) -> "ExirExportedProgram":
     return new_ep


+def memory_planning_passes(config: ExecutorchBackendConfig) -> List[PassType]:
+    if config.try_remove_view_copy:
+        # pyre-ignore
+        return [
+            NormalizeViewCopyBasePass(),
+            ReplaceViewCopyWithMemoryViewPass(),
+            config.to_out_var_pass,
+            config.memory_planning_pass,
+        ]
+    else:
+        # pyre-ignore
+        return [
+            config.to_out_var_pass,
+            config.memory_planning_pass,
+        ]
+
+
 def edge_to_executorch_passes(config: ExecutorchBackendConfig) -> List[PassType]:
     # pyre-ignore
     passes: List[PassType] = [
@@ -591,8 +614,8 @@ def edge_to_executorch_passes(config: ExecutorchBackendConfig) -> List[PassType]
         EdgeToBackendOpsPass(),
         RemoveGraphAssertsPass(),
         config.sym_shape_eval_pass,
-        config.to_out_var_pass,
-    ]
+    ] + memory_planning_passes(config)
+
     return passes

@@ -835,6 +858,12 @@ def to_executorch(
             gm, new_signature = insert_write_back_for_buffers_pass(program)
             new_gm = program.graph_module
             for p in edge_to_executorch_passes(config):
+                if isinstance(p, ReplaceViewCopyWithMemoryViewPass):
+                    # This is similar to the hack in SpecPropPass
+                    # Ideally passes would work on ExportedPrograms, but today
+                    # they work on GraphModules
+                    p.set_program(program)
+
                 new_gm_res = p(new_gm)
                 assert new_gm_res is not None
                 new_gm = new_gm_res.graph_module

exir/tests/TARGETS

Lines changed: 14 additions & 0 deletions
@@ -447,3 +447,17 @@ python_unittest(
         "//executorch/exir:print_program",
     ],
 )
+
+python_unittest(
+    name = "test_try_remove_view_copy",
+    srcs = [
+        "test_try_remove_view_copy.py",
+    ],
+    deps = [
+        "//caffe2:torch",
+        "//executorch/exir:lib",
+        "//executorch/exir:memory",
+        "//executorch/exir/capture:config",
+        "//executorch/exir/passes:lib",
+    ],
+)
exir/tests/test_try_remove_view_copy.py (new file)

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

import copy
import unittest

import torch
import torch.nn as nn
from executorch.exir import memory, to_edge
from executorch.exir.capture._config import ExecutorchBackendConfig
from executorch.exir.passes import MemoryPlanningPass


class TestModel1(nn.Module):
    def __init__(self):
        super().__init__()
        self.parameter = nn.Parameter(torch.rand(5, 6))
        self.parameter.requires_grad = False

    def forward(self, x):
        v1 = self.parameter.view(
            6, 5
        )  # removed, lifetime of parameter will be extended
        v2 = x.view(6, 5)  # not removed
        v3 = torch.ops.aten.mul.Tensor(v1, v2).view(
            30
        )  # removed, lifetime of mul.Tensor will be extended
        return v3

    def get_example_inputs(self):
        return (torch.rand(5, 6),)


class TestTryRemoveViewCopy(unittest.TestCase):
    def test_disable(self) -> None:
        model = TestModel1()
        model.eval()
        example_inputs = model.get_example_inputs()
        ep = torch.export.export(model, example_inputs)
        etpm = to_edge(ep).to_executorch(
            config=ExecutorchBackendConfig(
                try_remove_view_copy=False,
                memory_planning_pass=MemoryPlanningPass(
                    "greedy", alloc_graph_input=False
                ),
            ),
        )

        for node in etpm.exported_program().graph_module.graph.nodes:
            assert node.target != memory.view

    def test_output_matches(self) -> None:
        model = TestModel1()
        model.eval()
        example_inputs = model.get_example_inputs()
        ep = torch.export.export(model, example_inputs)

        epm_remove = to_edge(ep)
        epm_no_remove = copy.deepcopy(
            epm_remove
        )  # to_executorch modifies the edge_program, so we make a copy

        # Run pass with removal
        etpm_remove = epm_remove.to_executorch(
            config=ExecutorchBackendConfig(
                try_remove_view_copy=True,
                memory_planning_pass=MemoryPlanningPass(
                    "greedy", alloc_graph_input=False
                ),
            ),
        )

        # Run pass with no removal
        etpm_no_remove = epm_no_remove.to_executorch(
            config=ExecutorchBackendConfig(
                try_remove_view_copy=False,
                memory_planning_pass=MemoryPlanningPass(
                    "greedy", alloc_graph_input=False
                ),
            ),
        )

        out_remove = etpm_remove.exported_program().module()(*example_inputs)
        out_no_remove = etpm_no_remove.exported_program().module()(*example_inputs)

        self.assertTrue(torch.allclose(out_remove, out_no_remove))

    def test_spec(self) -> None:
        model = TestModel1()
        model.eval()
        example_inputs = model.get_example_inputs()
        ep = torch.export.export(model, example_inputs)

        etpm = to_edge(ep).to_executorch(
            config=ExecutorchBackendConfig(
                try_remove_view_copy=True,
                memory_planning_pass=MemoryPlanningPass(
                    "greedy", alloc_graph_input=False
                ),
            ),
        )

        # etpm.exported_program().graph.print_tabular()

        # idx opcode         name                      target                              args                                                 kwargs
        # --- -------------  ------------------------  ----------------------------------  ---------------------------------------------------  ----------------
        # 0   placeholder    arg0_1                    arg0_1                              ()                                                   {}
        # 1   placeholder    arg1_1                    arg1_1                              ()                                                   {}
        # 2   call_function  aten_view_copy_default    <function view at 0x7f10a6dfeb00>   (arg0_1, [6, 5])                                     {}
        # 3   call_function  alloc                     <function alloc at 0x7f10a6dfe9e0>  (((6, 5), torch.float32),)                           {}
        # 4   call_function  aten_view_copy_default_1  aten.view_copy.out                  (arg1_1, [6, 5])                                     {'out': alloc}
        # 5   call_function  alloc_1                   <function alloc at 0x7f10a6dfe9e0>  (((6, 5), torch.float32),)                           {}
        # 6   call_function  aten_mul_tensor           aten.mul.out                        (aten_view_copy_default, aten_view_copy_default_1)   {'out': alloc_1}
        # 7   call_function  aten_view_copy_default_2  <function view at 0x7f10a6dfeb00>   (aten_mul_tensor, [30])                              {}
        # 8   output         output_1                  output                              ((aten_view_copy_default_2,),)                       {}

        # arg0_1 is the parameter
        # arg1_1 is the user input

        for node in etpm.exported_program().graph.nodes:
            if node.name == "arg0_1":
                # arg0_1's lifetime is extended through aten_view_copy_default (memory.view) to idx 6
                self.assertEqual(node.meta["spec"].lifetime, [0, 6])
            elif node.name == "aten_view_copy_default":
                # aten_view_copy_default is a memory.view of arg0_1.
                # arg0_1 is a constant with storage, so we check that the view's storage matches the base

                # assert base is arg0_1
                self.assertEqual(node.args[0].name, "arg0_1")

                # assert base is const with storage
                self.assertTrue(node.args[0].meta["spec"].const)
                self.assertTrue(node.args[0].meta["spec"].storage is not None)
                self.assertTrue(node.args[0].meta["spec"].mem_id is None)
                self.assertTrue(node.args[0].meta["spec"].mem_offset is None)

                # assert self is const with storage
                self.assertTrue(node.meta["spec"].const)
                self.assertTrue(node.meta["spec"].storage is not None)
                self.assertTrue(node.meta["spec"].mem_id is None)
                self.assertTrue(node.meta["spec"].mem_offset is None)

                # assert storage matches
                self.assertEqual(
                    node.meta["spec"].storage, node.args[0].meta["spec"].storage
                )

                # assert lifetime matches
                self.assertEqual(
                    node.meta["spec"].lifetime, node.args[0].meta["spec"].lifetime
                )
            elif node.name == "aten_mul_tensor":
                # aten_mul_tensor's lifetime is extended through aten_view_copy_default_2 (memory.view) to idx 8
                self.assertEqual(node.meta["spec"].lifetime, [5, 8])
            elif node.name == "aten_view_copy_default_2":
                # aten_view_copy_default_2 is a memory.view of aten_mul_tensor

                # assert base is aten_mul_tensor
                self.assertEqual(node.args[0].name, "aten_mul_tensor")

                # assert base and self are not const, do not have storage,
                # but do have mem_id and mem_offset
                self.assertFalse(node.args[0].meta["spec"].const)
                self.assertTrue(node.args[0].meta["spec"].storage is None)
                self.assertTrue(node.args[0].meta["spec"].mem_id is not None)
                self.assertTrue(node.args[0].meta["spec"].mem_offset is not None)

                self.assertFalse(node.meta["spec"].const)
                self.assertTrue(node.meta["spec"].storage is None)
                self.assertTrue(node.meta["spec"].mem_id is not None)
                self.assertTrue(node.meta["spec"].mem_offset is not None)

                # assert self and base mem_id, mem_offset, and lifetime match
                self.assertEqual(
                    node.meta["spec"].mem_id, node.args[0].meta["spec"].mem_id
                )
                self.assertEqual(
                    node.meta["spec"].mem_offset, node.args[0].meta["spec"].mem_offset
                )
                self.assertEqual(
                    node.meta["spec"].lifetime, node.args[0].meta["spec"].lifetime
                )

        # Test evalues in execution plan
        evalues = etpm.executorch_program.execution_plan[0].values

        # evalue 0 is the parameter arg0_1 and evalue 2 is the view aten_view_copy_default
        # assert their sizes are as expected and constant_buffer_idx != 0
        self.assertEqual(evalues[0].val.sizes, [5, 6])  # pyre-ignore
        self.assertNotEqual(evalues[0].val.constant_buffer_idx, 0)  # pyre-ignore
        self.assertEqual(evalues[2].val.sizes, [6, 5])  # pyre-ignore
        self.assertNotEqual(evalues[2].val.constant_buffer_idx, 0)  # pyre-ignore

        # assert they have the same constant_buffer_idx
        self.assertEqual(evalues[0].val.constant_buffer_idx, evalues[2].val.constant_buffer_idx)  # pyre-ignore

        # evalue 7 is alloc_1 (aten_mul_tensor) and evalue 8 is aten_view_copy_default_2
        # assert their sizes are as expected and constant_buffer_idx == 0
        self.assertEqual(evalues[7].val.sizes, [6, 5])  # pyre-ignore
        self.assertEqual(evalues[7].val.constant_buffer_idx, 0)  # pyre-ignore
        self.assertEqual(evalues[8].val.sizes, [30])  # pyre-ignore
        self.assertEqual(evalues[8].val.constant_buffer_idx, 0)  # pyre-ignore

        # assert they have the same mem_id and mem_offset low and high
        self.assertEqual(evalues[7].val.allocation_info.memory_id, evalues[8].val.allocation_info.memory_id)  # pyre-ignore
        self.assertEqual(evalues[7].val.allocation_info.memory_offset_low, evalues[8].val.allocation_info.memory_offset_low)  # pyre-ignore
        self.assertEqual(evalues[7].val.allocation_info.memory_offset_high, evalues[8].val.allocation_info.memory_offset_high)  # pyre-ignore
