[Executorch][xnnpack] Hack to speed up dqlinear lowering (#1893)

kimishpatel · facebook-github-bot · commit b601b49b21de · 2024-02-16T09:44:16.000-08:00
Summary:
Pull Request resolved: #1893

XNNPACK preprocess is very, very slow because it runs many transform passes. In this specific instance, convert_bilinear_upsamples was taking an extreme amount of time. This should really be solved with the _to_edge_and_transform API, but in the meantime I really want to increase my (and maybe others') iteration speed. This hack is a way to get there. In the future we could also allow the partitioner to highlight what was partitioned, so that we don't run unrelated transform passes.

Result:
Run

    time buck2 run mode/opt mode/inplace executorch/examples/models/llama2:export_llama -- -c executorch/examples/models/llama2/params/demo_rand_params.pth -p executorch/examples/models/llama2/params/demo_config.json --pt2_quantize "xnnpack_dynamic" -m '{"get_bos_id": 3, "get_eos_id": 3, "get_n_bos": 1, "get_n_eos": 2}' -2 -kv

Before: ~7m
After: a little more than a minute

For profiling the runtime, try

    time buck2 run mode/opt mode/inplace executorch/examples/models/llama2:export_llama -- -c executorch/examples/models/llama2/params/demo_rand_params.pth -p executorch/examples/models/llama2/params/demo_config.json --pt2_quantize "xnnpack_dynamic" -m '{"get_bos_id": 3, "get_eos_id": 3, "get_n_bos": 1, "get_n_eos": 2}' -2 -kv -prof "llama2.html"

There is still a significant amount of time spent in partitioning.

ghstack-source-id: 215452812
exported-using-ghexport

Reviewed By: digantdesai, mcr229

Differential Revision: D53584078

fbshipit-source-id: a80ecb95a449e2d80c3f73453ba3ad6ad9eecb8f
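The hack is a small handshake between the partitioner and preprocess: the partitioner stamps its DelegationSpec with a marker CompileSpec whose key is "dqlinear_partitioner" (the value is empty bytes), and preprocess scans the compile specs for that key to choose a reduced pass list. The following is a minimal, self-contained sketch of that handshake under stated assumptions: CompileSpec and DelegationSpec here are simplified stand-ins whose field names follow the diff's usage (e.g. `spec.key`), passes are represented by name strings rather than pass classes, and `dqlinear_delegation_spec` / `select_pass_names` are hypothetical helpers, not executorch APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Simplified stand-ins for executorch's CompileSpec / DelegationSpec (assumption:
# only the fields this sketch needs are modeled).
@dataclass(frozen=True)
class CompileSpec:
    key: str
    value: bytes


@dataclass
class DelegationSpec:
    backend_id: str
    compile_specs: List[CompileSpec] = field(default_factory=list)


MARKER_KEY = "dqlinear_partitioner"


def dqlinear_delegation_spec() -> DelegationSpec:
    """Hypothetical partitioner-side helper: attach an empty-valued marker spec.

    Only the key carries information; the payload is intentionally empty.
    """
    return DelegationSpec("XnnpackBackend", [CompileSpec(MARKER_KEY, bytes())])


def select_pass_names(compile_specs: List[CompileSpec]) -> Optional[List[str]]:
    """Hypothetical preprocess-side helper: pick a reduced pipeline if marked."""
    passes: List[str] = []
    for spec in compile_specs:
        if spec.key == MARKER_KEY:
            passes.append("ConvertToLinearPass")
            passes.append("TagImplicitQDqPass")
    # None tells the pass manager to use its default (full) pipeline.
    return passes if len(passes) > 0 else None


if __name__ == "__main__":
    print(select_pass_names(dqlinear_delegation_spec().compile_specs))
    print(select_pass_names([]))
```

Because the marker travels inside the compile specs that already flow from partitioner to preprocess, no new plumbing is needed; the trade-off is that the marker is an implicit contract keyed on a string.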
diff --git a/backends/xnnpack/partition/xnnpack_partitioner.py b/backends/xnnpack/partition/xnnpack_partitioner.py
@@ -32,6 +32,7 @@
     generate_partitions_from_list_of_nodes,
     generate_pattern_op_partitions,
 )
+from executorch.exir.backend.compile_spec_schema import CompileSpec
 
 from executorch.exir.backend.partitioner import (
     DelegationSpec,
@@ -1091,6 +1092,9 @@ def partition(self, exported_program: ExportedProgram) -> PartitionResult:
             for match in self.get_module_partitions(exported_program)
         ]
         partition_tags: Dict[str, DelegationSpec] = {}
+        self.delegation_spec = DelegationSpec(
+            XnnpackBackend.__name__, [CompileSpec("dqlinear_partitioner", bytes())]
+        )
 
         if self.check_partitions(partitions):
             partition_tags = self.tag_nodes(partitions)
diff --git a/backends/xnnpack/xnnpack_preprocess.py b/backends/xnnpack/xnnpack_preprocess.py
@@ -16,6 +16,8 @@
 from executorch.backends.xnnpack.operators.node_visitor import get_node_visitors
 
 from executorch.backends.xnnpack.passes import XNNPACKPassManager
+from executorch.backends.xnnpack.passes.convert_to_linear import ConvertToLinearPass
+from executorch.backends.xnnpack.passes.tag_implicit_q_dq_pass import TagImplicitQDqPass
 
 from executorch.backends.xnnpack.serialization.xnnpack_graph_schema import (
     Buffer,
@@ -213,8 +215,15 @@ def preprocess(
             constants=ep.constants,
         )
 
+        passes = []
+        for spec in compile_specs:
+            if spec.key == "dqlinear_partitioner":
+                passes.append(ConvertToLinearPass)
+                passes.append(TagImplicitQDqPass)
+
+        passes = passes if len(passes) > 0 else None
         # XNNPACK Delegate Specific Passes
-        ep = XNNPACKPassManager(ep).transform()
+        ep = XNNPACKPassManager(ep, passes=passes).transform()
         graph_module = ep.graph_module
 
         node_to_external_map = generate_node_to_external_map(ep, graph_module)
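The preprocess change relies on XNNPACKPassManager accepting an optional `passes` list, where `None` keeps the previous behavior (the full default pipeline), as the diff implies by mapping an empty list to `None`. The toy pass manager below sketches that contract. `ToyPass`, `ToyPassManager`, and the pass class names are hypothetical; instantiating each pass class with no arguments and threading a run log through the passes are simplifications, since the real manager operates on an ExportedProgram.

```python
from typing import List, Optional, Type


class ToyPass:
    """Hypothetical pass base class: each pass records its name in a run log."""

    def __call__(self, run_log: List[str]) -> List[str]:
        return run_log + [type(self).__name__]


class ConvertBilinearUpsample(ToyPass):  # stands in for the slow, unrelated pass
    pass


class ConvertToLinear(ToyPass):
    pass


class TagImplicitQDq(ToyPass):
    pass


DEFAULT_PASSES: List[Type[ToyPass]] = [ConvertBilinearUpsample, ConvertToLinear, TagImplicitQDq]


class ToyPassManager:
    def __init__(self, passes: Optional[List[Type[ToyPass]]] = None):
        # None selects the full default pipeline; any list (even empty) is used as-is.
        self.passes = DEFAULT_PASSES if passes is None else passes

    def transform(self, run_log: List[str]) -> List[str]:
        for pass_cls in self.passes:
            run_log = pass_cls()(run_log)  # instantiate the pass class, then apply it
        return run_log


full = ToyPassManager().transform([])
reduced = ToyPassManager(passes=[ConvertToLinear, TagImplicitQDq]).transform([])
print(full)
print(reduced)
```

Passing an empty list to this toy manager runs nothing at all, which is why preprocess converts an empty list to `None` instead of forwarding it: a graph without the marker still gets the full pipeline.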