Commit b601b49

kimishpatel authored and facebook-github-bot committed
[Executorch][xnnpack] Hack to speedup dqlinear lowering (#1893)
Summary:
Pull Request resolved: #1893

XNNPACK preprocess is very slow because it runs many transform passes. In this specific instance, convert_bilinear_upsamples was taking an extreme amount of time. This should really be solved with the _to_edge_and_transform API, but in the meantime I want to increase my, and maybe others', iteration speed. This hack is a way to get there. In the future we could potentially also allow the partitioner to highlight what was partitioned, so that we don't run unrelated transform passes.

Result:

Run
    time buck2 run mode/opt mode/inplace executorch/examples/models/llama2:export_llama -- -c executorch/examples/models/llama2/params/demo_rand_params.pth -p executorch/examples/models/llama2/params/demo_config.json --pt2_quantize "xnnpack_dynamic" -m '{"get_bos_id": 3, "get_eos_id": 3, "get_n_bos": 1, "get_n_eos": 2}' -2 -kv

Before: ~7m
After: a little more than a minute

To profile the runtime, try
    time buck2 run mode/opt mode/inplace executorch/examples/models/llama2:export_llama -- -c executorch/examples/models/llama2/params/demo_rand_params.pth -p executorch/examples/models/llama2/params/demo_config.json --pt2_quantize "xnnpack_dynamic" -m '{"get_bos_id": 3, "get_eos_id": 3, "get_n_bos": 1, "get_n_eos": 2}' -2 -kv -prof "llama2.html"

There is still a significant amount of time spent in partitioning.

ghstack-source-id: 215452812
exported-using-ghexport

Reviewed By: digantdesai, mcr229

Differential Revision: D53584078

fbshipit-source-id: a80ecb95a449e2d80c3f73453ba3ad6ad9eecb8f
1 parent 800de21 commit b601b49

2 files changed: +14 -1 lines changed
backends/xnnpack/partition/xnnpack_partitioner.py

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,7 @@
     generate_partitions_from_list_of_nodes,
     generate_pattern_op_partitions,
 )
+from executorch.exir.backend.compile_spec_schema import CompileSpec
 
 from executorch.exir.backend.partitioner import (
     DelegationSpec,
@@ -1091,6 +1092,9 @@ def partition(self, exported_program: ExportedProgram) -> PartitionResult:
             for match in self.get_module_partitions(exported_program)
         ]
         partition_tags: Dict[str, DelegationSpec] = {}
+        self.delegation_spec = DelegationSpec(
+            XnnpackBackend.__name__, [CompileSpec("dqlinear_partitioner", bytes())]
+        )
 
         if self.check_partitions(partitions):
             partition_tags = self.tag_nodes(partitions)

backends/xnnpack/xnnpack_preprocess.py

Lines changed: 10 additions & 1 deletion
@@ -16,6 +16,8 @@
 from executorch.backends.xnnpack.operators.node_visitor import get_node_visitors
 
 from executorch.backends.xnnpack.passes import XNNPACKPassManager
+from executorch.backends.xnnpack.passes.convert_to_linear import ConvertToLinearPass
+from executorch.backends.xnnpack.passes.tag_implicit_q_dq_pass import TagImplicitQDqPass
 
 from executorch.backends.xnnpack.serialization.xnnpack_graph_schema import (
     Buffer,
@@ -213,8 +215,15 @@ def preprocess(
             constants=ep.constants,
         )
 
+        passes = []
+        for spec in compile_specs:
+            if spec.key == "dqlinear_partitioner":
+                passes.append(ConvertToLinearPass)
+                passes.append(TagImplicitQDqPass)
+
+        passes = passes if len(passes) > 0 else None
         # XNNPACK Delegate Specific Passes
-        ep = XNNPACKPassManager(ep).transform()
+        ep = XNNPACKPassManager(ep, passes=passes).transform()
         graph_module = ep.graph_module
 
         node_to_external_map = generate_node_to_external_map(ep, graph_module)
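
The mechanism in the two hunks above can be illustrated in isolation. The following is a minimal, self-contained sketch and does not use ExecuTorch: `CompileSpec` here is a hypothetical stand-in dataclass, the three pass functions are toy placeholders that only record their names, and `DEFAULT_PASSES` is an assumed default pipeline rather than the real XNNPACK pass list. It only shows the idea that a marker key in the compile specs selects a reduced pass list, and that the full default list runs when the marker is absent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CompileSpec:
    """Hypothetical stand-in for a (key, value) compile spec."""
    key: str
    value: bytes


# A toy "pass" maps a trace of applied pass names to an extended trace.
Pass = Callable[[List[str]], List[str]]


def convert_bilinear_upsample(trace: List[str]) -> List[str]:
    # Placeholder for an expensive pass that is unrelated to dqlinear.
    return trace + ["convert_bilinear_upsample"]


def convert_to_linear(trace: List[str]) -> List[str]:
    return trace + ["convert_to_linear"]


def tag_implicit_q_dq(trace: List[str]) -> List[str]:
    return trace + ["tag_implicit_q_dq"]


# Assumed default pipeline, for illustration only.
DEFAULT_PASSES: List[Pass] = [
    convert_bilinear_upsample,
    convert_to_linear,
    tag_implicit_q_dq,
]


def select_passes(compile_specs: List[CompileSpec]) -> Optional[List[Pass]]:
    """Return a reduced pass list if the marker key is present, else None."""
    passes: List[Pass] = []
    for spec in compile_specs:
        if spec.key == "dqlinear_partitioner":
            passes.append(convert_to_linear)
            passes.append(tag_implicit_q_dq)
    return passes if len(passes) > 0 else None


def run_passes(compile_specs: List[CompileSpec]) -> List[str]:
    """Run the selected passes, falling back to the defaults on None."""
    passes = select_passes(compile_specs)
    if passes is None:
        passes = DEFAULT_PASSES
    trace: List[str] = []
    for p in passes:
        trace = p(trace)
    return trace


fast = run_passes([CompileSpec("dqlinear_partitioner", bytes())])
full = run_passes([])
```

In this sketch `fast` contains only the two dqlinear-related passes, while `full` also includes the placeholder expensive pass. Keying off a compile spec keeps the preprocess signature unchanged, at the cost of coupling the partitioner and the backend through a magic string.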