Commit c617284

Matthias Cremon authored and facebook-github-bot committed
Use Helios' decomposition for SDPA before quantizing
Summary: As titled. This will expose the `bmm` nodes in the graph, and allow us to quantize them in a subsequent diff.

Differential Revision: D59503355
1 parent c70bb27 commit c617284
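
For context on why the decomposition exposes `bmm` nodes: scaled dot-product attention computes softmax(Q @ K^T / sqrt(d)) @ V, which is two batched matrix multiplies around a softmax. The sketch below is a framework-free illustration of that arithmetic only. It is not the Helios pass (the implementation of `decompose_SDPA_turing` is not shown in this commit), the helper names `bmm`, `softmax_last_dim`, and `sdpa_decomposed` are hypothetical, and masking and dropout are assumed absent.

```python
import math


def bmm(a, b):
    """Batched matmul on nested lists: a is [B][M][K], b is [B][K][N]."""
    return [
        [[sum(x * y for x, y in zip(row, col)) for col in zip(*mat_b)]
         for row in mat_a]
        for mat_a, mat_b in zip(a, b)
    ]


def softmax_last_dim(x):
    """Numerically stable softmax over the last dimension of a [B][M][N] input."""
    out = []
    for mat in x:
        rows = []
        for row in mat:
            row_max = max(row)
            exps = [math.exp(value - row_max) for value in row]
            total = sum(exps)
            rows.append([e / total for e in exps])
        out.append(rows)
    return out


def sdpa_decomposed(q, k, v):
    """Attention written as two explicit bmm calls (hypothetical helper)."""
    head_dim = len(q[0][0])
    scale = 1.0 / math.sqrt(head_dim)
    # Transpose the last two dimensions of K.
    k_t = [[list(col) for col in zip(*mat)] for mat in k]
    scores = bmm(q, k_t)  # first bmm: Q @ K^T
    scaled = [[[s * scale for s in row] for row in mat] for mat in scores]
    weights = softmax_last_dim(scaled)
    return bmm(weights, v)  # second bmm: weights @ V


if __name__ == "__main__":
    q = [[[1.0, 0.0]]]
    k = [[[1.0, 1.0], [1.0, 1.0]]]
    v = [[[2.0, 0.0], [4.0, 8.0]]]
    print(sdpa_decomposed(q, k, v))
```

Once attention is expressed this way in the exported graph, each of the two matmuls appears as its own node, which is what lets a quantizer annotate them individually.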

2 files changed: 5 additions, 0 deletions


backends/cadence/aot/TARGETS

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ python_library(
         "//executorch/backends/cadence/aot/quantizer:fusion_pass",
         "//executorch/backends/cadence/aot/quantizer:quantizer",
         "//executorch/exir:lib",
+        "//on_device_ai/helios/quantization:quantization",
     ],
 )

backends/cadence/aot/compiler.py

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,7 @@
 )
 from executorch.backends.cadence.aot.utils import model_is_quantized
 from executorch.exir import EdgeCompileConfig, EdgeProgramManager, to_edge
+from on_device_ai.helios.quantization.transforms import decompose_SDPA_turing
 from pyre_extensions import assert_is_instance
 from torch._export import capture_pre_autograd_graph
 from torch.ao.quantization.pt2e.export_utils import model_is_exported
@@ -47,6 +48,9 @@ def quantize_pt2(
     # Export with dynamo
     model_exp = capture_pre_autograd_graph(model, inputs)
 
+    # Decompose SDPA (grab the pass from Turing)
+    decompose_SDPA_turing(model_exp)
+
     # Prepare
     prepared_model = prepare_pt2e(model_exp, quantizer)
