Run decompositions before the quantizer #7111
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7111
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit a11afab with merge base 2d499b3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D66461406
if model_gm_has_SDPA(model_gm):  # pyre-fixme[6]
    decomp_table = torch.export.default_decompositions()
    ops_to_keep = [
would be nice to leave the same comment inline
        torch.ops.aten.linear.default,
        torch.ops.aten.matmul.default,
    ]
    # pyre-fixme[6]: For 1st argument expected `Dict[typing.Callable[..., typing.Any
pyre is disabled in ET but still used internally. We need to sort it out, but not in this PR.
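For readers outside the review thread, here is a minimal, self-contained sketch of the pattern the snippet above follows: filter the default decomposition table so a few ops survive, then run decompositions on the exported program before any quantization. The toy `SDPAModel` and the helper name `export_with_early_decomps` are illustrative only; `torch.export.default_decompositions()` and `run_decompositions()` are the public torch.export API, though the exact dict-like methods available on the table can vary by PyTorch version.

```python
import torch


class SDPAModel(torch.nn.Module):
    """Toy model whose forward lowers to aten.scaled_dot_product_attention."""

    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)


def export_with_early_decomps(model, example_inputs):
    # Export to an ATen-level ExportedProgram.
    ep = torch.export.export(model, example_inputs)

    # Start from the default decomposition table, but drop the entries for ops
    # the quantizer patterns match on, so they stay intact in the graph.
    decomp_table = torch.export.default_decompositions()
    ops_to_keep = [
        torch.ops.aten.linear.default,
        torch.ops.aten.matmul.default,
    ]
    for op in ops_to_keep:
        decomp_table.pop(op, None)

    # Decompose now, before the quantizer sees the graph, so composite ops
    # such as SDPA are already broken into quantizable pieces.
    return ep.run_decompositions(decomp_table)


q = torch.randn(1, 4, 8, 16)
decomposed = export_with_early_decomps(SDPAModel(), (q, q, q))
print(decomposed.graph_module.graph)
```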
Summary:
In the current flow, decompositions run in `to_edge()`, long after the quantization process is done. This creates a lot of issues, since we cannot quantize any operations contained in the large operators that the graph tracer can give (e.g. aten.scaled_dot_product_attention, aten.rnn_<tanh, relu>.input, and a few others). Any models using those will see many fp32 operators in the final graph. Running the decomps earlier solves the problem, but we need to retain a couple of operators that we do rely on in the quantizer, namely `aten.linear`, `aten.conv1d` and `aten.conv2d`.

Differential Revision: D66461406
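To make the ordering described above concrete, here is a hedged sketch of where the early decompositions sit relative to PT2E quantization and `to_edge()`. The `quantizer` argument stands in for whatever backend quantizer is in use (configuring one is out of scope here), and `export_with_early_decomps` refers to the illustrative helper sketched earlier in this thread; the rest uses the public `torch.ao.quantization.quantize_pt2e` and `executorch.exir` entry points, not the exact code from this PR.

```python
import torch
from executorch.exir import to_edge
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


def quantize_then_lower(model, example_inputs, quantizer):
    # 1. Export and run decompositions up front (see the earlier sketch),
    #    keeping the handful of ops the quantizer patterns rely on.
    ep = export_with_early_decomps(model, example_inputs)
    gm = ep.module()

    # 2. Quantize the already-decomposed graph: ops that used to be hidden
    #    inside SDPA and friends are now visible to the annotator.
    prepared = prepare_pt2e(gm, quantizer)
    prepared(*example_inputs)  # calibration pass
    converted = convert_pt2e(prepared)

    # 3. Only then go through to_edge(); by this point decompositions have
    #    already happened, so no fp32 leftovers appear after quantization.
    return to_edge(torch.export.export(converted, example_inputs))
```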