Run decompositions before the quantizer #7111
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7111
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit a11afab with merge base 2d499b3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D66461406
if model_gm_has_SDPA(model_gm):  # pyre-fixme[6]
    decomp_table = torch.export.default_decompositions()
    ops_to_keep = [
would be nice to leave the same comment inline
        torch.ops.aten.linear.default,
        torch.ops.aten.matmul.default,
    ]
    # pyre-fixme[6]: For 1st argument expected `Dict[typing.Callable[..., typing.Any
pyre is disabled in ET but still used internally. We need to sort it out, but not in this PR.
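For readers outside the review thread, here is a minimal, self-contained sketch of the pattern the snippet above follows: filter the default decomposition table so a few ops survive, then run decompositions on the exported program before any quantization. The toy `SDPAModel` and the helper name `export_with_early_decomps` are illustrative only; `torch.export.default_decompositions()` and `run_decompositions()` are the public torch.export API, though the exact dict-like methods available on the table can vary by PyTorch version.

```python
import torch


class SDPAModel(torch.nn.Module):
    """Toy model whose forward lowers to aten.scaled_dot_product_attention."""

    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)


def export_with_early_decomps(model, example_inputs):
    # Export to an ATen-level ExportedProgram.
    ep = torch.export.export(model, example_inputs)

    # Start from the default decomposition table, but drop the entries for ops
    # the quantizer patterns match on, so they stay intact in the graph.
    decomp_table = torch.export.default_decompositions()
    ops_to_keep = [
        torch.ops.aten.linear.default,
        torch.ops.aten.matmul.default,
    ]
    for op in ops_to_keep:
        decomp_table.pop(op, None)

    # Decompose now, before the quantizer sees the graph, so composite ops
    # such as SDPA are already broken into quantizable pieces.
    return ep.run_decompositions(decomp_table)


q = torch.randn(1, 4, 8, 16)
decomposed = export_with_early_decomps(SDPAModel(), (q, q, q))
print(decomposed.graph_module.graph)
```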
Summary:
In the current flow, decompositions run in `to_edge()`, long after the quantization process is done. This creates a lot of issues, since we cannot quantize any operations contained in the large operators that the graph tracer can give (e.g. aten.scaled_dot_product_attention, aten.rnn_<tanh, relu>.input, and a few others). Any models using those will see many fp32 operators in the final graph. Running the decomps earlier solves the problem, but we need to retain a couple of operators that we do rely on in the quantizer, namely `aten.linear`, `aten.conv1d` and `aten.conv2d`.

Differential Revision: D66461406
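To make the ordering described above concrete, here is a hedged sketch of where the early decompositions sit relative to PT2E quantization and `to_edge()`. The `quantizer` argument stands in for whatever backend quantizer is in use (configuring one is out of scope here), and `export_with_early_decomps` refers to the illustrative helper sketched earlier in this thread; the rest uses the public `torch.ao.quantization.quantize_pt2e` and `executorch.exir` entry points, not the exact code from this PR.

```python
import torch
from executorch.exir import to_edge
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


def quantize_then_lower(model, example_inputs, quantizer):
    # 1. Export and run decompositions up front (see the earlier sketch),
    #    keeping the handful of ops the quantizer patterns rely on.
    ep = export_with_early_decomps(model, example_inputs)
    gm = ep.module()

    # 2. Quantize the already-decomposed graph: ops that used to be hidden
    #    inside SDPA and friends are now visible to the annotator.
    prepared = prepare_pt2e(gm, quantizer)
    prepared(*example_inputs)  # calibration pass
    converted = convert_pt2e(prepared)

    # 3. Only then go through to_edge(); by this point decompositions have
    #    already happened, so no fp32 leftovers appear after quantization.
    return to_edge(torch.export.export(converted, example_inputs))
```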