ModAI changes to export xnnpack delegated non_lowered_server_model #10989
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10989.
No failures as of commit 0ad1f99 with merge base b73f9d5.
This pull request was exported from Phabricator. Differential Revision: D70704201
Summary:
This adds an XNNPACK-delegated model to non_lowered_server_model, which speeds up server evals in ATen mode by delegating to XNNPACK. We run a const_prop_pass before delegation because it eliminates some unnecessary q=>dq (quantize => dequantize) patterns that would otherwise slow the model down.
Inference-time improvements seen for some models:
MLD model: ~900 ms => ~450 ms
OFI model: ~450 ms => ~230 ms
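To illustrate why constant-propagating q=>dq chains before delegation helps, here is a minimal, self-contained sketch in plain Python. This is not the actual ExecuTorch const_prop_pass or XNNPACK code path; the helpers (`quantize`, `dequantize`, `const_prop_weights`) are hypothetical and only model the idea that a quantize/dequantize round-trip on a constant weight can be computed once at export time instead of on every inference call.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine int8-style quantization of a list of floats (illustrative only)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]


def dequantize(q, scale, zero_point):
    """Inverse of quantize: map integer values back to floats."""
    return [(v - zero_point) * scale for v in q]


def run_model_naive(weights, scale, zero_point, x):
    # The q=>dq pair on the constant weights runs on EVERY inference call.
    # This is the kind of pattern a const-prop pass can remove before delegation.
    w = dequantize(quantize(weights, scale, zero_point), scale, zero_point)
    return sum(wi * xi for wi, xi in zip(w, x))


def const_prop_weights(weights, scale, zero_point):
    # Fold the constant q=>dq chain once, ahead of time (export-time work).
    return dequantize(quantize(weights, scale, zero_point), scale, zero_point)


# Export-time: fold the constant chain once.
FOLDED_W = const_prop_weights([0.5, -1.25, 2.0], scale=0.05, zero_point=0)


def run_model_folded(x, w=FOLDED_W):
    # Inference-time: only the real compute remains; no per-call q=>dq work.
    return sum(wi * xi for wi, xi in zip(w, x))
```

Both variants produce identical outputs; the folded one simply moves the quantize/dequantize work out of the hot path, which is the same effect the const_prop_pass aims for on the real graph before XNNPACK delegation.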
Reviewed By: navsud, YIWENX14, Gasoonjia
Differential Revision: D70704201