ModAI changes to export xnnpack delegated non_lowered_server_model #10989

Merged: 1 commit, May 20, 2025

Conversation

tarun292

Summary:
This adds an XNNPACK-delegated model to non_lowered_server_model, which should speed up server evals in ATen mode by delegating to XNNPACK. We run const_prop_pass before delegation because it removes some unnecessary q => dq (quantize => dequantize) patterns that would otherwise slow the model down.
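To illustrate why folding constants first can eliminate q => dq pairs, here is a toy, library-free sketch. Everything in it (`Node`, `fold_constants`, `SCALE`, the five-node graph) is hypothetical and exists only for illustration; it is not ExecuTorch's `const_prop_pass`, and the real pass may choose a different set of ops to fold.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SCALE = 0.25  # hypothetical quantization scale for this toy example


@dataclass
class Node:
    name: str
    op: str  # one of: "input", "const", "quantize", "dequantize", "add"
    args: List[str] = field(default_factory=list)
    value: Optional[float] = None  # set only for "const" nodes


def evaluate(op: str, vals: List[float]) -> float:
    """Evaluate a single toy op on concrete values."""
    if op == "quantize":
        return float(round(vals[0] / SCALE))
    if op == "dequantize":
        return vals[0] * SCALE
    if op == "add":
        return vals[0] + vals[1]
    raise ValueError(f"unsupported op: {op}")


def fold_constants(graph: List[Node], output: str) -> List[Node]:
    """Fold nodes whose inputs are all constants, then drop unused nodes."""
    by_name: Dict[str, Node] = {}
    for node in graph:  # the graph is assumed to be in topological order
        args = [by_name[a] for a in node.args]
        if node.op not in ("input", "const") and all(a.op == "const" for a in args):
            folded_value = evaluate(node.op, [a.value for a in args])
            node = Node(node.name, "const", value=folded_value)
        by_name[node.name] = node

    # Keep only the nodes still reachable from the output.
    live, stack = set(), [output]
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(by_name[name].args)
    return [by_name[n.name] for n in graph if n.name in live]


# A constant weight that is quantized and immediately dequantized.
graph = [
    Node("x", "input"),
    Node("w", "const", value=0.5),
    Node("w_q", "quantize", ["w"]),
    Node("w_dq", "dequantize", ["w_q"]),
    Node("out", "add", ["x", "w_dq"]),
]
folded = fold_constants(graph, "out")
print([(n.name, n.op) for n in folded])
```

In this toy graph the quantize and dequantize nodes both depend only on a constant, so they collapse into a single precomputed constant and no q or dq op is left to execute at inference time.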

Inference-time improvements observed on some models:
MLD model: ~900 ms => 450 ms
OFI model: ~450 ms => 230 ms
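As a quick sanity check on the approximate timings above, the speedup factor works out to roughly 2x for both models. The `speedup` helper and the `timings` dictionary are illustrative names, not part of the PR.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is."""
    return before_ms / after_ms


# Approximate latencies (milliseconds) quoted in the PR summary.
timings = {"MLD": (900.0, 450.0), "OFI": (450.0, 230.0)}

for name, (before, after) in timings.items():
    reduction = 100.0 * (1.0 - after / before)
    print(f"{name}: {speedup(before, after):.2f}x faster, {reduction:.0f}% less time")
```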

Reviewed By: navsud

Differential Revision: D70704201

pytorch-bot bot commented May 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10989

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0ad1f99 with merge base b73f9d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 19, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D70704201

@tarun292 added the topic: not user facing and release notes: none labels May 19, 2025
facebook-github-bot pushed a commit that referenced this pull request May 19, 2025
…10989)

@YIWENX14 self-requested a review May 20, 2025 04:29
…10989)

Reviewed By: navsud, YIWENX14, Gasoonjia

Differential Revision: D70704201
@facebook-github-bot merged commit 6b48e89 into main May 20, 2025
87 of 90 checks passed
@facebook-github-bot deleted the export-D70704201 branch May 20, 2025 18:00
Labels: CLA Signed, fb-exported, release notes: none, topic: not user facing