Qualcomm AI Engine Direct - Optimize memory usage at runtime #7003


Merged

Conversation

shewu-quic (Collaborator)

The QNN backend doesn't need the processed data after `qnn_context_create_from_binary`.

pytorch-bot bot commented Nov 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7003



✅ No Failures

As of commit f1fe34b with merge base a39ea29:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Nov 21, 2024
@shewu-quic shewu-quic changed the title Qualcomm AI Engine Direct - Optimize memory at runtime Qualcomm AI Engine Direct - Optimize memory usage at runtime Nov 21, 2024
facebook-github-bot (Contributor)

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shewu-quic (Collaborator, Author)

Hi @cccclai,

This PR fixes the memory-usage issue.
Could you please take a look?

Thanks,
Hutton

cccclai (Contributor) left a comment


Thanks! Maybe this also helps resolve the RAM issue we ran into earlier with the 8B model on 16 GB? I understand you already fixed that in an alternative way.

shewu-quic (Collaborator, Author)

shewu-quic commented Nov 21, 2024

I think the spill-fill buffer is still necessary for the 8B model due to memory usage in HTP.
This change only reduces PSS, not the DMA buffer.

cccclai added the release notes: backends label Nov 21, 2024
@cccclai cccclai merged commit 96a9d35 into pytorch:main Nov 21, 2024
41 of 43 checks passed