
Qualcomm AI Engine Direct - Refine max spill fill buffer setting #5925


Conversation

shewu-quic
Collaborator

  • Get the required spillFillBufferSize from the context binary and set it in compiler_spec.
  • Quantize the embedding op in QNN.
  • When multi-context is enabled, maxSpillFillBuffer cannot be set to zero.

pytorch-bot bot commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5925

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b974e0a with merge base c06a708:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 7, 2024
@shewu-quic
Collaborator Author

Hi @cccclai,
This PR refines how the spill fill buffer is set and quantizes the embedding op.
The goal is to run llama3 instruct with a 128 seq_len on a 12GB device.
Could you please take a look? Thank you!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Oct 8, 2024

Hey, can you apply this patch to the PR? There are some lint failures.

diff --git a/backends/qualcomm/runtime/backends/QnnBackendCache.h b/backends/qualcomm/runtime/backends/QnnBackendCache.h
index c492a3d1d..074092bc1 100644
--- a/backends/qualcomm/runtime/backends/QnnBackendCache.h
+++ b/backends/qualcomm/runtime/backends/QnnBackendCache.h
@@ -24,8 +24,8 @@ class QnnBackendCache {
     ONLINE_PREPARE = 3,
   };
   explicit QnnBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
-      : qnn_context_blob_(qnn_context_blob){};
-  virtual ~QnnBackendCache();
+      : qnn_context_blob_(qnn_context_blob){}
+  virtual ~QnnBackendCache() = 0;
   QnnBackendCache(const QnnBackendCache&) = delete;
   QnnBackendCache(QnnBackendCache&&) = delete;
   QnnBackendCache& operator=(const QnnBackendCache&) = delete;
@@ -57,7 +57,7 @@ class QnnBackendCache {
   virtual Error RetrieveBackendBinaryInfo(
       const QnnSystemContext_BinaryInfo_t* binaryinfo) {
     return Error::Ok;
-  };
+  }

  private:
   Error GetQnnGraphInfoFromBinary();
diff --git a/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h b/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
index 117704f6e..8bf20af37 100644
--- a/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
+++ b/backends/qualcomm/runtime/backends/htpbackend/HtpBackendCache.h
@@ -13,9 +13,9 @@ namespace executor {
 namespace qnn {
 class HtpBackendCache : public QnnBackendCache {
  public:
-  HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
+  explicit HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
       : QnnBackendCache(qnn_context_blob), spill_fill_buf_(0) {}
-  ~HtpBackendCache() {}
+  ~HtpBackendCache() = default;

   uint64_t GetSpillFillBufferSize() {
     return spill_fill_buf_;

@cccclai cccclai mentioned this pull request Oct 8, 2024
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Oct 8, 2024
Summary: Copy of pytorch#5925 and address the constructor/destructor issues

Differential Revision: D64055108
@cccclai
Contributor

cccclai commented Oct 8, 2024

I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to solve some linter breakage in your base commit and filed #5995 to resolve another lint issue.

public:
HtpBackendCache(const QnnExecuTorchContextBinary& qnn_context_blob)
: QnnBackendCache(qnn_context_blob), spill_fill_buf_(0) {}
~HtpBackendCache() {}
Contributor


Do you expect HtpBackendCache to do nothing?

Collaborator Author


Yes, I believe it doesn’t need to do anything further.

@cccclai
Contributor

cccclai commented Oct 9, 2024

#5989 is merged and your name will be the author for this commit.

@shewu-quic
Collaborator Author

shewu-quic commented Oct 9, 2024

I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to solve some linter breakage in your base commit and filed #5995 to resolve another lint issue.

Thank you very much!
So, can I close this PR?

@cccclai
Contributor

cccclai commented Oct 9, 2024

I cherry-picked your PR to #5989 and made changes on top of it. I also rebased to solve some linter breakage in your base commit and filed #5995 to resolve another lint issue.

Thank you very much! So, can I close this PR?

Yeah this PR can be closed.

@shewu-quic shewu-quic closed this Oct 9, 2024
@cccclai cccclai mentioned this pull request Oct 9, 2024
Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
4 participants