Qualcomm AI Engine Direct - enable loading context binary directly #4163
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4163
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 9878eb8 with merge base 740a0a5. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from b655ce1 to 42551da (Compare)
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
feel free to ping me for re-review after updating
p.bwAxisScaleOffsetEncoding.scales = reinterpret_cast<float*>(
    const_cast<uint8_t*>(param->scales()->Data()));
p.bwAxisScaleOffsetEncoding.offsets = reinterpret_cast<int32_t*>(
    const_cast<uint8_t*>(param->offsets()->Data()));
why not just call data() directly instead of reinterpreting? https://github.com/google/flatbuffers/blob/master/include/flatbuffers/array.h#L101
Nice, but the const qualifier is intentionally kept for checking, rather than using mutable_xxx to get the pointers.
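For context, a minimal sketch of the two options being weighed here; the mutable accessor name is illustrative of flatbuffers' generated mutable API and has not been checked against this schema:

// Option taken in the PR: keep the const accessor so reads stay const-checked,
// and cast away constness only at the QNN boundary, which expects non-const
// float* / int32_t* pointers.
p.bwAxisScaleOffsetEncoding.scales = reinterpret_cast<float*>(
    const_cast<uint8_t*>(param->scales()->Data()));

// Reviewer's alternative: a mutable accessor plus data() removes the casts,
// but gives up const-ness on the flatbuffer object itself.
// p.bwAxisScaleOffsetEncoding.scales =
//     reinterpret_cast<float*>(param->mutable_scales()->data());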
std::shared_ptr<TensorWrapper> tensor_wrapper =
    CreateTensorWrapper(output_tensors[i]);
tensor_wrapper->UpdateQnnTensorMeta(output_tensors[i]);
std::string tensor_name = tensor_wrapper->GetName();
why does TensorWrapper::GetName() return a copy of the string instead of a reference? (I was going to correct this to const auto& tensor_name = tensor_wrapper->GetName())
Nice, didn't notice this.
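A minimal sketch of the suggested signature change, assuming TensorWrapper simply stores its name as a member:

class TensorWrapper {
 public:
  // Returning a const reference lets call sites bind without copying:
  //   const auto& tensor_name = tensor_wrapper->GetName();
  const std::string& GetName() const { return name_; }

 private:
  std::string name_;
};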
module,
tuple(
    torch.randn(size=v.shape, dtype=v.dtype)
    for _, v in bundle_program["inputs"].items()
is bundle_program["inputs"] a dict? if so, why not for v in bundle_program["inputs"].values()?
Thanks for pointing that out, will change all dictionary-related comments.
module,
tuple(
    torch.randn(size=v.shape, dtype=v.dtype)
    for _, v in bundle_program["inputs"].items()
same
for (int i = 0; i < 8; ++i) { // layers per shard
  for (int j = 0; j < 2; ++j) { // k_cache + v_cache
    for (int k = 0; k < 32; ++k) { // heads
might as well make some named constants for these and then you don't need the comments here and below
Changed, thanks.
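A sketch of the named-constant version, with constant names invented here and the values copied from the loop bounds above:

constexpr int kLayersPerShard = 8;
constexpr int kCachesPerLayer = 2;  // k_cache + v_cache
constexpr int kNumHeads = 32;

for (int layer = 0; layer < kLayersPerShard; ++layer) {
  for (int cache = 0; cache < kCachesPerLayer; ++cache) {
    for (int head = 0; head < kNumHeads; ++head) {
      // ... loop body unchanged ...
    }
  }
}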
for (int i = 0; i < 8; ++i) { // layers per shard
  for (int j = 0; j < 2; ++j) { // k_cache + v_cache
    for (int k = 0; k < 32; ++k) { // heads
ditto named constants
cv_.wait(lock, [this] { return !jobs_.empty() || quit_; });

if (quit_ && jobs_.empty())
not sure on the semantics of quit_ here; I would have guessed that quit_ would mean to stop even if not all the jobs are done
Changed to stop_.
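A rough sketch of the drain-then-stop semantics the rename clarifies; the mutex, queue, and the rest of the class are assumed here, not copied from the PR:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class JobQueue {
 public:
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
  }

  // Worker loop: keep running queued jobs, and exit only once stop_ is set
  // AND the queue has drained, so pending jobs still complete.
  void Worker() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return !jobs_.empty() || stop_; });
      if (stop_ && jobs_.empty())
        return;
      auto job = std::move(jobs_.front());
      jobs_.pop();
      lock.unlock();
      job();  // run the job outside the lock
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool stop_ = false;
};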
for (int i = 0; i < vocab_size_; i += 4) {
  const uint16_t* in = logits + i;
  float* out = logits_f.data() + i;
  int32x4_t q = {in[0], in[1], in[2], in[3]};
I would recommend issuing a vectorized load and then vcvtq_s32_u16 here because this probably does a bunch of slow fmovs, though I haven't checked generated assembly
Thanks for this, I think there is no vcvtq_u16_s32 intrinsic. Changed the code a little to omit the fmov (verified with Compiler Explorer).
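For reference, a sketch of one fmov-free variant, assuming the only goal is widening the uint16 logits to float; the PR's actual dequantization math is left out:

#include <arm_neon.h>

for (int i = 0; i < vocab_size_; i += 4) {
  uint16x4_t u16 = vld1_u16(logits + i);   // one vectorized load
  uint32x4_t u32 = vmovl_u16(u16);         // widen u16 lanes to u32
  float32x4_t f = vcvtq_f32_u32(u32);      // convert to float in-register
  vst1q_f32(logits_f.data() + i, f);       // store four floats
}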
}

std::vector<Result<MethodMeta>> Runner::get_methods_meta() {
  std::vector<Result<MethodMeta>> methods_meta;
reserve?
Thanks, changed.
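A sketch of the reserved version; method_names_ and module_ are assumed members, shown only to illustrate where the reserved size comes from:

std::vector<Result<MethodMeta>> Runner::get_methods_meta() {
  std::vector<Result<MethodMeta>> methods_meta;
  methods_meta.reserve(method_names_.size());  // avoid reallocations while appending
  for (const auto& name : method_names_) {
    methods_meta.push_back(module_->method_meta(name));
  }
  return methods_meta;
}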
Hi @swolchok, thank you so much for the great comments. I applied all the review suggestions but did not reply to every conversation.
looking good! re:unique_ptr deleter, it looks like a function pointer works fine: https://godbolt.org/z/nbfja83oY
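A minimal sketch of the function-pointer-deleter pattern from the godbolt link; the buffer type and helper names are illustrative:

#include <cstdlib>
#include <memory>

void FreeBuffer(void* ptr) {
  std::free(ptr);
}

using BufferPtr = std::unique_ptr<void, void (*)(void*)>;

BufferPtr MakeBuffer(std::size_t bytes) {
  // The deleter is just a plain function pointer; no custom functor type needed.
  return BufferPtr(std::malloc(bytes), &FreeBuffer);
}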
Mind rebasing? Seems like there is a land race.
Summary:
- add utilities for loading context binary generated from qnn tools
- align env variable naming with qnn
- fix bug in online prepare and extend coverage to support bitwise quantization
- llama7b e2e example from qualcomm ai_hub
- minor fixes for style & typos
Force-pushed from 8cd43ed to 9878eb8 (Compare)
Updated, thank you for the hint.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I noticed that Llama 3 is available on https://huggingface.co/qualcomm/Llama-v3-8B-Chat. Any plans to add Llama 3 support through llama_qaihub.py?
We're working on it. Will reply on this thread once the PR is ready.
Hi @a21550, llama3 8b is ready for trial on #4789. Please run on a mobile device with at least 12GB RAM. Disable the low memory killer if the process gets killed:

# start from a fresh state
adb -s $DEVICE_SERIAL reboot
adb -s $DEVICE_SERIAL root
adb -s $DEVICE_SERIAL shell
# type the following on the device to disable the low memory killer
cd /sys/devices/system/memory
for i in $(ls | grep memory); do echo 0 > $i/online; done
for i in $(ls | grep memory); do echo online_kernel > $i/state; done
Congratulations on making Llama 3 work!!! I will play around with it and let you know!
This PR needs a
Summary: