Qualcomm AI Engine Direct - Support Llama3 QAIHub #4789
Conversation
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4789. ✅ No failures as of commit 9a3f302 with merge base 447dc6c. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Hi @cccclai,
Thanks - this is great! I wonder if we have reference latency/accuracy/RAM usage numbers?
# TODO: QNN seems to have an expected spill-fill size that can be found in the log.
# Find a way to set this value instead of manually going through the log to retrieve it.
custom_spill_fill = 128974848 if args.use_prompt_processor else 3932160
# set up spill-fill buffer to relieve runtime memory usage
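As a side note on the TODO above: one way to avoid the manual step would be to scrape the value from a saved QNN log on the host. A minimal sketch, assuming the log contains a line mentioning the spill-fill size followed by a byte count (the matched log format and the `qnn_htp.log` file name are assumptions, not something this PR defines):

```python
import re
from typing import Optional

def spill_fill_from_log(log_path: str) -> Optional[int]:
    """Scan a QNN delegate log for the reported spill-fill buffer size.

    NOTE: the line format matched here is an assumption; adjust the
    pattern to whatever your QNN version actually prints.
    """
    pattern = re.compile(r"spill[-_ ]?fill\D*(\d+)", re.IGNORECASE)
    with open(log_path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                return int(match.group(1))
    return None  # size not found; caller falls back to a default
```

It could then replace the hardcoded assignment, e.g. `custom_spill_fill = spill_fill_from_log("qnn_htp.log") or (128974848 if args.use_prompt_processor else 3932160)`, keeping the current constants only as a fallback.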
Do we know the RAM usage and latency numbers?
Hi @cccclai,
Thanks a lot for reviewing the PR.
- Latency: We are currently testing this on our engineering device, where we get 13 tok/sec in KV Cache mode and 0.8 tok/sec in Bert mode.
- Accuracy: We don't have specific accuracy metrics such as perplexity available yet. We tested with examples such as:
  Prompt: "What is baseball?"
  Response: "It is a game of skill, strategy, and physical ability. It is played by two teams, each with nine players. The game is played on a diamond-shaped field, with a pitcher's mound at one corner. The objective of the game is to score more runs than the opposing team by hitting the ball with a bat and running around the four bases on the field. The team with the most runs at the end of nine innings wins the game. The game of baseball is a classic and timeless game that is enjoyed by people all over the world. It is a game that is easy to learn but difficult to"
- Memory: Memory usage for both KV Cache mode and Bert mode is around 11GB. We have tested on a 16GB engineering device and verified that it works.
Please let me know if you have any other questions.
Thanks!
BTW, you mentioned the memory usage for you was around 11GB. I used both "top" and "dumpsys" to check the physical RAM usage, and both showed about 5GB for me on SM8650, as shown below. Were you using a different way to check RAM usage? I'm assuming the context size was 1024 for your testing as well?
=======================================================================
Tasks: 838 total, 2 running, 836 sleeping, 0 stopped, 0 zombie
Mem: 15267M total, 15129M used, 138M free, 1M buffers
Swap: 9924M total, 756M used, 9167M free, 6231M cached
800%cpu 56%user 0%nice 41%sys 696%idle 0%iow 7%irq 1%sirq 0%host
PID USER PR NI VIRT RES SHR S[%CPU] %MEM TIME+ ARGS
25091 shell 20 0 22G 4.8G 4.7G R 72.0 32.5 1:18.94 qaihub_llama3_8b_runner --sharded_1_path qaihub_llama3_8b_token_0.pte --sharded_2_path qaihub_llama3_8b_token_1.pte --sharded_3_path qaihub+
=======================================================================
a21550@a21550:Works$ adb shell dumpsys meminfo 25091
Applications Memory Usage (in Kilobytes):
Uptime: 102398639 Realtime: 102398639
Pss Private Private Swap Rss Heap Heap Heap
Total Dirty Clean Dirty Total Size Alloc Free
------ ------ ------ ------ ------ ------ ------ ------
Native Heap 97688 97688 0 0 97688 0 0 0
Dalvik Heap 0 0 0 0 0 0 0 0
Stack 392 392 0 0 392
Other dev 4 0 4 0 336
.so mmap 3852 356 3364 0 6720
Other mmap 4984258 76 4984180 0 4984820
Unknown 1456 1456 0 0 1456
TOTAL 5087650 99968 4987548 0 5091412 0 0 0
App Summary
Pss(KB) Rss(KB)
------ ------
Java Heap: 0 0
Native Heap: 97688 97688
Code: 3720 6720
Stack: 392 392
Graphics: 0 0
Private Other: 4985716
System: 134
Unknown: 4986612
TOTAL PSS: 5087650 TOTAL RSS: 5091412 TOTAL SWAP (KB): 0
=======================================================================
Hi @a21550,
I am using top to check memory usage, and I think we are getting similar results. However, the way I check memory consumption is to execute the runner while making sure no other apps are running at the same time, and then check the Mem usage, which went from:
Mem: 15185M total, 5179M used, 10005M free, 14M buffers
Swap: 6143M total, 0M used, 6143M free, 2329M cached
to
Mem: 15185M total, 14901M used, 283M free, 2M buffers
Swap: 6143M total, 622M used, 5521M free, 6230M cached
which is around 11GB of memory usage.
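For anyone who wants to reproduce this before/after comparison, the delta can be automated with a small host-side script. A sketch, assuming a device reachable via adb (it reads /proc/meminfo directly rather than top's summary line; the script itself is not part of this PR):

```python
import re
import subprocess

def device_mem_used_kb() -> int:
    """Used memory on the device (MemTotal - MemAvailable), in kB."""
    out = subprocess.run(
        ["adb", "shell", "cat", "/proc/meminfo"],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", out, re.MULTILINE))
    return int(fields["MemTotal"]) - int(fields["MemAvailable"])

baseline = device_mem_used_kb()
input("Start the runner on the device, then press Enter to sample...")
loaded = device_mem_used_kb()
print(f"Delta: {(loaded - baseline) / 1024 / 1024:.1f} GB")
```

Note that a system-wide delta like this (or top's "Mem: ... used" line) also counts page-cache growth, for example from the mmap'd .pte files, which is one plausible reason it can read much higher than the per-process RSS/PSS that top and dumpsys report.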
Great job fixing those hardcoded values in io_memory.cpp and runner.cpp, which will make it easier to add new models later! On my own branch, I just created a config.h for each individual model and put all the model-specific configs into it.
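The config.h described above lives on the C++ runner side. Purely as a sketch of the same pattern on the Python export side (field names and the shard count below are hypothetical; only the two spill-fill values are taken from this thread):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    """Per-model constants gathered in one place instead of scattered hardcoding."""
    name: str
    num_sharding: int        # hypothetical field
    prompt_spill_fill: int   # spill-fill size in prompt-processor mode
    kv_spill_fill: int       # spill-fill size in token-generator (KV) mode

QAIHUB_LLAMA3_8B = ModelConfig(
    name="qaihub_llama3_8b",
    num_sharding=4,          # hypothetical value
    prompt_spill_fill=128974848,
    kv_spill_fill=3932160,
)
```

Either way, the idea is the same: one declarative record per model, so adding a new model touches one file instead of constants spread across the codebase.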
Hi @a21550,
This patch worked smoothly for me! Thank you for all the wonderful work! BTW, because of the following patch, the artifact path in README.md may need a tiny update. I noticed that if I passed "--seq_len 1024" to qaihub_llama3_8b.py, the inference output was something like the text below, which might indicate a hidden bug somewhere.
============================================
Based on this assumption, the top 3 composers of all time are:
These three composers - Beethoven, Mozart, and Bach - are generally considered to be among the most important and influential composers in the history of Western classical music. [closed] I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed] I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed] (the same two sentences repeat for most of the remaining output, then degenerate into) ... if you have any questions or if you have any questions or if you have any questions and if you have any questions the following 3 most influential in your 3Who
Thanks for catching this issue. I will create a new PR to address it.
I noticed that Llama 3.2 3B is available now. Do we have plans to add it to ExecuTorch? https://aihub.qualcomm.com/models/llama_v3_2_3b_chat_quantized
Thank you for sharing the information. We are currently evaluating how Llama 3.2 3B can be enabled. I will keep you updated as soon as we have a more definitive timeline.
Summary: