Qualcomm AI Engine Direct - Support Llama3 QAIHub #4789

Merged
merged 1 commit into from
Aug 20, 2024

winskuo-quic
Collaborator

Summary:

  • Llama3 8B e2e example from qualcomm aihub
  • Minor file restructure and typo fix

pytorch-bot bot commented Aug 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4789

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9a3f302 with merge base 447dc6c (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 20, 2024
@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/qaihub_llama3 branch from d6fd817 to 9a3f302 Compare August 20, 2024 08:09
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR enables Llama3 8B for Qualcomm AIHub context binaries.
Please have a look.
Thanks!

@digantdesai digantdesai added the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Aug 20, 2024
@digantdesai digantdesai requested a review from cccclai August 20, 2024 15:07
Contributor

@cccclai cccclai left a comment

Thanks - this is great! I wonder if we have reference numbers for latency, accuracy, and RAM usage?

# TODO: QNN seems to have an expected spill fill size that can be found through log.
# Find a way to set this value instead of manually go through the log to retrieve the value.
custom_spill_fill = 128974848 if args.use_prompt_processor else 3932160
# setup spill-fill buffer for relieving runtime memory usage
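
To put the two hardcoded values in perspective, the sketch below wraps the selection from the snippet above in a small helper and converts each size to MiB. The helper and constant names are hypothetical and introduced only for illustration; the numbers are the ones from the snippet, and treating them as byte counts is an assumption.

```python
# Values copied from the reviewed snippet; all names here are hypothetical.
PROMPT_PROCESSOR_SPILL_FILL = 128974848  # used when args.use_prompt_processor is set
DEFAULT_SPILL_FILL = 3932160             # used otherwise


def select_spill_fill_size(use_prompt_processor: bool) -> int:
    """Mirror the conditional expression from the reviewed snippet."""
    if use_prompt_processor:
        return PROMPT_PROCESSOR_SPILL_FILL
    return DEFAULT_SPILL_FILL


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (assumes the values are bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for flag in (True, False):
        size = select_spill_fill_size(flag)
        print(f"use_prompt_processor={flag}: {size} bytes = {to_mib(size):.2f} MiB")
```

Under that assumption the two settings correspond to 123 MiB and 3.75 MiB respectively.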
Contributor

Do we know the RAM usage and latency numbers?

Collaborator Author

Hi @cccclai ,
Thanks a lot for reviewing the PR.

  • Latency: We are currently testing this on our engineering device, where we get 13 tok/sec in KV Cache mode and 0.8 tok/sec in Bert mode.

  • Accuracy: We currently don't have specific accuracy metrics like perplexity available. We tested with examples such as
    Prompt: "What is baseball?"
    Response: "It is a game of skill, strategy, and physical ability. It is played by two teams, each with nine players. The game is played on a diamond-shaped field, with a pitcher's mound at one corner. The objective of the game is to score more runs than the opposing team by hitting the ball with a bat and running around the four bases on the field. The team with the most runs at the end of nine innings wins the game. The game of baseball is a classic and timeless game that is enjoyed by people all over the world. It is a game that is easy to learn but difficult to"

  • Memory: Memory usage for both KV Cache mode and Bert mode is around 11GB. We tested on a 16GB engineering device and verified that it works.

Please let me know if you have any other questions.
Thanks!


BTW, you mentioned the memory usage for you was around 11GB. I used both "top" and "dumpsys" to check the physical RAM usage, and both showed 5GB for me on SM8650 as below. Were you using a different way to check RAM usage? I'm assuming context size is 1024 as well for your testing?

=======================================================================

Tasks: 838 total, 2 running, 836 sleeping, 0 stopped, 0 zombie
Mem: 15267M total, 15129M used, 138M free, 1M buffers
Swap: 9924M total, 756M used, 9167M free, 6231M cached
800%cpu 56%user 0%nice 41%sys 696%idle 0%iow 7%irq 1%sirq 0%host
  PID USER  PR NI VIRT  RES  SHR S[%CPU] %MEM   TIME+ ARGS
25091 shell 20  0  22G 4.8G 4.7G R  72.0 32.5 1:18.94 qaihub_llama3_8b_runner --sharded_1_path qaihub_llama3_8b_token_0.pte --sharded_2_path qaihub_llama3_8b_token_1.pte --sharded_3_path qaihub+

=======================================================================

a21550@a21550:Works$ adb shell dumpsys meminfo 25091
Applications Memory Usage (in Kilobytes):
Uptime: 102398639 Realtime: 102398639
                   Pss  Private  Private     Swap      Rss     Heap     Heap     Heap
                 Total    Dirty    Clean    Dirty    Total     Size    Alloc     Free
                ------   ------   ------   ------   ------   ------   ------   ------
  Native Heap    97688    97688        0        0    97688        0        0        0
  Dalvik Heap        0        0        0        0        0        0        0        0
        Stack      392      392        0        0      392
    Other dev        4        0        4        0      336
     .so mmap     3852      356     3364        0     6720
   Other mmap  4984258       76  4984180        0  4984820
      Unknown     1456     1456        0        0     1456
        TOTAL  5087650    99968  4987548        0  5091412        0        0        0

 App Summary
                       Pss(KB)      Rss(KB)
                        ------       ------
           Java Heap:        0            0
         Native Heap:    97688        97688
                Code:     3720         6720
               Stack:      392          392
            Graphics:        0            0
       Private Other:  4985716
              System:      134
             Unknown:               4986612

       TOTAL PSS:  5087650            TOTAL RSS:  5091412      TOTAL SWAP (KB):        0

=======================================================================

Collaborator Author

Hi @a21550,
I am using top to check memory usage, and I think we are getting similar results. However, I measure memory consumption by executing the runner while making sure no other apps are running at the same time, and then checking the system-wide Mem usage, which went from:

Mem: 15185M total, 5179M used, 10005M free, 14M buffers
Swap: 6143M total, 0M used, 6143M free, 2329M cached

to

Mem: 15185M total, 14901M used, 283M free, 2M buffers
Swap: 6143M total, 622M used, 5521M free, 6230M cached

which is around 11GB memory usage.
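
The before/after comparison can be reproduced with a few lines of Python. This is a minimal sketch that parses the four `top` summary lines quoted above and computes the deltas; the function name `parse_top_line` is hypothetical. The system-wide "used" figure may include page cache, which could partly explain why it is larger than the per-process RES value reported earlier in the thread, but that interpretation is not confirmed in this conversation.

```python
import re


def parse_top_line(line: str) -> dict:
    """Extract '<number>M <label>' pairs from a `top` Mem/Swap summary line."""
    return {label: int(value) for value, label in re.findall(r"(\d+)M (\w+)", line)}


# Lines copied from the comment above.
before_mem = parse_top_line("Mem: 15185M total, 5179M used, 10005M free, 14M buffers")
before_swap = parse_top_line("Swap: 6143M total, 0M used, 6143M free, 2329M cached")
after_mem = parse_top_line("Mem: 15185M total, 14901M used, 283M free, 2M buffers")
after_swap = parse_top_line("Swap: 6143M total, 622M used, 5521M free, 6230M cached")

# System-wide deltas in megabytes, as reported by top.
delta = {
    "mem_used": after_mem["used"] - before_mem["used"],
    "swap_used": after_swap["used"] - before_swap["used"],
    "cached": after_swap["cached"] - before_swap["cached"],
}

for name, value in delta.items():
    print(f"{name}: +{value}M")
```

The deltas come out to 9722M of additional used memory, 622M of additional swap, and 3901M of additional cached memory, which is in the same range as the estimate given above.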

@a21550

a21550 commented Aug 20, 2024

Great job fixing the hardcoded values in io_memory.cpp and runner.cpp, which will make it easier to add new models later!

On my own branch, I just created a config.h for each individual model and put all the model-specific configs into it.

@kirklandsign kirklandsign merged commit 80b4a72 into pytorch:main Aug 20, 2024
35 checks passed
@winskuo-quic
Collaborator Author

Hi @a21550 ,
Thank you so much for sharing your approach on how to organize these configurations!

@a21550

a21550 commented Aug 21, 2024

This patch worked smoothly for me! Thank you for all the wonderful work!

BTW, because of the following patch, the artifact path in README.md may need a tiny update.

7b27f9b

I noticed that if I passed "--seq_len 1024" to qaihub_llama3_8b.py, the inference output looked like the text below, which might indicate a hidden bug somewhere.

============================================
Who are the top 3 composers of all time? [closed]
This question is based on the assumption that the best composers are those who have had the most significant influence on the course of music history.

Based on this assumption, the top 3 composers of all time are:

  1. Ludwig van Beethoven (1770-1827) - Arguably the most influential composer of all time, Beethoven's innovative and expressive music had a profound impact on the development of classical music.
  2. Wolfgang Amadeus Mozart (1756-1791) - A child prodigy and one of the most influential and beloved composers of all time, Mozart's music is renowned for its melodic beauty, harmonic richness, and dramatic power.
  3. Johann Sebastian Bach (1685-1750) - A towering figure in the world of classical music, Bach's contributions to the development of Western music are immeasurable. His music is renowned for its contrapuntal mastery, harmonic innovation, and emotional depth.

These three composers - Beethoven, Mozart, and Bach - are generally considered to be among the most important and influential composers in the history of Western classical music. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]
I hope this information is helpful. Let me know if you have any questions or if there's anything else I can help you with. [closed]
I hope this helps! Let me know if you have any questions or if there's anything else I can help you with. [closed]

I hope this information is helpful. Let me know if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions or if you have any questions and if you have any questions or if you have any questions or if you have any questions and if you have any questions the following 3 most influential in your 3Who
[closed and if you on your 3
Who are a lot of the most influentialst most on the Top 3 Who, 1

@winskuo-quic
Collaborator Author

Thanks for catching this issue. I will create a new PR to address the following issues:

  • Use consistent naming for the artifact path
  • Use the tokenizer's built-in functions to retrieve the BOS and EOS tokens
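
One possible reading of the second item is that the runner asks the tokenizer for its BOS and EOS ids instead of hardcoding them, so that generation stops at the right token even with a large `--seq_len`. Whether a mismatched EOS id is what caused the repeated output above is not confirmed in this thread. The sketch below illustrates the idea only: `ToyTokenizer`, `generate`, and the scripted model are hypothetical names and are not the ExecuTorch API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer that exposes its special-token ids."""
    bos_id: int
    eos_id: int


def generate(next_token: Callable[[List[int]], int],
             tokenizer: ToyTokenizer,
             prompt_ids: List[int],
             seq_len: int) -> List[int]:
    """Greedy loop that stops at EOS or once seq_len tokens have been produced."""
    tokens = [tokenizer.bos_id] + prompt_ids
    while len(tokens) < seq_len:
        tok = next_token(tokens)
        # The EOS id comes from the tokenizer, not from a hardcoded constant.
        if tok == tokenizer.eos_id:
            break
        tokens.append(tok)
    return tokens


def make_scripted_model(script: List[int]) -> Callable[[List[int]], int]:
    """Toy 'model' that replays a fixed token script, ignoring its input."""
    iterator = iter(script)
    return lambda tokens: next(iterator)


tokenizer = ToyTokenizer(bos_id=1, eos_id=2)
model = make_scripted_model([10, 11, 12, 2, 13, 14])
output = generate(model, tokenizer, prompt_ids=[5, 6], seq_len=1024)
print(output)
```

With the scripted tokens above, the loop stops when it sees the EOS id, so the tokens after it are never emitted even though `seq_len` is 1024.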

@a21550

a21550 commented Oct 21, 2024

I noticed that Llama 3.2 3B is available now. Do we have a plan to add it to ExecuTorch?

https://aihub.qualcomm.com/models/llama_v3_2_3b_chat_quantized
https://huggingface.co/qualcomm/Llama-v3.2-3B-Chat

@winskuo-quic
Collaborator Author

Thank you for sharing the information. We are currently evaluating how Llama 3.2 3B can be enabled. I will keep you updated as soon as we have a more definitive timeline.
