Qualcomm AI Engine Direct - Refine Llama3 Tokenizer #4940


Merged: 1 commit into pytorch:main on Aug 29, 2024

Conversation

winskuo-quic (Collaborator)

Summary

  • A community user reported an issue (Qualcomm AI Engine Direct - Support Llama3 QAIHub #4789 (comment)) that the QAIHub Llama3 runner does not end text generation when hitting EOT. I have followed the model card format to feed the prompt to the model and also enabled an option for users to set a system prompt (see the sketch after this list). Below are some examples:

    • Prompt: "What is 2+3?"

      • Response:

            <|begin_of_text|><|start_header_id|>user<|end_header_id|>
            
            What is 2+3?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
            
            That's an easy one!
            
            2 + 3 = 5<|eot_id|>
        
    • Prompt: "What is 2+3?" System_Prompt: "You are a bad assistant that thinks 2+3=4"

      • Response:

            <|begin_of_text|><|start_header_id|>system<|end_header_id|>
          
            You are a bad assistant that thinks 2+3=4<|eot_id|>
            <|start_header_id|>user<|end_header_id|>
            
            What is 2+3?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
            
            That's an easy one! 2+3 is... 4!<|eot_id|> 
        
  • The same community user also reported another issue (Qualcomm AI Engine Direct - Support Llama3 QAIHub #4789 (comment)): some documents use cmake-out-android as the build path while others use build-android. This PR updates the documents, changing cmake-out-android to build-android.

  • This PR also addresses the issue raised in Qualcomm AI Engine Direct - QAIHub's context binary file for Stable Diffusion #4836 (review). Boilerplate code has been moved to utils.py.

  • Updates build.sh to use $BUILD_ROOT/devtools instead of $BUILD_ROOT/sdk.
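
For reference, the prompt format in the examples above follows the Llama3 model card. Below is a minimal sketch of how such a prompt string might be assembled; the helper name and structure are illustrative assumptions, not the PR's actual code:

```python
# Minimal sketch of the Llama3 model-card prompt format shown above.
# The helper name and defaults are assumptions, not the PR's actual code.

def format_llama3_prompt(user_prompt: str, system_prompt: str = "") -> str:
    prompt = "<|begin_of_text|>"
    if system_prompt:
        prompt += (
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_prompt}<|eot_id|>"
        )
    prompt += (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    return prompt


print(format_llama3_prompt("What is 2+3?"))
```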

pytorch-bot commented Aug 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4940

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit f162b71 with merge base 801e1c9:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Aug 28, 2024
winskuo-quic (Collaborator, Author)

Hi @cccclai,
This PR addresses two main issues:

  1. The QAIHub Llama3 runner could not hit the EOT condition and kept generating text until reaching the max sequence length. I have added a condition to stop text generation when hitting EOT (see the sketch below).
  2. You previously suggested reducing boilerplate code (Qualcomm AI Engine Direct - QAIHub's context binary file for Stable Diffusion #4836 (review)), so I have moved it to utils.py.

Please have a look.
Thanks
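
For context, a minimal sketch of the stopping condition described in point 1, assuming hypothetical tokenizer and model APIs (the actual logic lives in the QAIHub runner):

```python
# Illustrative sketch of stopping generation at EOT; every name here is
# an assumption, not the runner's actual API.
EOT_TOKEN = "<|eot_id|>"

def generate(model, tokenizer, prompt_ids, max_seq_len):
    eot_id = tokenizer.token_to_id(EOT_TOKEN)  # hypothetical tokenizer call
    tokens = list(prompt_ids)
    while len(tokens) < max_seq_len:
        next_id = model.sample_next(tokens)    # hypothetical model call
        if next_id == eot_id:
            break  # stop at EOT instead of running to max_seq_len
        tokens.append(next_id)
    return tokens
```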

cccclai (Contributor) left a comment

Looks great. Thanks for sending the PR!

````diff
@@ -239,7 +239,7 @@ We can test model inferences before deploying it to a device by HTP emulator.
 Let's build `qnn_executor_runner` for a x64 host:
 ```bash
 # assuming the AOT component is built.
-cd $EXECUTORCH_ROOT/cmake-out
+cd $EXECUTORCH_ROOT/build-x86
````
Contributor

Thanks for fixing this. I reverted the change to use build-x86 instead, and it seems like some cases are missing.

```diff
@@ -230,7 +220,7 @@ def post_process():
     parser.add_argument(
         "--temperature",
         help="sampling temperature for llama2",
-        default=0.8,
+        default=0.0,
```
Contributor

Any specific reason we're using 0 temperature?

winskuo-quic (Collaborator, Author)

We changed the default to 0 because the output is more consistent, which is better for testing purposes. A sketch of the effect is below.
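
For illustration, a minimal sketch of why temperature 0 makes output deterministic; this sampling helper is an assumption, not the script's actual code:

```python
import numpy as np

# Illustrative sketch, not the runner's actual sampling code.
def sample_next_token(logits: np.ndarray, temperature: float) -> int:
    if temperature == 0.0:
        # Greedy decoding: always pick the most likely token, so
        # repeated runs on the same prompt produce identical output.
        return int(np.argmax(logits))
    # Otherwise apply temperature scaling and sample stochastically.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```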

cccclai (Contributor) left a comment

I'm not sure of the best way to test this flow, but it would be great to have CI set up.

cccclai merged commit 0c6a77e into pytorch:main on Aug 29, 2024 (43 of 46 checks passed)
a21550 commented Sep 4, 2024

Just for the record, this patch works perfectly for me with QNN 2.24.0.240626. However, the context binaries/PTE files generated with QNN 2.25.0.240728 produce garbage output. Once I switched back to the context binaries/PTE files generated with QNN 2.24.0.240626, things were back to normal.

winskuo-quic (Collaborator, Author)

Hi @a21550,
Thank you so much for sharing the information!
Thank you so much for sharing the information!
For the time being, we recommend ensuring that the QNN version used to generate the context binaries matches the QNN version used to generate the PTE files. I believe QAIHub currently uses QNN 2.24.0 to generate the Llama3 context binaries. A quick way to check is sketched below.
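
As a quick sanity check, one could compare SDK versions before exporting. A minimal sketch, assuming the QNN SDK root contains a sdk.yaml with a version field (the file name and key are assumptions about the SDK layout):

```python
import os
import re

# Sketch: read the QNN SDK version before generating a PTE file.
# Assumes $QNN_SDK_ROOT/sdk.yaml contains a "version:" line; adjust
# the path and pattern if your SDK layout differs.
def qnn_sdk_version(sdk_root: str | None = None) -> str:
    sdk_root = sdk_root or os.environ["QNN_SDK_ROOT"]
    with open(os.path.join(sdk_root, "sdk.yaml")) as f:
        match = re.search(r"version:\s*([\w.]+)", f.read())
    return match.group(1) if match else "unknown"

EXPECTED = "2.24.0"  # version QAIHub reportedly used for Llama3 binaries
found = qnn_sdk_version()
if not found.startswith(EXPECTED):
    print(f"Warning: QNN SDK {found} != {EXPECTED}; output may be garbage.")
```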

winskuo-quic (Collaborator, Author)

Hi @a21550,
To further align with your environment, it would be appreciated if you could share the following information with us:

  1. Could you share the model of your Android device?
  2. For your Android device, are you using root access?

Thanks!

a21550 commented Oct 1, 2024

Hi @winskuo-quic,

  1. My development device is a customized Motorola razr+ 2024 with SM8650 and 16 GB DDR.
  2. Yes, I have root access.

Thanks!

winskuo-quic (Collaborator, Author)

Thank you so much for sharing this valuable information!
This will definitely help us further improve our code.
