Qualcomm AI Engine Direct - support llama3.2 1B/3B with static llama in kv mode #6779


Conversation

shewu-quic
Collaborator

Summary:

  • Add custom_annotate_llama_last_conv_16a8w
  • Add llama.py and runner for llama3.2 1B/3B
  • Support model sharding and sha
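Model sharding here refers to splitting the decoder stack into several contiguous sub-graphs so each piece stays within backend limits. As an illustrative sketch only (the helper name and partitioning scheme below are assumptions, not the PR's actual implementation):

```python
# Illustrative sketch: partition decoder-layer indices into contiguous
# shards, distributing any remainder to the earliest shards. This is a
# generic helper, not an ExecuTorch API.

def shard_layers(num_layers: int, num_shards: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into num_shards contiguous ranges."""
    base, rem = divmod(num_layers, num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < rem else 0)  # first `rem` shards take one extra layer
        shards.append(range(start, start + size))
        start += size
    return shards

# e.g. a 16-layer 1B-class model split into 4 shards:
print(shard_layers(16, 4))  # [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]
```

Each range would then be lowered as its own graph, with activations handed off between shards at runtime.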


pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6779

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f2a0383 with merge base dc41596:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 12, 2024
@shewu-quic shewu-quic changed the title Qualcomm AI Engine Direct - support llama3.2 1B/3B with static llama Qualcomm AI Engine Direct - support llama3.2 1B/3B with static llama in kv mode Nov 12, 2024
Contributor

@cccclai cccclai left a comment


LGTM! Can you share the command line for both AOT and runtime?

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shewu-quic
Collaborator Author

shewu-quic commented Nov 13, 2024

LGTM! Can you share the command line for both AOT and runtime?

Sure,

End-to-end flow for 16a4w llama 3.2 1B/3B instruct

python examples/qualcomm/oss_scripts/llama3_2/llama.py -a ./llama3_2_1B_512 -b build-android -H ${ADB_HOST} -s ${ADB_SERIAL_NUM} -m "SM8650" --checkpoint ${checkpoint} --params ${params} --tokenizer_model ${tokenizer.model}  --prompt "What is 1+1?" --temperature 0 --seq_len 512  --model_size ${1B/3B} --ptq 16a4w

Compile only for 16a4w llama 3.2 1B/3B instruct

python examples/qualcomm/oss_scripts/llama3_2/llama.py -a ./llama3_2_1B_512 -b build-android -H ${ADB_HOST} -s ${ADB_SERIAL_NUM} -m "SM8650" --checkpoint ${checkpoint} --params ${params} --tokenizer_model ${tokenizer.model} --prompt "What is 1+1?" --temperature 0 --seq_len 512 --model_size ${1B/3B} --compile_only --ptq 16a4w

Inference only for 16a4w llama 3.2 1B/3B instruct

python examples/qualcomm/oss_scripts/llama3_2/llama.py -a ./llama3_2_1B_512 -b build-android -H ${ADB_HOST} -s ${ADB_SERIAL_NUM} -m "SM8650" --checkpoint ${checkpoint} --params ${params} --tokenizer_model ${tokenizer.model}  --prompt "What is 1+1?" --temperature 0 --seq_len 512 --pre_gen_pte ./llama3_2_1B_512 --model_size ${1B/3B} --ptq 16a4w
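The `${...}` placeholders in the commands above are filled in per environment. As a hedged sketch (the host, serial, and file paths below are illustrative values, not ones mandated by the script), the setup might look like:

```shell
# Hypothetical placeholder values for the variables used in the commands
# above; substitute your own ADB host/serial and artifact paths.
ADB_HOST="127.0.0.1"
ADB_SERIAL_NUM="emulator-5554"
CHECKPOINT="./consolidated.00.pth"
PARAMS="./params.json"
TOKENIZER_MODEL="./tokenizer.model"
MODEL_SIZE="1B"   # or "3B"

# Sanity-check the composed device flags before launching the real run.
echo "-H ${ADB_HOST} -s ${ADB_SERIAL_NUM} --model_size ${MODEL_SIZE}"
```

Note that `--pre_gen_pte` in the inference-only command points at the artifact directory produced by the compile-only run, so the two can be executed on separate machines.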

@facebook-github-bot facebook-github-bot merged commit 31dbfc9 into pytorch:main Nov 13, 2024
40 checks passed