Add torchao mps lowbit ops to llama runner #7037
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7037. Note: links to docs will display an error until the docs builds have completed.
CI status as of commit d47141f (merge base daf9aee): 2 new failures, 1 cancelled job (please retry the cancelled job).
Force-pushed ad4dbaf to cd9a5fa.
Some nits, but this largely looks good. Please include the llama runner output in the summary.
Consider fixing the nits. In particular, I don't think we should add TORCHAO_MPS; just overload the existing TORCHAO one.
    if verbose:
        print("quantized model:", model)
Please use logging instead.
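To illustrate the reviewer's suggestion, here is a minimal sketch of the reviewed snippet rewritten with the standard logging module (the helper name log_quantized_model is hypothetical, not from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def log_quantized_model(model, verbose: bool = False) -> None:
    # Hypothetical helper: route the diagnostic through logging
    # instead of print(), so output respects the logger configuration.
    if verbose:
        logger.info("quantized model: %s", model)
```

Passing the model via "%s" defers string formatting until the record is actually emitted, which avoids the cost of stringifying a large model when the log level filters the message out.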
Force-pushed b8f9360 to b7f8f84.
Force-pushed b7f8f84 to 8a10740.
Set up ET: https://pytorch.org/executorch/stable/getting-started-setup
Install llama runner requirements
Build ET with MPS ON:
Build llama runner with torchao mps ops
Export model. Note: qmode can be any of torchao:fpa1w ... torchao:fpa7w, and group_size can be any of 32, 64, 128, 256.
Run:
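The steps above can be sketched as a shell session. This is a hypothetical sketch, not the PR's exact commands: the CMake option EXECUTORCH_BUILD_MPS, the export_llama module path, and the -qmode/--group_size flag names are assumptions; check the linked setup guide for your version of ExecuTorch.

```shell
# Assumed CMake option name for enabling MPS; verify against your checkout.
cmake -DCMAKE_BUILD_TYPE=Release -DEXECUTORCH_BUILD_MPS=ON -Bcmake-out .
cmake --build cmake-out -j

# Export with a torchao lowbit qmode (flag names are assumptions;
# fpa4w is one value in the torchao:fpa1w ... torchao:fpa7w range).
python -m examples.models.llama.export_llama \
    -qmode torchao:fpa4w \
    --group_size 32
```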