
Add torchao mps lowbit ops to llama runner #7037


Merged: 2 commits merged into pytorch:main on Dec 13, 2024

Conversation

@manuelcandales (Contributor) commented Nov 22, 2024

Setup ET: https://pytorch.org/executorch/stable/getting-started-setup

Install llama runner requirements

bash examples/models/llama/install_requirements.sh

Build ET with MPS ON:

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_MPS=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out .
cmake --build cmake-out -j16 --target install --config Release

Build llama runner with torchao mps ops

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_TORCHAO=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -Bcmake-out/examples/models/llama \
    examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --target install --config Release

Export the model. Note: qmode can be any of torchao:fpa1w ... torchao:fpa7w, and group_size can be any of 32, 64, 128, or 256.

CMAKE_INSTALL_PREFIX=$PWD/cmake-out python -m examples.models.llama.export_llama \
--checkpoint /path/to/model.pth \
--params /path/to/params.json \
-kv --use_sdpa_with_kv_cache --disable_dynamic_shape --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
-qmode "torchao:fpa4w" --group_size 32 \
--output_name path/to/output.pte
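The qmode/group_size constraints above can be checked up front before running the exporter. A minimal sketch (hypothetical helper, not part of the repo) assuming the fpaNw naming encodes floating-point activations with N-bit weights:

```python
import re

# Hypothetical validator for the export flags described above.
# Assumes qmode strings span torchao:fpa1w .. torchao:fpa7w and
# group_size is one of the four listed values.
VALID_GROUP_SIZES = {32, 64, 128, 256}
QMODE_PATTERN = re.compile(r"^torchao:fpa([1-7])w$")

def validate_export_args(qmode: str, group_size: int) -> int:
    """Return the weight bit-width if the flags are valid, else raise."""
    match = QMODE_PATTERN.match(qmode)
    if match is None:
        raise ValueError(f"unsupported qmode: {qmode!r}")
    if group_size not in VALID_GROUP_SIZES:
        raise ValueError(f"unsupported group_size: {group_size}")
    return int(match.group(1))

print(validate_export_args("torchao:fpa4w", 32))  # 4
```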

Run:

cmake-out/examples/models/llama/llama_main --model_path=/path/to/output.pte --tokenizer_path=/path/to/tokenizer.model --prompt="Once upon a time,"


pytorch-bot bot commented Nov 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7037

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job

As of commit d47141f with merge base daf9aee (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 22, 2024
@manuelcandales manuelcandales added the release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava label Nov 22, 2024
@manuelcandales manuelcandales force-pushed the torchao-mps-llama-runner branch 5 times, most recently from ad4dbaf to cd9a5fa Compare December 10, 2024 19:09
@kimishpatel (Contributor) left a comment


Some nits, but largely looks good. Please include the output of the llama runner in the summary.

Consider fixing the nits. In particular, I don't think we should add TORCHAO_MPS; just overload the TORCHAO one.

Comment on lines +97 to +96

if verbose:
    print("quantized model:", model)

Please use logging
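The reviewer's suggestion would replace the bare print with the standard logging module. A minimal sketch (function and logger names assumed, not from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def report_quantized(model, verbose: bool) -> None:
    # Route the diagnostic through logging instead of print, so
    # verbosity is controlled by the logging configuration and the
    # message is only formatted when the record is actually emitted.
    if verbose:
        logger.info("quantized model: %s", model)
```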

@manuelcandales manuelcandales force-pushed the torchao-mps-llama-runner branch 2 times, most recently from b8f9360 to b7f8f84 Compare December 12, 2024 03:32
@manuelcandales manuelcandales force-pushed the torchao-mps-llama-runner branch from b7f8f84 to 8a10740 Compare December 12, 2024 04:18
@manuelcandales manuelcandales merged commit df0b06c into pytorch:main Dec 13, 2024
41 of 44 checks passed