update torchao pin: optimized mps lowbit shaders #1428

Merged 2 commits into main on Dec 18, 2024

@manuelcandales (Contributor) commented Dec 18, 2024

Updates the torchao pin to pick up the optimizations to the experimental MPS low-bit kernels from AO PR #1422.

Llama 3.2 1B (llama3.2-1b-base):
1-bit: 28.0688
2-bit: 31.2422
3-bit: 30.1294
4-bit: 30.7905
5-bit: 28.1504
6-bit: 28.4321
7-bit: 27.3991

Llama 3.1 8B (llama3.1-base):
1-bit: 7.4459
2-bit: 15.6508
3-bit: 15.3086
4-bit: 16.1268
5-bit: 6.7308
6-bit: 6.4887
7-bit: 6.4537
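To compare the two models side by side, the figures above can be put into one small table. The following is a minimal sketch: the numbers are copied verbatim from the lists above, while the helper names `largest_entry` and `format_table` are hypothetical and not part of torchchat or torchao. The description does not state the unit of these figures, so the sketch only tabulates them and reports the largest value per model without interpreting it.

```python
# Figures copied verbatim from the PR description above.
# The unit is not stated there, so none is assumed here.
results = {
    "Llama 3.2 1B": {1: 28.0688, 2: 31.2422, 3: 30.1294, 4: 30.7905,
                     5: 28.1504, 6: 28.4321, 7: 27.3991},
    "Llama 3.1 8B": {1: 7.4459, 2: 15.6508, 3: 15.3086, 4: 16.1268,
                     5: 6.7308, 6: 6.4887, 7: 6.4537},
}


def largest_entry(by_bits):
    """Return (bit_width, value) for the largest reported value."""
    bits = max(by_bits, key=by_bits.get)
    return bits, by_bits[bits]


def format_table(results):
    """Render one row per bit width and one column per model."""
    models = list(results)
    lines = ["bits  " + "  ".join(f"{m:>12}" for m in models)]
    for bits in sorted(results[models[0]]):
        row = "  ".join(f"{results[m][bits]:>12.4f}" for m in models)
        lines.append(f"{bits:>4}  {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(results))
    for model, by_bits in results.items():
        bits, value = largest_entry(by_bits)
        print(f"{model}: largest value {value} at {bits}-bit")
```

Laid out this way, the 8B figures for 2-bit through 4-bit stand out as roughly double the 1-bit and 5-bit through 7-bit figures, whereas the 1B figures stay within a narrow band.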

pytorch-bot bot commented Dec 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1428

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Pending, 2 Unrelated Failures

As of commit 3c0e898 with merge base 56be609:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

  • pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh) (trunk failure)
    RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107
  • pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh) (trunk failure)
    RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the `CLA Signed` label Dec 18, 2024
@manuelcandales changed the title from "update torchao pin: optimized mps experimental shaders" to "update torchao pin: optimized mps lowbit shaders" Dec 18, 2024
@Jack-Khuu added the `Quantization` label Dec 18, 2024
@Jack-Khuu (Contributor) left a comment

LGTM, if there are no objections from the quant folks.

@manuelcandales merged commit 113e40b into main Dec 18, 2024
49 of 53 checks passed
vmpuri pushed a commit that referenced this pull request Feb 4, 2025