Enable GPTQ in executorch #2425
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2425
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 7503eeb with merge base 39c93aa. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
examples/models/llama2/quantize.py
Outdated
return torch.empty_like(input, dtype=dtype)


def group_quantize_tensor_symmetric(
deleted here, but still used by prepare_int4_weight_and_scales_and_zeros
this is probably not used; we'll clean up a bit later
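For context on what this deleted helper did: below is a minimal sketch of symmetric group quantization, assuming contiguous groups along the last dimension; the name suffix, defaults, and details are illustrative, not the exact torchao implementation.

import torch

def group_quantize_tensor_symmetric_sketch(w, n_bit=4, groupsize=128):
    # Split the weight into contiguous groups of `groupsize` values.
    assert w.numel() % groupsize == 0
    w_grouped = w.reshape(-1, groupsize)
    # Symmetric scheme: one scale per group, zero point fixed at 0.
    q_max = 2 ** (n_bit - 1) - 1  # 7 for 4-bit
    scales = w_grouped.abs().amax(dim=1, keepdim=True).clamp(min=1e-6) / q_max
    w_int = torch.clamp(torch.round(w_grouped / scales), -q_max - 1, q_max)
    return w_int.reshape(w.shape).to(torch.int8), scales.flatten()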
examples/models/llama2/quantize.py
Outdated
return torch.stack([first_elements, second_elements], dim=-1).view(up_size(shape))


def per_token_dynamic_quant(input: torch.Tensor) -> torch.Tensor:
deleted here, but still used by linear_forward_8da4w
we'll import from torchao
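Roughly, per-token dynamic quantization does the following quantize/dequantize round trip (a sketch for context, not the actual torchao import):

import torch

def per_token_dynamic_quant_sketch(x: torch.Tensor) -> torch.Tensor:
    # One int8 scale per token, computed on the fly ("dynamic"), followed
    # by an immediate dequantize so downstream ops still see float values.
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    x_int8 = torch.clamp(torch.round(x / scales), -128, 127)
    return x_int8 * scales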
examples/models/llama2/quantize.py
Outdated
return torch.empty_like(input, dtype=output_dtype)


def get_group_qparams_symmetric(w, n_bit=4, groupsize=128, precision=torch.float32):
deleted here, but still used by Int8DynActInt4WeightGPTQQuantHandler
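As with the sketch above, and purely for context: this helper computes the per-group quantization parameters without quantizing the weight itself; in the symmetric case the zero points are all zero. A hedged sketch, assuming groups along the last dimension:

import torch

def get_group_qparams_symmetric_sketch(w, n_bit=4, groupsize=128, precision=torch.float32):
    # Same per-group scale computation as in the quantize sketch above,
    # but only the qparams are returned, cast to the requested precision.
    assert w.shape[-1] % groupsize == 0
    w_grouped = w.reshape(w.shape[0], -1, groupsize)
    q_max = 2 ** (n_bit - 1) - 1
    scales = (w_grouped.abs().amax(dim=-1).clamp(min=1e-6) / q_max).to(precision)
    zeros = torch.zeros_like(scales)  # symmetric => zero point is 0
    return scales, zeros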
examples/models/llama2/quantize.py
Outdated
)


def pack_scales_and_zeros(scales, zeros, precision=torch.float16):
deleted here, but still used by Int8DynActInt4WeightGPTQQuantHandler
this is fine; we are not using this QuantHandler, I'll remove it later as well
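For reference, a sketch of what the packing step does; the exact layout the int4 kernels expect is an assumption here, not taken from the PR.

import torch

def pack_scales_and_zeros_sketch(scales, zeros, precision=torch.float16):
    # Interleave each group's scale and zero point into a trailing dim of 2,
    # producing a single combined qparams tensor consumed alongside the weight.
    assert scales.shape == zeros.shape
    return torch.stack([scales, zeros], dim=-1).to(precision).contiguous()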
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 9b6c568 to 9e18ed0
Summary: Previously we just added the code but didn't test it; this PR also tests GPTQ locally to make sure we can produce a model using GPTQ from the torchao repo.
Pull Request resolved: pytorch#2425
Test Plan: python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X -d fp32
Reviewed By: manuelcandales
Differential Revision: D54922375
Pulled By: jerryzh168
Summary: Previously we just added the code but didn't test it; this PR also tests GPTQ locally to make sure we can produce a model using GPTQ from the torchao repo. Currently blocked on XNNPACK lowering.
Test Plan: python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X
# assert (
#     start_pos is None and cache_k is None and cache_v is None
# ), "Caches and start_pos are unused when use_kv_cache is False"
@kimishpatel is this OK? I need to comment this out to make sure we can run torch._dynamo.export in GPTQ.
@HDCharles is going to work on a refactor of GPTQ to remove export and use tensor subclasses instead; we can revert this change when that is implemented, I think.
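To illustrate the issue with a hypothetical module (not the actual llama2 code): torch._dynamo.export traces the forward with the optional arguments in place, so a Python-level assert like this can get in the way of tracing, which is why the PR disables it.

import torch

class AttentionSketch(torch.nn.Module):
    def __init__(self, use_kv_cache: bool = False):
        super().__init__()
        self.use_kv_cache = use_kv_cache

    def forward(self, x, start_pos=None, cache_k=None, cache_v=None):
        if not self.use_kv_cache:
            # The check the PR comments out lived roughly here:
            # assert (
            #     start_pos is None and cache_k is None and cache_v is None
            # ), "Caches and start_pos are unused when use_kv_cache is False"
            pass
        return x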
@jerryzh168 merged this pull request in 246ed45.
Summary:
Previously we just added the code but didn't test it; this PR also tests GPTQ locally to make sure we can produce a model using GPTQ from the torchao repo.
Test Plan:
python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X -d fp32