Enable GPTQ in executorch #2425


Closed
wants to merge 1 commit into from

Conversation

@jerryzh168 (Contributor) commented Mar 14, 2024

Summary:
Previously we only added the code without testing it; this PR also tests GPTQ locally to make sure we can produce a model using GPTQ from the torchao repo.

Test Plan:
python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X -d fp32

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot commented Mar 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2425

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7503eeb with merge base 39c93aa (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 14, 2024
return torch.empty_like(input, dtype=dtype)


def group_quantize_tensor_symmetric(
Contributor: deleted here, but still used by prepare_int4_weight_and_scales_and_zeros

Contributor Author: this is probably not used; we'll clean it up a bit later.
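For context, a minimal sketch of what a symmetric group-wise quantizer like group_quantize_tensor_symmetric computes. This is illustrative only and assumes the common scheme (one scale per group, zero point fixed at 0); it is not the torchao implementation:

```python
import torch

def group_quantize_tensor_symmetric_sketch(w, n_bit=4, groupsize=32):
    # Illustrative symmetric group-wise quantization: one scale per group of
    # `groupsize` input channels, zero point fixed at 0, values clamped to
    # the signed n-bit range ([-8, 7] for int4).
    qmin, qmax = -(2 ** (n_bit - 1)), 2 ** (n_bit - 1) - 1
    wg = w.reshape(w.shape[0], -1, groupsize)
    scales = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / qmax
    w_int = torch.clamp(torch.round(wg / scales), qmin, qmax).to(torch.int8)
    return w_int.reshape_as(w), scales.squeeze(-1)
```

Dequantizing with the returned scales reconstructs the weight to within half a scale step per element, which is why the group size trades accuracy against metadata overhead.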

return torch.stack([first_elements, second_elements], dim=-1).view(up_size(shape))


def per_token_dynamic_quant(input: torch.Tensor) -> torch.Tensor:
Contributor: deleted here, but still used by linear_forward_8da4w

Contributor Author: we'll import it from torchao.
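As a rough sketch of what a per-token dynamic quantizer like per_token_dynamic_quant does (the retained implementation comes from torchao; this is only illustrative): compute one int8 (scale, zero_point) pair per token at runtime, quantize, then immediately dequantize for the reference path:

```python
import torch

def per_token_dynamic_quant_sketch(x: torch.Tensor) -> torch.Tensor:
    # Illustrative asymmetric int8 quantize/dequantize with one
    # (scale, zero_point) pair per token, computed from the live
    # activations ("dynamic"), then immediately dequantized.
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scales = (x_max - x_min).clamp(min=1e-6) / 255.0
    zero_points = torch.round(-128 - x_min / scales)
    q = torch.clamp(torch.round(x / scales) + zero_points, -128, 127)
    return (q - zero_points) * scales
```

This is the "8da" half of 8da4w: activation quantization parameters are not baked into the model but derived from each batch of activations.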

return torch.empty_like(input, dtype=output_dtype)


def get_group_qparams_symmetric(w, n_bit=4, groupsize=128, precision=torch.float32):
Contributor: deleted here, but still used by Int8DynActInt4WeightGPTQQuantHandler
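For reference, a sketch of the kind of output get_group_qparams_symmetric produces: per-group scales, and zero points that are all zero because the scheme is symmetric. This is an assumption-laden illustration, not the actual function body:

```python
import torch

def get_group_qparams_symmetric_sketch(w, n_bit=4, groupsize=128,
                                       precision=torch.float32):
    # Illustrative: one scale per group of `groupsize` input channels;
    # the zero points are all 0 because the scheme is symmetric.
    wg = w.reshape(w.shape[0], -1, groupsize)
    qmax = 2 ** (n_bit - 1) - 1  # 7 for int4
    scales = wg.abs().amax(dim=-1).clamp(min=1e-6) / qmax
    zeros = torch.zeros_like(scales)
    return scales.to(precision), zeros.to(precision)
```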

)


def pack_scales_and_zeros(scales, zeros, precision=torch.float16):
Contributor: deleted here, but still used by Int8DynActInt4WeightGPTQQuantHandler

Contributor Author: this is fine, we are not using this QuantHandler; I'll remove it later as well.
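For context, a sketch of the kind of layout a helper like pack_scales_and_zeros produces: scales and zero points interleaved along a trailing dimension so the int4 kernels can load both together. The exact layout here is an assumption for illustration:

```python
import torch

def pack_scales_and_zeros_sketch(scales, zeros, precision=torch.float16):
    # Illustrative: interleave into shape [..., 2] with the scale at index 0
    # and the zero point at index 1 (the real kernel layout may differ).
    assert scales.shape == zeros.shape
    return torch.stack([scales.to(precision), zeros.to(precision)], dim=-1)
```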

@facebook-github-bot (Contributor): @jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 force-pushed the gptq branch 2 times, most recently from 9b6c568 to 9e18ed0 Compare March 14, 2024 22:25
@facebook-github-bot (Contributor): @jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


Jack-Khuu pushed a commit to Jack-Khuu/executorch-1 that referenced this pull request Mar 16, 2024
Summary:
Previously we just added the code but didn't test it, this PR also tests gptq locally to make sure we can produce a model using gptq from torchao repo

Pull Request resolved: pytorch#2425

Test Plan: python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X -d fp32

Reviewed By: manuelcandales

Differential Revision: D54922375

Pulled By: jerryzh168
Summary:
Previously we only added the code without testing it; this PR also tests GPTQ locally to make sure we can produce a model using GPTQ from the torchao repo.

Currently blocked on xnnpack lowering

Test Plan:
python3 -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -qmode 8da4w-gptq -X

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot (Contributor): @jerryzh168 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

assert (
start_pos is None and cache_k is None and cache_v is None
), "Caches and start_pos are unused when use_kv_cache is False"
# assert (
Contributor Author (@jerryzh168): @kimishpatel is this OK? I need to comment this out to make sure we can run torch._dynamo.export in GPTQ.

@HDCharles is going to work on a refactor of GPTQ to remove export and use tensor subclasses instead; we can revert this change when that is implemented, I think.

@facebook-github-bot (Contributor): @jerryzh168 merged this pull request in 246ed45.

Labels: CLA Signed, Merged

4 participants