[Draft] Qualcomm AI Engine Direct - Unexpected graph for mutable buffer in Quantization #4627
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4627
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit a1e3286 with merge base 192d463. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hey, sorry I totally missed the PR. May I know more context here? Is it for migrating from torch._export.capture_pre_autograd_graph?
I think the replacement is torch.export.
Sorry for the inconvenience. We found that we get similar results for different prompts in quantized llama.
Thanks for your information. |
Ah yes, |
trying to follow,
Does
Got it. I will fix it and check the result again.
Yes, I found exactly this issue with
It seems the default value is True in torch.export.export. Ooooh, it seems we called the wrong API before quantization.
oh |
cc @jerryzh168 for convert_pt2e stuff |
Hi @cccclai,
We found that the graph with a mutable buffer after the export API in the quantization flow is not what we expect.
I expect the mutable buffer to be an I/O, not a constant.
We can also see that the following message appears in the FP flow, but not in the quantized flow.
The results below can be reproduced with this PR to generate the graphs.
In summary, there are two questions about the graph for the quantization flow in the export stage after convert_pt2e.
Do you know what might be wrong?
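For reference, here is a minimal sketch of the kind of module involved. SimpleCache and its shapes are made up for illustration; the real graphs come from the quantized llama model in this PR.

```python
import torch


class SimpleCache(torch.nn.Module):
    """Toy stand-in for a KV cache: a mutable buffer updated via index_put_."""

    def __init__(self, max_seq_len=8, head_dim=4):
        super().__init__()
        # Mutable buffer, analogous to k_cache in the llama model.
        self.register_buffer("k_cache", torch.zeros(max_seq_len, head_dim))

    def forward(self, pos, k_val):
        # In-place update of the buffer; this is the mutation that should show
        # up as a buffer-mutation input/output pair in the exported graph.
        self.k_cache.index_put_((pos,), k_val)
        return self.k_cache.clone()


example_inputs = (torch.tensor([0]), torch.rand(1, 4))
```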
Floating Point Flow
This is exactly what I expected. At runtime, k_cache will become an input and the result of index_put will be an output.
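The mutation can also be read off the graph signature of the exported program. Below is a sketch using the toy SimpleCache module above; the two mappings show k_cache appearing as an input and the index_put result being written back to it.

```python
import torch

# Export the floating-point module directly.
ep = torch.export.export(SimpleCache(), example_inputs)

# For a mutated buffer, the signature records the buffer as an input and maps
# a graph output (the index_put result) back to it as a BUFFER_MUTATION.
print(ep.graph_signature.inputs_to_buffers)
print(ep.graph_signature.buffers_to_mutate)
```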

torch._export.capture_pre_autograd_graph in Quantization Flow
There are two problems here:
1. k_cache becomes dead code after convert_pt2e.
2. k_cache is changed to a frozen_param instead of remaining a mutable buffer.
As far as I know, torch._export.capture_pre_autograd_graph will be replaced by torch.export, right? But when I change to torch.export, the problem still exists.
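For context, the flow above is roughly the following sketch, using the toy SimpleCache module; XNNPACKQuantizer is only a stand-in for the Qualcomm quantizer configuration actually used in this PR.

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

m = SimpleCache().eval()

# Pre-autograd capture (the API being replaced by torch.export).
m = capture_pre_autograd_graph(m, example_inputs)

quantizer = XNNPACKQuantizer()  # stand-in for the Qualcomm quantizer
quantizer.set_global(get_symmetric_quantization_config())

m = prepare_pt2e(m, quantizer)
m(*example_inputs)  # calibration
m = convert_pt2e(m)

# Per this issue: in this flow k_cache ends up folded into a frozen constant
# and the original buffer becomes dead code.
print(m.graph)
```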
Replaced by torch.export in Quantization Flow
After torch.export, it will insert a copy op for BUFFER_MUTATION in the graph signature. Therefore, k_cache will not be dead code after convert_pt2e, but k_cache is not an input of index_put.
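A sketch of the same flow with the capture step swapped for torch.export (same toy module and stand-in quantizer as above):

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Export with torch.export instead of capture_pre_autograd_graph.
ep = torch.export.export(SimpleCache().eval(), example_inputs)

m = prepare_pt2e(ep.module(), quantizer)
m(*example_inputs)  # calibration
m = convert_pt2e(m)

# The copy op inserted for BUFFER_MUTATION keeps k_cache from becoming dead
# code, but (per this issue) index_put no longer reads from k_cache.
print(m.graph)
```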
Replaced by torch.export and convert_pt2e(m, fold_quantize=False) in Quantization Flow
I think this graph is what I expected, but we need to change some code in our passes to get the correct quant_attr for each op.
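And the variant with quantize folding disabled; only the fold_quantize argument changes relative to the previous sketch, which keeps the quantize/dequantize ops explicit instead of folding them into frozen constants.

```python
from torch.ao.quantization.quantize_pt2e import convert_pt2e

# m here is the prepared and calibrated model from the previous sketch,
# converted with folding disabled.
m = convert_pt2e(m, fold_quantize=False)
print(m.graph)
```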

Reproduce Command