
[Draft] Qualcomm AI Engine Direct - Unexpected graph for mutable buffer in Quantization #4627


Conversation


@shewu-quic shewu-quic commented Aug 9, 2024

Hi @cccclai,

We found that the graph produced for a mutable buffer by the export API in the quantization flow is not what we expect.
The mutable buffer should appear as graph I/O, not as a constant.
The following message appears in the FP flow but not in the quantized flow:

[utils.py:362] The buffer node is a mutated buffer node, which is not constant.
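
For context, here is a minimal sketch of the kind of module involved; the class name, shapes, and the "+ 0" at the end are illustrative, not the actual test module:

    import torch

    class KVCacheModule(torch.nn.Module):
        def __init__(self, cache_len=8, dim=4):
            super().__init__()
            # A mutable buffer: export should surface it as a graph input
            # and record its mutation as a graph output (BUFFER_MUTATION).
            self.register_buffer("k_cache", torch.zeros(cache_len, dim))

        def forward(self, pos, k_val):
            # In-place update via index_put_; export functionalizes this
            # into index_put, whose result should mutate the buffer.
            self.k_cache.index_put_((pos,), k_val)
            # Add 0 so the returned tensor is not an alias of the buffer.
            return self.k_cache + 0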

The results below can be reproduced with this PR.
In summary, there are two questions about the graph in the export stage after convert_pt2e in the quantization flow:

  1. past_k_cache is a frozen constant, not an input.
  2. index_put is not an output of the graph.

Do you know what might be wrong?

Floating Point Flow

This is exactly what I expected: at runtime, k_cache becomes an input and the result of index_put becomes an output.
[graph screenshot]
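
As a quick check of this expectation (a sketch, using the illustrative KVCacheModule above), the FP exported program should record the mutation in its graph signature:

    import torch

    m = KVCacheModule()
    example_inputs = (torch.tensor([0]), torch.randn(1, 4))
    ep = torch.export.export(m, example_inputs)

    # Expected: a BUFFER_MUTATION entry mapping the index_put output back
    # to the k_cache buffer, e.g. {'index_put': 'k_cache'} (names vary).
    print(ep.graph_signature.buffers_to_mutate)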

torch._export.capture_pre_autograd_graph in Quantization Flow

There are two problems here:

  1. b_k_cache is folded by convert_pt2e into a frozen_param.
  2. index_put is not an output of the graph.

As far as I know, torch._export.capture_pre_autograd_graph will be replaced by torch.export, right? But when I switch to torch.export, the problem still exists.
[graph screenshot]
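
A minimal sketch of the quantization flow in question, using the illustrative module above; the QnnQuantizer import path is assumed from the ExecuTorch tree:

    import torch
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
    # Assumed import path for the Qualcomm quantizer:
    from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer

    m = KVCacheModule()
    example_inputs = (torch.tensor([0]), torch.randn(1, 4))

    captured = capture_pre_autograd_graph(m, example_inputs)
    prepared = prepare_pt2e(captured, QnnQuantizer())
    prepared(*example_inputs)  # calibration
    converted = convert_pt2e(prepared)

    # Symptom: k_cache becomes a frozen constant (b_k_cache -> frozen_param)
    # instead of a graph input, and index_put no longer feeds an output.
    converted.print_readable()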

Replaced by torch.export in Quantization Flow

After torch.export, a copy op is inserted for the BUFFER_MUTATION in the graph signature. Therefore k_cache is no longer dead code after convert_pt2e, but k_cache is still not an input of index_put.

[graph screenshot]
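
One way to inspect this variant (a sketch, assuming the same flow as above but with captured = torch.export.export(m, example_inputs).module()) is to look for the copy_ node that .module() inserts for the buffer mutation and check what feeds it:

    # In the broken graph, the copy_ back into k_cache is not fed by the
    # index_put result, so the mutation is effectively lost.
    for node in converted.graph.nodes:
        if node.target in (torch.ops.aten.copy_.default,
                           torch.ops.aten.index_put.default):
            print(node.op, node.target, node.args)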

Replaced by torch.export and convert_pt2e(m, fold_quantize=False) in Quantization Flow

I think this graph is what I expected, but we need to change some code in our passes to get the correct quant_attr for each op.
[graph screenshot]
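
For reference, a sketch of the variant shown here: fold_quantize=False keeps explicit quantize/dequantize ops around the weights instead of folding them into frozen quantized constants, so the buffer stays live but our passes must read the quant parameters from the q/dq ops:

    # Same flow as above, but without folding quantized constants.
    converted = convert_pt2e(prepared, fold_quantize=False)
    converted.print_readable()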

Reproduce Command

python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_index_put -b build_android -s {device_id} -m SM8650  -a unit_test 


pytorch-bot bot commented Aug 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4627


❌ 1 New Failure

As of commit a1e3286 with merge base 192d463:

NEW FAILURE - one new job failed; see the HUD link above for details.



cccclai commented Aug 12, 2024

Hey, sorry, I totally missed the PR. May I know more context here? Is it for migrating from m = torch._export.capture_pre_autograd_graph to m = torch.export.export(module, inputs).module()?


cccclai commented Aug 12, 2024

I think the replacement is

        captured_graph = export_trace._export(
            model, inputs, pre_dispatch=True
        ).module()
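
Here export_trace is a local alias from importing the export API; one plausible reading, stated as an assumption, is the internal torch.export._trace module. The important flag is pre_dispatch=True, which keeps the captured graph at the pre-dispatch ATen level that prepare_pt2e expects:

    # Assumption: export_trace refers to torch.export._trace (internal API).
    import torch.export._trace as export_trace

    captured_graph = export_trace._export(
        model, inputs, pre_dispatch=True
    ).module()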


shewu-quic commented Aug 12, 2024

> Hey, sorry, I totally missed the PR. May I know more context here? Is it for migrating from m = torch._export.capture_pre_autograd_graph to m = torch.export.export(module, inputs).module()?

Sorry for the inconvenience. We found that we get similar results for different prompts in quantized llama.
After investigating, it seems related to the mutable buffer not working in the quantization flow.
So I created this PR to describe the issue, which occurs with both torch.export.export and torch._export.capture_pre_autograd_graph.
It seems to be an export issue for quantized mutable buffers.

@shewu-quic

> I think the replacement is

        captured_graph = export_trace._export(
            model, inputs, pre_dispatch=True
        ).module()

Thanks for the information.
So it seems we are missing pre_dispatch=True in our unit test?


cccclai commented Aug 12, 2024

> Thanks for the information.
> So it seems we are missing pre_dispatch=True in our unit test?

Ah yes, pre_dispatch=True is needed here.


cccclai commented Aug 12, 2024

Trying to follow:

> So I created this PR to describe the issue, which occurs with both torch.export.export and torch._export.capture_pre_autograd_graph.

Does torch._export.capture_pre_autograd_graph have a similar issue? Sounds like it does?

@shewu-quic

> > Thanks for the information.
> > So it seems we are missing pre_dispatch=True in our unit test?
>
> Ah yes, pre_dispatch=True is needed here.

Got it. I will fix it and check the result again.

@shewu-quic

> Trying to follow:
>
> > So I created this PR to describe the issue, which occurs with both torch.export.export and torch._export.capture_pre_autograd_graph.
>
> Does torch._export.capture_pre_autograd_graph have a similar issue? Sounds like it does?

Yes, I found exactly this issue with torch._export.capture_pre_autograd_graph in llama.


shewu-quic commented Aug 12, 2024

> > Thanks for the information.
> > So it seems we are missing pre_dispatch=True in our unit test?
>
> Ah yes, pre_dispatch=True is needed here.

It seems the default value is True in torch.export.export.

Ooooh, it seems we called the wrong API before quantization.
We should call export_trace._export, not torch.export.export, is that correct?
We changed the API earlier to follow this warning message:
[warning message screenshot]


cccclai commented Aug 12, 2024

> It seems the default value is True in torch.export.export.
>
> Ooooh, it seems we called the wrong API before quantization. We should call export_trace._export, not torch.export.export, is that correct?

Oh, torch.export.export is the correct one; export_trace was just a rename after importing the export API...

@kimishpatel

cc @jerryzh168 for convert_pt2e stuff
