[Draft] Qualcomm AI Engine Direct - Unexpected graph for mutable buffer in Quantization #4627
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4627
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit a1e3286 with merge base 192d463. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hey, sorry I totally missed the PR. May I know more context here? Is it for migrating from torch._export.capture_pre_autograd_graph?
I think the replacement is torch.export.
Sorry for the inconvenience. We found that we get similar results for different prompts in quantized llama.
Thanks for your information. |
Ah yes, |
trying to follow,
Does
Got it. I will fix it and check the result again.
Yes, I found exactly this issue with
It seems the default value is True in torch.export.export. Ooooh, it seems we called the wrong API before quantization.
oh |
cc @jerryzh168 for convert_pt2e stuff |
Hi @cccclai,
We found that the graph with a mutable buffer after the export API in the quantization flow is not what we expect.
I expect the mutable buffer to be an I/O, not a constant.
We can also see that the following message appears in the FP flow, but not in the quantized flow.
The results below can be reproduced with this PR to generate the graphs.
In summary, there are two questions about the graph for the quantization flow in the export stage after convert_pt2e.
Do you know what might be wrong?
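For reference, here is a minimal sketch of the kind of module involved. SimpleCache and its shapes are made up for illustration; the real graphs come from the quantized llama model in this PR.

```python
import torch


class SimpleCache(torch.nn.Module):
    """Toy stand-in for a KV cache: a mutable buffer updated via index_put_."""

    def __init__(self, max_seq_len=8, head_dim=4):
        super().__init__()
        # Mutable buffer, analogous to k_cache in the llama model.
        self.register_buffer("k_cache", torch.zeros(max_seq_len, head_dim))

    def forward(self, pos, k_val):
        # In-place update of the buffer; this is the mutation that should show
        # up as a buffer-mutation input/output pair in the exported graph.
        self.k_cache.index_put_((pos,), k_val)
        return self.k_cache.clone()


example_inputs = (torch.tensor([0]), torch.rand(1, 4))
```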
Floating Point Flow
This is exactly what I expected. At runtime, k_cache will become an input and the result of index_put will be an output.
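The mutation can also be read off the graph signature of the exported program. Below is a sketch using the toy SimpleCache module above; the two mappings show k_cache appearing as an input and the index_put result being written back to it.

```python
import torch

# Export the floating-point module directly.
ep = torch.export.export(SimpleCache(), example_inputs)

# For a mutated buffer, the signature records the buffer as an input and maps
# a graph output (the index_put result) back to it as a BUFFER_MUTATION.
print(ep.graph_signature.inputs_to_buffers)
print(ep.graph_signature.buffers_to_mutate)
```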

torch._export.capture_pre_autograd_graph in Quantization Flow
There are two problems here:
1. k_cache becomes dead code after convert_pt2e.
2. k_cache is changed to a frozen_param instead of remaining a mutable buffer.
As far as I know, torch._export.capture_pre_autograd_graph will be replaced by torch.export, right? But when I change to torch.export, the problem still exists.
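For context, the flow above is roughly the following sketch, using the toy SimpleCache module; XNNPACKQuantizer is only a stand-in for the Qualcomm quantizer configuration actually used in this PR.

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

m = SimpleCache().eval()

# Pre-autograd capture (the API being replaced by torch.export).
m = capture_pre_autograd_graph(m, example_inputs)

quantizer = XNNPACKQuantizer()  # stand-in for the Qualcomm quantizer
quantizer.set_global(get_symmetric_quantization_config())

m = prepare_pt2e(m, quantizer)
m(*example_inputs)  # calibration
m = convert_pt2e(m)

# Per this issue: in this flow k_cache ends up folded into a frozen constant
# and the original buffer becomes dead code.
print(m.graph)
```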
Replaced by torch.export in Quantization Flow
After torch.export, it will insert a copy op for BUFFER_MUTATION in the graph signature. Therefore, k_cache will not be dead code after convert_pt2e, but k_cache is not an input of index_put.
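A sketch of the same flow with the capture step swapped for torch.export (same toy module and stand-in quantizer as above):

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Export with torch.export instead of capture_pre_autograd_graph.
ep = torch.export.export(SimpleCache().eval(), example_inputs)

m = prepare_pt2e(ep.module(), quantizer)
m(*example_inputs)  # calibration
m = convert_pt2e(m)

# The copy op inserted for BUFFER_MUTATION keeps k_cache from becoming dead
# code, but (per this issue) index_put no longer reads from k_cache.
print(m.graph)
```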
Replaced by torch.export and convert_pt2e(m, fold_quantize=False) in Quantization Flow
I think this graph is what I expected, but we need to change some code in our passes to get the correct quant_attr for each op.
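And the variant with quantize folding disabled; only the fold_quantize argument changes relative to the previous sketch, which keeps the quantize/dequantize ops explicit instead of folding them into frozen constants.

```python
from torch.ao.quantization.quantize_pt2e import convert_pt2e

# m here is the prepared and calibrated model from the previous sketch,
# converted with folding disabled.
m = convert_pt2e(m, fold_quantize=False)
print(m.graph)
```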

Reproduce Command