Qualcomm AI Engine Direct - Model sharding for LLM #4923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

shewu-quic
Collaborator

For LLMs, the model is too large to fit in device memory for inference. Therefore, we need to divide the model into several parts to avoid out-of-memory errors at inference time.

Summary:

  • Use a custom fallback op to split the graph (sketched below)
  • Add spill-fill feature
  • Add a model sharding argument for QNN
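
A rough illustration of the idea (not the PR's actual partitioning code): for a decoder-only LLM, sharding amounts to assigning contiguous layer ranges to each shard, with the custom fallback op marking the cut points between them. The helper below only computes the layer ranges; num_sharding follows the new CLI argument, and n_layers is the model's decoder layer count.

def shard_layer_ranges(n_layers: int, num_sharding: int):
    """Split n_layers decoder layers into num_sharding contiguous shards,
    as evenly as possible. Illustrative sketch only."""
    per_shard, remainder = divmod(n_layers, num_sharding)
    ranges, start = [], 0
    for i in range(num_sharding):
        end = start + per_shard + (1 if i < remainder else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Llama 3.1 8B has 32 decoder layers; with --num_sharding 4 this yields
# [(0, 8), (8, 16), (16, 24), (24, 32)].
print(shard_layer_ranges(32, 4))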


pytorch-bot bot commented Aug 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4923

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 777fa22 with merge base 3fb03dc (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Aug 27, 2024
@shewu-quic
Collaborator Author

Hi @cccclai,
This PR is to support model sharding in QNN.
If possible, could you please take a look?
Thanks!

Contributor

@cccclai left a comment


Thank you for adding this change! I believe it's the last piece?

@shewu-quic
Collaborator Author

shewu-quic commented Aug 28, 2024

Thank you for adding this change! I believe it's the last piece?

Thanks for your prompt review.

It should work for the Llama model now.
Next, I will create the following PRs as soon as possible; they will improve accuracy and the functionality of the KV cache.

  1. Add a pass to convert linear to conv2d:
    We found an accuracy drop caused by the QNN Linear op in Llama 3, and it will be fixed by a pass that converts linear to conv2d (see the sketch after this list).
  2. Support the QNN RMSNorm node:
    I think it will benefit quantization.
    Note that it will require QNN version 2.25 or above.
  3. Work around the mutable-buffer issue for the index_put op:
    We add a pass to replace the input of the index_put op. But it needs to change the API to torch.export.export.
    May I know whether you have a plan to change it?
# Note: import paths are indicative and may differ slightly between ExecuTorch versions.
import torch
from executorch.backends.qualcomm.utils.constants import QCOM_ENCODING, QCOM_QUANT_ATTRS
from executorch.exir.dialects._ops import ops as exir_ops
from executorch.exir.pass_base import ExportPass, PassResult


class ReplaceIndexPutInput(ExportPass):
    """
    Index put input workaround for quantized module
    """

    # Map each dequantize op back to its matching quantize op so the mutable
    # buffer node can inherit the correct quantization encoding.
    dq_q_map = {
        # per tensor
        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.default: exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
        exir_ops.edge.quantized_decomposed.dequantize_per_tensor.tensor: exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
        # per channel
        exir_ops.edge.quantized_decomposed.dequantize_per_channel.default: exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
    }

    def __init__(self, edge_program: torch.export.ExportedProgram):
        super(ReplaceIndexPutInput, self).__init__()
        self.edge_program = edge_program

    def call(self, graph_module: torch.fx.GraphModule):
        graph = graph_module.graph
        for node in graph.nodes:
            if node.target == exir_ops.edge.aten.index_put.default:
                if (copy_node := list(node.users)[0]) and copy_node.target == exir_ops.edge.aten.copy.default:
                    m_buffer_node = copy_node.args[0]
                    bad_frozen_node = node.args[0]
                    # Propagate the quant attributes from the frozen parameter to
                    # the mutable buffer node before rewiring the input.
                    if QCOM_QUANT_ATTRS in bad_frozen_node.meta:
                        m_buffer_node.meta[QCOM_QUANT_ATTRS] = bad_frozen_node.meta[QCOM_QUANT_ATTRS]
                        m_buffer_node.meta[QCOM_QUANT_ATTRS][QCOM_ENCODING] = self.dq_q_map[m_buffer_node.meta[QCOM_QUANT_ATTRS][QCOM_ENCODING]]
                    # Point index_put at the mutable buffer instead of the frozen parameter.
                    with graph.inserting_after(bad_frozen_node):
                        node.replace_input_with(bad_frozen_node, m_buffer_node)
                else:
                    continue

        graph.eliminate_dead_code()
        graph_module.recompile()
        return PassResult(graph_module, True)
  4. Add a pass for custom annotation of the 16a8w last linear
  5. Change the dtype of the pos input from int64 to int32:
    We found that ScatterND (index_put) needs an int32 index in QNN 2.25.
  6. SpinQuant-related pass
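
As a sketch of item 1 above (linear to conv2d): a Linear(in_features, out_features) is numerically equivalent to a 1x1 Conv2d applied to the input reshaped to (N, in_features, 1, 1). This is only an illustration of the transformation under that equivalence, not the actual ExecuTorch pass.

import torch
import torch.nn as nn

def linear_as_conv2d(linear: nn.Linear) -> nn.Conv2d:
    """Build a 1x1 Conv2d that computes the same function as the given Linear."""
    conv = nn.Conv2d(linear.in_features, linear.out_features, kernel_size=1,
                     bias=linear.bias is not None)
    # Linear weight is (out, in); Conv2d expects (out, in, 1, 1).
    conv.weight.data.copy_(linear.weight.data.view(*linear.weight.shape, 1, 1))
    if linear.bias is not None:
        conv.bias.data.copy_(linear.bias.data)
    return conv

x = torch.randn(2, 16)
lin = nn.Linear(16, 32)
conv = linear_as_conv2d(lin)
out_conv = conv(x.view(2, 16, 1, 1)).view(2, 32)
assert torch.allclose(lin(x), out_conv, atol=1e-5)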

@shewu-quic force-pushed the dev1/hutton/model_sharding_with_custom_op branch 2 times, most recently from 19adc79 to 20d1e12 on August 28, 2024 at 02:46
@WuhanMonkey
Contributor

Hello team, I was trying to quantize the Llama 3.1 8B model against the QNN backend. After pulling in this PR, I hit the issue below:

(et_qnn) bash-5.1$ python -m examples.models.llama2.export_llama --checkpoint "${MODEL_DIR}/consolidated.00.pth" -p "${MODEL_DIR}/params.json" -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w -d fp32 --num_sharding 4 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="${QNN_MODEL_PTE}"
[INFO 2024-08-27 19:25:28,948 export_llama_lib.py:450] Applying quantizers: [<executorch.backends.qualcomm.quantizer.quantizer.QnnQuantizer object at 0x7fa97733a920>]
[INFO 2024-08-27 19:25:28,948 export_llama_lib.py:645] Loading model with checkpoint=/home/chengpenghu/chester/Meta-Llama-3.1-8B-Instruct-Original/consolidated.00.pth, params=/home/chengpenghu/chester/Meta-Llama-3.1-8B-Instruct-Original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
/home/chengpenghu/chester/executorch/examples/models/llama2/model.py:100: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
[INFO 2024-08-27 19:25:29,388 export_llama_lib.py:668] Loaded model with dtype=torch.bfloat16
[INFO 2024-08-27 19:25:29,389 builder.py:111] model.to torch.float32
W0827 19:25:53.081331 3501918 torch/_export/__init__.py:65] +============================+
W0827 19:25:53.081530 3501918 torch/_export/__init__.py:66] |     !!!   WARNING   !!!    |
W0827 19:25:53.081594 3501918 torch/_export/__init__.py:67] +============================+
W0827 19:25:53.081645 3501918 torch/_export/__init__.py:68] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
[INFO 2024-08-27 19:26:04,614 builder.py:179] Using pt2e [<executorch.backends.qualcomm.quantizer.quantizer.QnnQuantizer object at 0x7fa97733a920>] to quantizing the model...
No quant config is implemented for op, aten.to.dtype
No quant config is implemented for op, aten.type_as.default
No quant config is implemented for op, <built-in function getitem>
No quant config is implemented for op, aten.add_.Tensor
[... the same "No quant config is implemented for op" warnings repeat for every decoder layer; several hundred identical lines omitted ...]
Traceback (most recent call last):
  File "/home/chengpenghu/local/anaconda3/envs/et_qnn/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/chengpenghu/local/anaconda3/envs/et_qnn/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/chengpenghu/chester/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/home/chengpenghu/chester/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/home/chengpenghu/chester/executorch/examples/models/llama2/export_llama_lib.py", line 352, in export_llama
    builder = _export_llama(modelname, args)
  File "/home/chengpenghu/chester/executorch/examples/models/llama2/export_llama_lib.py", line 529, in _export_llama
    builder_exported_to_edge.metadata["get_n_layers"],
KeyError: 'get_n_layers'

@shewu-quic
Collaborator Author

shewu-quic commented Aug 28, 2024

Hello team, I was trying to quantize the Llama 3.1 8B model against the QNN backend. After pulling in this PR, I hit the issue below:

Oops, sorry about that. "get_n_layers" seems to have been removed from the metadata. Let me fix it.

@shewu-quic force-pushed the dev1/hutton/model_sharding_with_custom_op branch from 20d1e12 to 777fa22 on August 28, 2024 at 04:17
@cccclai
Contributor

cccclai commented Aug 28, 2024

But it needs to change API to torch.export.export.
May I know do you have a plan to change it?

I can help with that. Sorry, I haven't had access to my regular machines recently and had a difficult time reproducing the QNN flow on an Ubuntu Linux (WSL) system. But this change should be simple enough and doesn't require setting up QNN dependencies.

@cccclai merged commit 4116cb2 into pytorch:main on Aug 28, 2024
36 checks passed
@cccclai
Contributor

cccclai commented Aug 28, 2024

Just want to confirm on this

But it needs to change API to torch.export.export.

Any specific reason it's needed? Is it because the graph is better?

@shewu-quic
Collaborator Author

Just want to confirm on this

But it needs to change API to torch.export.export.

Any specific reason it's needed? Is it because the graph is better?

The past KV cache is eliminated (frozen) by capture_pre_autograd_graph after convert_pt2e(fold_quantize=True).
With torch.export.export, a copy op is inserted between the past KV cache and the output of the index_put op.
Therefore, after convert_pt2e(fold_quantize=True), the past KV cache still has a user.
Our workaround is to replace the frozen parameter with the past KV cache.
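
For context, a minimal sketch (not the llama exporter itself) of the behavior described above: with torch.export.export, a buffer that is mutated in place stays tracked in the graph signature rather than being folded into a constant, so a later pass such as ReplaceIndexPutInput can still find and rewire it.

import torch

class Cache(torch.nn.Module):
    """Toy stand-in for a KV cache updated in place via index_put."""

    def __init__(self):
        super().__init__()
        self.register_buffer("k_cache", torch.zeros(4, 8))

    def forward(self, pos, k):
        self.k_cache[pos] = k  # lowers to an index_put on the mutable buffer
        return self.k_cache.sum()

ep = torch.export.export(Cache(), (torch.tensor([1]), torch.ones(1, 8)))
# The mutated buffer is recorded here instead of being frozen into a constant.
print(ep.graph_signature.buffers_to_mutate)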

@cccclai
Contributor

cccclai commented Aug 29, 2024

Just want to confirm on this

But it needs to change API to torch.export.export.

Any specific reason it's needed? Is it because the graph is better?

The past KV cache is eliminated (frozen) by capture_pre_autograd_graph after convert_pt2e(fold_quantize=True). With torch.export.export, a copy op is inserted between the past KV cache and the output of the index_put op. Therefore, after convert_pt2e(fold_quantize=True), the past KV cache still has a user. Our workaround is to replace the frozen parameter with the past KV cache.

Did you expect changes like this (#4942)? Also wanted to check: if we replace torch.export.export with export_for_training, does the graph look right to you?

@shewu-quic
Collaborator Author

Just want to confirm on this

But it needs to change API to torch.export.export.

Any specific reason it's needed? Is it because the graph is better?

The past KV cache is eliminated (frozen) by capture_pre_autograd_graph after convert_pt2e(fold_quantize=True). With torch.export.export, a copy op is inserted between the past KV cache and the output of the index_put op. Therefore, after convert_pt2e(fold_quantize=True), the past KV cache still has a user. Our workaround is to replace the frozen parameter with the past KV cache.

Did you expect changes like this #4942? Also wanted to check, if we replace torch.export.export with export_for_training , does the graph look right to you?

Yes, it is similar to our change, but we don't set "strict". Does that affect anything?
I haven't tried it yet; maybe I could check it.
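
For reference, a rough sketch of what trying export_for_training in the PT2E flow might look like; this is untested against this PR and assumes QnnQuantizer's default constructor and the torch.ao quantize_pt2e entry points.

import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.export import export_for_training

from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer

def quantize_with_export_for_training(model, example_inputs):
    # Capture a training-IR graph instead of the deprecated capture_pre_autograd_graph().
    captured = export_for_training(model, example_inputs).module()
    quantizer = QnnQuantizer()
    prepared = prepare_pt2e(captured, quantizer)
    prepared(*example_inputs)  # calibration pass
    return convert_pt2e(prepared, fold_quantize=True)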

@WuhanMonkey
Contributor

Hi @cccclai, @shewu-quic, I am trying to load the Llama 3.1 8B model, sharded into 4 parts, in the QNN example, but loading fails with memory buffer issues. The same workflow works for the Llama 2 7B model without any sharding. I wonder if it is related to this diff and whether anything needs to be updated for loading the model on-device?

This is on a OnePlus 12 with 16GB RAM, so memory size shouldn't be the issue.

Here is the full log

2024-08-29 22:00:13.351 13269-13526 ETLogging               com.example.executorchllamademo      D  Loading model /data/local/tmp/llama/et_exported_llama3.1_8b_qnn_a16w4_shard4.pte with tokenizer /data/local/tmp/llama/llama3_tiktokenizer.bin
2024-08-29 22:00:14.910 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  create QNN Logger with log_level 2
2024-08-29 22:00:14.910 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Initializing HtpProvider
2024-08-29 22:00:14.915 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:14.915 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Initialize Qnn backend parameters for Qnn executorch backend type 2
2024-08-29 22:00:14.915 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Caching: Caching is in RESTORE MODE.
2024-08-29 22:00:14.916 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:14.963 13269-13546 com.exampl...hllamademo com.example.executorchllamademo      D  vendor/qcom/proprietary/adsprpc/src/apps_std_imp.c:380: apps_std_fopen_fd: done for /data/app/~~BY85UMEoIy3eaUlr-KOJjg==/com.example.executorchllamademo-SVOyIUPHsxAqrnxpvDNH2A==/lib/arm64/cdsp/./libQnnHtpV75Skel.so with fopen:36us, read:0us, rpc_alloc:0us, mmap:0us
2024-08-29 22:00:14.972 13269-13546 com.exampl...hllamademo com.example.executorchllamademo      D  vendor/qcom/proprietary/adsprpc/src/apps_std_imp.c:380: apps_std_fopen_fd: done for /data/app/~~BY85UMEoIy3eaUlr-KOJjg==/com.example.executorchllamademo-SVOyIUPHsxAqrnxpvDNH2A==/lib/arm64/./libQnnHtpV75Skel.so with fopen:36us, read:7677us, rpc_alloc:252us, mmap:507us
2024-08-29 22:00:14.972 13269-13546 com.exampl...hllamademo com.example.executorchllamademo      I  vendor/qcom/proprietary/adsprpc/src/apps_std_imp.c:1149: Successfully opened file /data/app/~~BY85UMEoIy3eaUlr-KOJjg==/com.example.executorchllamademo-SVOyIUPHsxAqrnxpvDNH2A==/lib/arm64/./libQnnHtpV75Skel.so
2024-08-29 22:00:15.049 13269-13526 com.exampl...hllamademo com.example.executorchllamademo      I  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1684: remote_handle64_open: opened handle 0xb400007779ef9c10 (remote 0x1eca610) for file:///libQnnHtpV75Skel.so?qnn_skel_handle_invoke&_modver=1.0&_dom=cdsp on domain 3 (spawn time 40855 us, load time 88697 us), num handles 1
2024-08-29 22:00:15.065 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:15.065 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:15.069 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:15.070 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:15.406 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> This option is only for offline prepare case.
2024-08-29 22:00:15.407 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Running level=3 optimization.
2024-08-29 22:00:16.755 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  create QNN Logger with log_level 2
2024-08-29 22:00:16.755 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Initialize Qnn backend parameters for Qnn executorch backend type 2
2024-08-29 22:00:16.756 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Caching: Caching is in RESTORE MODE.
2024-08-29 22:00:16.758 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:16.759 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:17.124 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> This option is only for offline prepare case.
2024-08-29 22:00:17.124 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Running level=3 optimization.
2024-08-29 22:00:18.852 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  create QNN Logger with log_level 2
2024-08-29 22:00:18.852 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Initialize Qnn backend parameters for Qnn executorch backend type 2
2024-08-29 22:00:18.852 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Caching: Caching is in RESTORE MODE.
2024-08-29 22:00:18.854 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:18.855 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:19.355 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> This option is only for offline prepare case.
2024-08-29 22:00:19.355 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Running level=3 optimization.
2024-08-29 22:00:20.396 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  create QNN Logger with log_level 2
2024-08-29 22:00:20.396 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Initialize Qnn backend parameters for Qnn executorch backend type 2
2024-08-29 22:00:20.396 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Caching: Caching is in RESTORE MODE.
2024-08-29 22:00:20.398 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:20.400 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> Function not called, PrepareLib isn't loaded!
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> fastrpc memory map for fd: 156 with length: 1145044992 failed with error: 0x1
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to map weights buffer to device!
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Could not allocate persistent weights buffer!
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to initialize graph memory
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to initialize graph with id 257 context 4 deviceId 0 coreId 0 pdId 0 with err 1002
2024-08-29 22:00:21.141 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Context create from binary failed for deviceId 0 coreId 0 pdId 0 err 1002
2024-08-29 22:00:21.147 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Transport session for deviceId 0 coreId 0 pdId 2 not found!
2024-08-29 22:00:21.147 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Transport session for deviceId 0 coreId 0 pdId 2 not found!
2024-08-29 22:00:21.147 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:21.199 13269-13526 com.exampl...hllamademo com.example.executorchllamademo      E  vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1679: Error 0xffffffff: remote_handle64_open failed for file:///libQnnHtpV75Skel.so?qnn_skel_handle_invoke&_modver=1.0&_dom=cdsp&_session=2 (errno Operation not permitted)
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> DspTransport.openSession qnn_open failed, 0xffffffff
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> IDspTransport: Unknown rpc status 0x000003ff
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> DspTransport failed,cannot open session, error 0xffffffff
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Error from rpc transport. transportStatus = -1
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to retrieve skel build id: err: 1003
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to create a new transport session for deviceId 0, coreId 0, pdId 2: err: 14002
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Error in creating transport session for deviceId 0, coreId 0, pdId 2, err: 14002
2024-08-29 22:00:21.199 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Fail to create context from binary with err 14002
2024-08-29 22:00:21.200 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:21.201 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E   <E> Failed to create context from binary with err 0x36b2
2024-08-29 22:00:21.201 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      E  Can't create context from binary. Error 14002.
2024-08-29 22:00:21.201 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend parameters
2024-08-29 22:00:21.201 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn context
2024-08-29 22:00:21.206 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:21.207 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn device
2024-08-29 22:00:21.207 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend
2024-08-29 22:00:21.207 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend parameters
2024-08-29 22:00:21.207 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn context
2024-08-29 22:00:21.213 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:21.213 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn device
2024-08-29 22:00:21.213 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend
2024-08-29 22:00:21.213 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend parameters
2024-08-29 22:00:21.213 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn context
2024-08-29 22:00:21.220 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      W   <W> sg_stubPtr is not null, skip loadRemoteSymbols
2024-08-29 22:00:21.221 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn device
2024-08-29 22:00:21.221 13269-13526 [Qnn ExecuTorch]        com.example.executorchllamademo      I  Destroy Qnn backend
2024-08-29 22:00:21.232 13269-13526 ETLogging               com.example.executorchllamademo      D  Load complete. Model path: /data/local/tmp/llama/et_exported_llama3.1_8b_qnn_a16w4_shard4.pte
                                                                                                    Tokenizer path: /data/local/tmp/llama/llama3_tiktokenizer.bin
                                                                                                    Temperature: 0.1
                                                                                                    Model loaded time: 0 ms

@shewu-quic
Collaborator Author

Hi @cccclai, @shewu-quic, I am trying to load the Llama 3.1 8B model, sharded into 4 parts, in the QNN example, but loading fails with memory buffer issues. The same workflow works for the Llama 2 7B model without any sharding. I wonder if it is related to this diff and whether anything needs to be updated for loading the model on-device?

This is on a OnePlus 12 with 16GB RAM, so memory size shouldn't be the issue.

Hi @WuhanMonkey, if you see exactly pdId 2 in this error:

<E> Failed to create a new transport session for deviceId 0, coreId 0, pdId 2: err: 14002

then it's related to the system, and it's hard to do anything on the application side.
I heard that this problem can be resolved on the OnePlus 12 by upgrading the OS. I'm not sure whether the OS upgrade rolls out by region, so some regions may not have received the fix yet.

I can only suggest:

  • Use adb root to work around it, if possible
  • Report the bug to OnePlus, i.e., that multiple Hexagon PDs cannot be created in a single process.
  • Try QNN 2.23, though this might not resolve the problem; I mention it only because we're using QNN 2.23.

@WuhanMonkey
Contributor

Hi @cccclai, @shewu-quic, I am trying to load the Llama 3.1 8B model, sharded into 4 parts, in the QNN example, but loading fails with memory buffer issues. The same workflow works for the Llama 2 7B model without any sharding. I wonder if it is related to this diff and whether anything needs to be updated for loading the model on-device?
This is on a OnePlus 12 with 16GB RAM, so memory size shouldn't be the issue.

Hi @WuhanMonkey, if you see exactly pdId 2 in this error:

<E> Failed to create a new transport session for deviceId 0, coreId 0, pdId 2: err: 14002

then it's related to the system, and it's hard to do anything on the application side. I heard that this problem can be resolved on the OnePlus 12 by upgrading the OS. I'm not sure whether the OS upgrade rolls out by region, so some regions may not have received the fix yet.

I can only suggest:

  • Use adb root to work around it, if possible
  • Report the bug to OnePlus, i.e., that multiple Hexagon PDs cannot be created in a single process.
  • Try QNN 2.23, though this might not resolve the problem; I mention it only because we're using QNN 2.23.

Thank you. Besides the OnePlus phone, can it work on an S24+ without any issues?
