Add proper pt2e calibration #5095
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5095
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure as of commit d4d7cfa with merge base 9739609.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I think we might need to calibrate some special tokens in the input template for Llama 3 Instruct.
I see. Should I leave it to you for the proper calibration PR, or would you prefer me to amend this one?
If you can, I’d appreciate it!
Sure, happy to help. Mind sharing what extra calibration you did?
Sure, I'm just calibrating a prompt that contains an input template.

def eval_once(self, module: torch.fx.GraphModule, string: str = "Once upon a time", max_len: int = 128):
    tokenizer = SimpleTokenizer(self.tokenizer_path)
    # TODO: change criteria & support batch inputs if necessary
    pos = torch.tensor(0, dtype=torch.int64)
    token_list = [tokenizer.bos_id] + tokenizer.encode(string)
    with torch.no_grad():
        while token_list[-1] != tokenizer.eos_id and pos < max_len:
            logits = module(
                torch.full((1, 1), token_list[pos]),
                torch.tensor((pos, )),
            )
            pos += 1
            if pos >= len(token_list):
                token_list.append(torch.argmax(logits[:], dim=-1).item())
    ...

def pt2e_quantize(self, quantizers: Optional[List[Quantizer]]) -> "LLMEdgeManager":
    ...
    # Calibration
    self.eval_once(m, string="<|start_header_id|>system<|end_header_id|>\n\nYou are a cute and funny chatbot answering questions<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nTell me about Meta<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", max_len=128)
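For context, here is a minimal sketch of where a calibration call like this sits in the pt2e flow. Only `prepare_pt2e` and `convert_pt2e` are the real `torch.ao.quantization.quantize_pt2e` APIs; the surrounding names (`pre_autograd_graph_module`, `composed_quantizer`, `eval_once`, `chat_template_prompt`) are placeholders taken from the snippets in this thread, not a definitive implementation.

```python
# Sketch only: placeholder names, assuming a graph module captured before
# autograd and a composed quantizer as in the builder diff below.
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

prepared = prepare_pt2e(pre_autograd_graph_module, composed_quantizer)
# Decoding a template prompt through the prepared module lets the observers
# record activation ranges, including for the special template tokens.
eval_once(prepared, string=chat_template_prompt, max_len=128)
quantized = convert_pt2e(prepared)
```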
# Batch process the whole sequence.
logits = self._model(inps[:, : self._max_seq_length], pos_tensor)
return logits
if not self._dynamic_shape:
When do we not enable dynamic shape?
I believe dynamic_shape defaults to true now. We can probably ignore the False case here.
We actually disable dynamic shape for all backends other than XNNPACK: https://github.com/pytorch/executorch/tree/main/examples/models/llama2#optional-smaller-models-delegated-to-other-backends
In particular, for QNN we can only do static shape for now.
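As a side note, here is a small illustration of the static vs. dynamic distinction using the public torch.export API. This is not the PR's capture code; the toy module and the dimension bounds are assumptions.

```python
import torch
from torch.export import Dim, export

class Toy(torch.nn.Module):
    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return tokens * 2

tokens = torch.zeros((1, 8), dtype=torch.long)

# Static capture: every later call must use exactly a (1, 8) input,
# which is what backends like QNN require today.
static_program = export(Toy(), (tokens,))

# Dynamic capture: the sequence dimension may vary at runtime, which is what
# makes batch prefill of an arbitrary-length prompt possible (e.g. on XNNPACK).
seq = Dim("seq", min=2, max=128)
dynamic_program = export(Toy(), (tokens,), dynamic_shapes=({1: seq},))
```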
extension/llm/export/builder.py
Outdated
@@ -190,7 +252,26 @@ def pt2e_quantize(self, quantizers: Optional[List[Quantizer]]) -> "LLMEdgeManage
), "Please run capture_pre_autograd_graph first"
m = prepare_pt2e(self.pre_autograd_graph_module, composed_quantizer)
# Calibrate
m(*self.example_inputs)
logging.info(f"Calibrating with tasks: {self.calibration_tasks}, limit: {self.calibration_limit}, seq_length: {self.calibration_seq_length}, tokenizer_path: {self.tokenizer_path}")
Is this log info a duplicate of the log at line 263? Maybe remove this line?
extension/llm/export/builder.py
Outdated
token_list = [tokenizer.bos_id] + tokenizer.encode(string, bos=True, eos=False)

with torch.no_grad():
    while token_list[-1] != tokenizer.eos_id and pos < max_len:
For Llama 3, eos_id is actually eot_id. BTW, https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/73
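A hedged sketch of what that stop condition could look like; `tokenizer.eot_id` is an assumption about the tokenizer's attributes, not something this PR guarantees.

```python
# Accept either stop token: Llama 3 Instruct turns end with <|eot_id|>,
# while <|end_of_text|> (eos_id) may never appear in chat-formatted output.
stop_ids = {tokenizer.eos_id}
eot_id = getattr(tokenizer, "eot_id", None)  # assumed attribute
if eot_id is not None:
    stop_ids.add(eot_id)

with torch.no_grad():
    while token_list[-1] not in stop_ids and pos < max_len:
        ...  # same decode-one-token loop as in the snippet above
```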
Curious why not batch prefill here to make it faster?
I think if we batch prefill here, it should also check the dynamic shape.
Here we use the graph module instead of the eager model for calibration. The graph module is captured with a fixed (static) shape, and batch prefill requires dynamic shape.
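A small illustration of that constraint; the (1, 1) capture shape and the placeholder `module` are assumptions for the sketch, not the PR's exact code.

```python
import torch

prompt = torch.tensor([[1, 2, 3, 4]])  # (1, 4) prompt token ids
positions = torch.arange(prompt.shape[1])

# `module` stands for the captured graph module (placeholder). A module
# captured with a fixed (1, 1) token input can only take one token per call,
# so calibration walks the prompt position by position.
for i in range(prompt.shape[1]):
    logits = module(prompt[:, i : i + 1], positions[i : i + 1])

# Batch prefill would be a single module(prompt, positions) call, which needs
# the graph to have been captured with a dynamic sequence dimension.
```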
Just minor fixes. Overall looks good to me.
Updated: a187cf8 to 21d3974
So instead of modifying export_llama_lib, the way I think you want to do this is:
- Introduce a QNNRunEvalWrapper like here: https://github.com/pytorch/executorch/blob/main/examples/models/llama2/eval_llama_lib.py#L124
- On the LLMEdgeManager instance returned from _prepare_for_llama_export, call the quantization APIs manually, similar to https://github.com/pytorch/executorch/blob/main/examples/models/llama2/eval_llama_lib.py#L145, where you export the model and then, instead of calling pt2e_quantize, call prepare_pt2e manually. Such a model can then be returned as an nn.Module.
- Wrap the module in QNNRunEvalWrapper like the other wrappers.
Why this way? I think this way you are not polluting export_llama_lib with eval-related concerns, and eval-related code still remains within the eval pipeline.
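A rough sketch of that suggested flow. Only prepare_pt2e, _prepare_for_llama_export, and capture_pre_autograd_graph are names from the codebase and the links above; QNNRunEvalWrapper and the glue code are hypothetical.

```python
from torch.ao.quantization.quantize_pt2e import prepare_pt2e

manager = _prepare_for_llama_export(modelname, args)  # LLMEdgeManager
manager.capture_pre_autograd_graph()
prepared = prepare_pt2e(manager.pre_autograd_graph_module, composed_quantizer)

# Evaluating the prepared module doubles as calibration: every forward pass in
# the eval loop feeds the observers, so export_llama_lib stays free of eval code.
eval_wrapper = QNNRunEvalWrapper(  # hypothetical wrapper, mirroring the others
    model=prepared,
    tokenizer=tokenizer,
    max_seq_length=args.max_seq_length,
)
```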
@@ -166,19 +166,25 @@ def build_args_parser() -> argparse.ArgumentParser:
nargs="+",
type=str,
default=None,
help="Tasks for GPTQ calibration",
help="Tasks for GPTQ calibration from lm_eval",
nit: For future reference, separate out unrelated fixes
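For reference, a sketch of the calibration flags under discussion; the names match the logging line quoted earlier in this thread, but the defaults and help text here are illustrative.

```python
# Inside build_args_parser(); `parser` is the argparse.ArgumentParser it builds.
parser.add_argument(
    "--calibration_tasks",
    nargs="+",
    type=str,
    default=None,
    help="Tasks for calibration from lm_eval",
)
parser.add_argument(
    "--calibration_limit",
    type=int,
    default=None,
    help="Number of samples used for calibration",
)
parser.add_argument(
    "--calibration_seq_length",
    type=int,
    default=None,
    help="Sequence length used during calibration",
)
```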
@kimishpatel thanks for the detailed review. Applied the suggestion in the latest commit, please take another look.
I introduced a graph module run eval wrapper in the new commit instead, as I feel it's not QNN-specific. The actual calibration data might be different, but that can be controlled by the args.
I'm not exactly sure what that means. Do you mean having a separate calibrate API in LLMEdgeManager?
Did this for the GraphModuleEvalWrapper.
Sounds good.
A graph module is an nn.Module.
No, the opposite. I don't think it makes sense to have a calibrate API on LLMEdgeManager. I meant more like ...
@@ -167,6 +178,69 @@ def capture_pre_autograd_graph(self) -> "LLMEdgeManager":
)
return self

def pt2e_calibrate(
This method should not be part of the builder at all; the builder is meant to produce a model, not calibrate it.
Hence my suggestion was to move the functionality of this method either inside GraphModuleEvalWrapper or somewhere else.
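One way to read "move it inside GraphModuleEvalWrapper", sketched below. The class name comes from this thread, EagerEvalWrapper is the existing wrapper in eval_llama_lib.py (its import path and constructor signature are assumed), and the method body plus the (batch, 1, vocab) logits shape are assumptions, not the final implementation.

```python
import torch
from executorch.examples.models.llama2.eval_llama_lib import EagerEvalWrapper  # assumed import path

class GraphModuleEvalWrapper(EagerEvalWrapper):
    def __init__(self, model, tokenizer, max_seq_length):
        super().__init__(model=model, tokenizer=tokenizer, max_seq_length=max_seq_length)

    def _model_call(self, inps):
        # The graph module is captured with a static shape, so feed one position
        # per call; each call also calibrates the quantization observers.
        per_token_logits = []
        for pos in range(inps.shape[1]):
            logits = self._model(inps[:, pos : pos + 1], torch.tensor([pos]))
            per_token_logits.append(logits)  # assumed (batch, 1, vocab) each
        return torch.cat(per_token_logits, dim=1)
```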
Discussed and looks good.
Summary: See discussion in pytorch#5095. Reland because of internal failure. Differential Revision: D62323396
Summary: Pull Request resolved: pytorch#5152. See discussion in pytorch#5095. Reland because of internal failure. Differential Revision: D62323396
Currently, pt2e calibration is a dummy calibration; this change uses a proper calibration for eval.
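In code terms, the difference is roughly the following; names are illustrative, and the "before" line is the one visible in the builder.py diff earlier in this thread.

```python
# Before: dummy calibration, a single forward pass with the example inputs,
# so observers see only one (unrepresentative) activation range.
m(*example_inputs)

# After: proper calibration, streaming real tokenized text through the
# prepared module before convert_pt2e, e.g. lm_eval task samples or a
# chat-template prompt decoded token by token.
for tokens, positions in calibration_batches:  # placeholder iterable
    m(tokens, positions)
```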
Command line for evaluate:
Command line for export: