
Runner changes for TorchTune Llama3.2 vision text decoder #6610


Merged
merged 46 commits into main from jz/native-runner-tt on Nov 14, 2024

Conversation

jackzhxng
Contributor

@jackzhxng jackzhxng commented Nov 1, 2024

Summary

Changes to the eager (Python) and native (ExecuTorch) runners to run TorchTune's llama3_2_vision text decoder without a KV cache (KV cache support is in progress). This should extend to the regular TorchTune llama3_2 model as well; support will be added in follow-up PRs.

The native runner depends on #6670 landing first.

PR chain:
- [Add kwarg example inputs to eager model base](#5765)
- [Llama2 model cleanup](#5859)
- [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- **YOU ARE HERE ~>** [Runner changes for TorchTune Llama3.2 vision text decoder](#6610)
- [Add et version of TorchTune MHA for swapping with custom op](#5912)

Test plan

Download the model from torchtune: tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct.

Run eager:

python -m examples.models.llama.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 32

Run ExecuTorch with the portable lib (doesn't work until #6670 lands; can be tested for now by applying pytorch/pytorch#137662 to your PyTorch installation):

# Export model to executorch.
python -m examples.models.llama.export_llama --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json -d fp32 --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte"

# Run using native runner.
python -m examples.models.llama.runner.native --model llama3_2_vision --pte llama3_2_vision.pte  --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model --prompt "How many calories are in bread?" --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --max_len 64
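Since the decoder runs without a KV cache here, every generation step re-runs the model over the entire token sequence so far. A minimal sketch of that loop (hypothetical `decoder` callable and function names, not the actual runner API):

```python
# Hedged sketch (not the actual runner code): greedy decoding without a
# KV cache re-runs the decoder over the full sequence at every step.
# `decoder` is a hypothetical callable mapping a token list to
# per-position logits.
def generate(decoder, prompt_tokens, eos_ids, max_len):
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        logits = decoder(tokens)   # one logits row per input position
        last = logits[-1]          # only the final row predicts the next token
        next_tok = max(range(len(last)), key=last.__getitem__)  # greedy argmax
        tokens.append(next_tok)
        if next_tok in eos_ids:    # cf. get_eos_ids in the metadata above
            break
    return tokens
```

A KV cache would avoid the quadratic cost of this loop by reusing the attention keys and values computed at earlier steps, which is what the in-progress follow-up adds.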

facebook-github-bot pushed a commit that referenced this pull request Nov 11, 2024
Summary:
Specify model to export in the CLI.


Test Plan:
Exported the stories 110M model.
```
python -m examples.models.llama.export_llama -c stories110M/stories110M.pt -p stories110M/params.json -X -kv
```

PR chain:
- [Add kwarg example inputs to eager model base](#5765)
- [Llama2 model cleanup](#5859)
- **YOU ARE HERE ~>** [Accept model type parameter in export_llama](#5910)
- [Export TorchTune llama3_2_vision in ET](#5911)
- [Runner changes for TorchTune Llama3.2 vision text decoder](#6610)
- [Add et version of TorchTune MHA for swapping with custom op](#5912)

Differential Revision: D65612837

Pulled By: dvorjackz
facebook-github-bot pushed a commit that referenced this pull request Nov 12, 2024
Reviewed By: helunwencser

Differential Revision: D65612837

Pulled By: dvorjackz
facebook-github-bot pushed a commit that referenced this pull request Nov 12, 2024
facebook-github-bot pushed a commit that referenced this pull request Nov 13, 2024
@@ -89,7 +101,6 @@ def build_args_parser() -> argparse.ArgumentParser:
parser.add_argument(
"-kv",
"--kv_cache",
Contributor


I think we'd want the default to still be True?

Contributor Author


Oh yeah, this was weird: `store_true` works by having a default of False, but here the default is set to True, so the flag is always True regardless of what you pass.
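The interaction described above is easy to reproduce in isolation (a minimal standalone parser, not the actual export_llama argument parser): combining `action="store_true"` with an explicit `default=True` makes the flag a no-op.

```python
import argparse

# action="store_true" normally defaults to False, so passing the flag
# flips it to True. An explicit default=True overrides that baseline,
# so the value is True whether or not the flag is given.
parser = argparse.ArgumentParser()
parser.add_argument("-kv", "--kv_cache", action="store_true", default=True)

print(parser.parse_args([]).kv_cache)        # True (flag omitted)
print(parser.parse_args(["-kv"]).kv_cache)   # True (flag passed)
```

The idiomatic fix for an on-by-default boolean is `argparse.BooleanOptionalAction` (Python 3.9+), which generates a paired `--no-kv_cache` flag.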

@jackzhxng jackzhxng requested a review from tarun292 November 14, 2024 18:38
Base automatically changed from jz/tt-llama-2 to main November 14, 2024 22:01
@jackzhxng jackzhxng merged commit 6c944db into main Nov 14, 2024
38 of 39 checks passed
@jackzhxng jackzhxng deleted the jz/native-runner-tt branch November 14, 2024 22:04
Labels
- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- release notes: examples: Changes to any of our example LLM integrations, such as Llama3 and Llava.