Fix executorch kv cache incompatibility with to_executorch lowering #7279

Merged: 19 commits merged into main from jz/fix-prefill on Jan 10, 2025

Conversation

@jackzhxng (Contributor) commented Dec 11, 2024

Summary

Fix the Llama 3.2 vision text decoder prefill issue by marking the kv cache as an initialized mutable buffer in a custom pass (sketched below).
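
For context, the custom pass amounts to tagging the relevant placeholders during lowering. A minimal sketch follows; the `et_init_buffer` meta key, the pattern-matching scheme, and the class name are assumptions about the implementation rather than code quoted from this PR:

```python
# Minimal sketch, assuming the emitter consumes an "et_init_buffer" flag on
# placeholder metadata when building tensor specs (see the review snippet
# further down in this thread).
from typing import List

from executorch.exir.pass_base import ExportPass


class InitializedMutableBufferPass(ExportPass):
    """Tag mutable-buffer placeholders whose names match a pattern so that
    to_executorch emits them as initialized data instead of zero-filling
    them at load time."""

    def __init__(self, patterns: List[str]) -> None:
        super().__init__()
        self.patterns = patterns  # e.g. ["kv_cache", "cache_pos"]

    def placeholder(self, name, arg, meta):
        for pattern in self.patterns:
            if pattern in name:
                meta["et_init_buffer"] = True  # assumed meta key
        return super().placeholder(name, arg, meta)
```

The pass would then be handed to lowering, e.g. via `ExecutorchBackendConfig(passes=[...])`, so it runs as part of `to_executorch`.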

Test plan

  • Add kv_cache export tests that check accuracy against torchtune eager mode and verify the contents of the cache after prefill and after token-by-token generation (see the sketch after this list)
  • Export and run the full Llama 3.2 vision text decoder:
> python -m examples.models.llama.export_llama \
>     --model llama3_2_vision \
>     --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth \
>     --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json \
>     --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' \
>     --output_name="llama3_2_vision.pte" -d fp32 --verbose --max_seq_length 64 -k
> python -m examples.models.llama3_2_vision.runner.native \
>     --model llama3_2_vision \
>     --pte llama3_2_vision.pte \
>     --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model \
>     --prompt "Who's the founder of Meta?" \
>     --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json \
>     --max_len 64 -kv --temperature 0

@pytorch-bot (bot) commented Dec 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7279

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6fe376d with merge base 25d8f15:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Dec 11, 2024
@jackzhxng marked this pull request as draft on December 11, 2024
@jackzhxng force-pushed the jz/fix-prefill branch 2 times, most recently from d538d43 to ee2eb15, on December 16, 2024
@jackzhxng force-pushed the jz/fix-prefill branch 2 times, most recently from 5dcb8f7 to f723fe1, on December 18, 2024
@jackzhxng marked this pull request as ready for review on December 18, 2024
@jackzhxng changed the title from "[DRAFT] Fix executorch kv cache incompatibility with to_executorch lowering" to "Fix executorch kv cache incompatibility with to_executorch lowering" on Dec 18, 2024
@jackzhxng requested review from tarun292 and lucylq on December 21, 2024
Comment on lines 1660 to 1664
if initialize_buffer:
    # An initialized mutable buffer is emitted as constant data so it starts
    # with its exported values; only mutable buffers may be tagged this way.
    assert is_mutable_buffer
    spec.const = True
else:
    # Default: user inputs and mutable buffers are not constants.
    spec.const = not (is_user_input or is_mutable_buffer)
A reviewer (Contributor) commented:
Please add unit tests for this logic: tests that would have broken before this fix and that would have caught this kv cache incompatibility.
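
For illustration, a minimal sketch of the kind of test being asked for; the toy module, the buffer name, the pass's module path, and the wiring into `to_executorch` are assumptions here, not the PR's actual tests:

```python
import torch

from executorch.exir import ExecutorchBackendConfig, to_edge
# Module path for the pass is an assumption; see the sketch in the summary.
from executorch.exir.passes.init_mutable_pass import InitializedMutableBufferPass


class ModuleWithCache(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("cache_pos", torch.zeros(10, dtype=torch.int64))

    def forward(self, x):
        self.cache_pos += 1  # mutate the buffer, as a kv cache would
        return x + self.cache_pos


ep = torch.export.export(ModuleWithCache(), (torch.zeros(10, dtype=torch.int64),))

# Lowering with the pass should emit `cache_pos` as an *initialized* mutable
# buffer; before this fix, that combination was mishandled during
# to_executorch. A test here would execute the resulting method and compare
# outputs (and cache contents) against eager-mode results.
et_program = to_edge(ep).to_executorch(
    ExecutorchBackendConfig(passes=[InitializedMutableBufferPass(["cache_pos"])])
)
```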

@facebook-github-bot commented:
@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jackzhxng jackzhxng requested a review from dbort January 9, 2025 21:11
@dbort (Contributor) left a comment:
Thanks for adding the emitter tests!

# Test that the mutable buffer is uninitialized and starts with default zeros.
torch.allclose(
    method_regular.execute((example_inputs))[0],
    torch.ones(10, dtype=torch.int64),
)
@dbort (Contributor) commented Jan 9, 2025:
Should this be zeros, based on the comment? If not, please update the comment to clarify why this is ones. And if it should be zeros, did this test fail?

@jackzhxng (Author) replied:
Oh, it's because in the forward of the model we do `self.cache_pos += 1`; I'll specify this in the comment.
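
In eager terms, the behavior looks like this (a toy sketch with assumed names, mirroring the test module rather than quoting it):

```python
import torch


class ToyCacheModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Mutable buffer, default-initialized to zeros.
        self.register_buffer("cache_pos", torch.zeros(10, dtype=torch.int64))

    def forward(self, x):
        self.cache_pos += 1  # increment happens before the value is read back
        return self.cache_pos


m = ToyCacheModule()
# First call: zeros + 1 == ones, hence the test's comparison against torch.ones.
assert torch.equal(m(torch.zeros(1)), torch.ones(10, dtype=torch.int64))
# The mutation persists across calls, so a second call yields all twos.
assert torch.equal(m(torch.zeros(1)), torch.full((10,), 2, dtype=torch.int64))
```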

@facebook-github-bot commented:
@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jackzhxng merged commit 9666ee8 into main on Jan 10, 2025 (45 checks passed)
@jackzhxng deleted the jz/fix-prefill branch on January 10, 2025