Fix executorch kv cache incompatibility with to_executorch lowering #7279

Merged: 19 commits merged into main from jz/fix-prefill on Jan 10, 2025

Conversation

@jackzhxng (Contributor) commented Dec 11, 2024

Summary

Fix the Llama 3.2 vision text decoder prefill issue by marking the kv cache as an initialized mutable buffer in a custom pass (sketched below).
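
For context, the custom pass amounts to tagging the relevant placeholders during lowering. A minimal sketch follows; the `et_init_buffer` meta key, the pattern-matching scheme, and the class name are assumptions about the implementation rather than code quoted from this PR:

```python
# Minimal sketch, assuming the emitter consumes an "et_init_buffer" flag on
# placeholder metadata when building tensor specs (see the review snippet
# further down in this thread).
from typing import List

from executorch.exir.pass_base import ExportPass


class InitializedMutableBufferPass(ExportPass):
    """Tag mutable-buffer placeholders whose names match a pattern so that
    to_executorch emits them as initialized data instead of zero-filling
    them at load time."""

    def __init__(self, patterns: List[str]) -> None:
        super().__init__()
        self.patterns = patterns  # e.g. ["kv_cache", "cache_pos"]

    def placeholder(self, name, arg, meta):
        for pattern in self.patterns:
            if pattern in name:
                meta["et_init_buffer"] = True  # assumed meta key
        return super().placeholder(name, arg, meta)
```

The pass would then be handed to lowering, e.g. via `ExecutorchBackendConfig(passes=[...])`, so it runs as part of `to_executorch`.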

Test plan

  • Add kv_cache export tests that check accuracy against torchtune eager mode and verify the contents of the cache after prefill and after token-by-token generation (see the sketch after this list)
  • Export and run the full Llama 3.2 vision text decoder:
> python -m examples.models.llama.export_llama \
>     --model llama3_2_vision \
>     --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth \
>     --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json \
>     --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' \
>     --output_name="llama3_2_vision.pte" -d fp32 --verbose --max_seq_length 64 -k
> python -m examples.models.llama3_2_vision.runner.native \
>     --model llama3_2_vision \
>     --pte llama3_2_vision.pte \
>     --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model \
>     --prompt "Who's the founder of Meta?" \
>     --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json \
>     --max_len 64 -kv --temperature 0

@pytorch-bot (bot) commented Dec 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7279

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6fe376d with merge base 25d8f15:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Dec 11, 2024
@jackzhxng marked this pull request as draft on December 11, 2024
@jackzhxng force-pushed the jz/fix-prefill branch 2 times, most recently from d538d43 to ee2eb15, on December 16, 2024
@jackzhxng force-pushed the jz/fix-prefill branch 2 times, most recently from 5dcb8f7 to f723fe1, on December 18, 2024
@jackzhxng marked this pull request as ready for review on December 18, 2024
@jackzhxng changed the title from "[DRAFT] Fix executorch kv cache incompatibility with to_executorch lowering" to "Fix executorch kv cache incompatibility with to_executorch lowering" on Dec 18, 2024
@jackzhxng requested review from tarun292 and lucylq on December 21, 2024
Comment on lines 1660 to 1664
if initialize_buffer:
    # An initialized mutable buffer is emitted as constant data so it starts
    # with its exported values; only mutable buffers may be tagged this way.
    assert is_mutable_buffer
    spec.const = True
else:
    # Default: user inputs and mutable buffers are not constants.
    spec.const = not (is_user_input or is_mutable_buffer)
A reviewer (Contributor) commented:
Please add unit tests for this logic: tests that would have broken before this fix and that would have caught this kv cache incompatibility.
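
For illustration, a minimal sketch of the kind of test being asked for; the toy module, the buffer name, the pass's module path, and the wiring into `to_executorch` are assumptions here, not the PR's actual tests:

```python
import torch

from executorch.exir import ExecutorchBackendConfig, to_edge
# Module path for the pass is an assumption; see the sketch in the summary.
from executorch.exir.passes.init_mutable_pass import InitializedMutableBufferPass


class ModuleWithCache(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("cache_pos", torch.zeros(10, dtype=torch.int64))

    def forward(self, x):
        self.cache_pos += 1  # mutate the buffer, as a kv cache would
        return x + self.cache_pos


ep = torch.export.export(ModuleWithCache(), (torch.zeros(10, dtype=torch.int64),))

# Lowering with the pass should emit `cache_pos` as an *initialized* mutable
# buffer; before this fix, that combination was mishandled during
# to_executorch. A test here would execute the resulting method and compare
# outputs (and cache contents) against eager-mode results.
et_program = to_edge(ep).to_executorch(
    ExecutorchBackendConfig(passes=[InitializedMutableBufferPass(["cache_pos"])])
)
```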

@facebook-github-bot commented:
@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jackzhxng jackzhxng requested a review from dbort January 9, 2025 21:11
@dbort (Contributor) left a comment:
Thanks for adding the emitter tests!

# Test that the mutable buffer is uninitialized and starts with default zeros.
torch.allclose(
    method_regular.execute((example_inputs))[0],
    torch.ones(10, dtype=torch.int64),
)
@dbort (Contributor) commented Jan 9, 2025:
Should this be zeros, based on the comment? If not, please update the comment to clarify why this is ones. And if it should be zeros, did this test fail?

@jackzhxng (Author) replied:
Oh, it's because in the forward of the model we do `self.cache_pos += 1`; I'll specify this in the comment.
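
In eager terms, the behavior looks like this (a toy sketch with assumed names, mirroring the test module rather than quoting it):

```python
import torch


class ToyCacheModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Mutable buffer, default-initialized to zeros.
        self.register_buffer("cache_pos", torch.zeros(10, dtype=torch.int64))

    def forward(self, x):
        self.cache_pos += 1  # increment happens before the value is read back
        return self.cache_pos


m = ToyCacheModule()
# First call: zeros + 1 == ones, hence the test's comparison against torch.ones.
assert torch.equal(m(torch.zeros(1)), torch.ones(10, dtype=torch.int64))
# The mutation persists across calls, so a second call yields all twos.
assert torch.equal(m(torch.zeros(1)), torch.full((10,), 2, dtype=torch.int64))
```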

@facebook-github-bot commented:
@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jackzhxng merged commit 9666ee8 into main on Jan 10, 2025 (45 checks passed)
@jackzhxng deleted the jz/fix-prefill branch on January 10, 2025