
Make TorchTune Llama model KV cache compatible in eager #6643


Merged
jackzhxng merged 60 commits into main from jz/tt-llama-kv-cache on Nov 15, 2024

Conversation

@jackzhxng (Contributor) commented Nov 4, 2024

Summary

  • Sets up KV caches for the TorchTune Llama model (see the sketch after this list)
  • Adds a separate runner for TorchTune Llama models, since its input handling differs enough from the existing Llama runner to warrant a separate copy
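
In torchtune, KV caches are allocated through `TransformerDecoder.setup_caches`; below is a minimal sketch of what enabling them looks like in eager mode. The small text-only builder is illustrative only, not this PR's exact wiring, which targets the Llama-3.2-11B-Vision decoder:

```python
import torch
from torchtune.models.llama3_2 import llama3_2_1b  # illustrative builder, not the PR's model

model = llama3_2_1b()
model.eval()

# Allocate fixed-size per-layer KV caches up front. Once caches exist,
# forward() takes an input_pos tensor so attention can read/write the
# cache slots for the positions being processed.
model.setup_caches(batch_size=1, dtype=torch.float32)
```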

Test plan

Download checkpoint: tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct

Eager without KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64

Eager with KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64 -kv
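
The two invocations differ only in the trailing `-kv` flag, which turns on KV-cache decoding in the eager runner. Conceptually, that changes generation from re-running the full sequence every step to a one-time prefill plus single-token steps; a minimal sketch of the two loops follows (assuming torchtune's `input_pos` calling convention, not the actual runner code):

```python
import torch

def generate_no_cache(model, tokens, steps):
    # Every step re-encodes the entire sequence, so attention cost grows
    # quadratically with the number of generated tokens.
    for _ in range(steps):
        logits = model(tokens)                        # [B, T, vocab_size]
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

def generate_with_cache(model, tokens, steps):
    # Prefill once over the whole prompt, then feed one token per step;
    # the caches already hold keys/values for everything seen so far.
    logits = model(tokens, input_pos=torch.arange(tokens.shape[1]))
    for _ in range(steps):
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
        logits = model(next_tok, input_pos=torch.tensor([tokens.shape[1] - 1]))
    return tokens
```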

@jackzhxng jackzhxng changed the title Make TorchTune Llama model KV cache compatible Make TorchTune Llama model KV cache compatible in eager Nov 13, 2024
@jackzhxng jackzhxng marked this pull request as ready for review November 13, 2024 22:54
@larryliu0820 (Contributor) left a comment


Please add tests in a follow-up

Base automatically changed from jz/native-runner-tt to main November 14, 2024 22:04
@jackzhxng jackzhxng merged commit 7b76f0f into main Nov 15, 2024
39 checks passed
@jackzhxng jackzhxng deleted the jz/tt-llama-kv-cache branch November 15, 2024 03:55
Labels
CLA Signed