[executorch] generation.py with kv cache #3030


Closed
wants to merge 1 commit into from

Contributor

@lucylq lucylq commented Apr 12, 2024

No description provided.

pytorch-bot bot commented Apr 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3030

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 9a4c1f6 with merge base 7616d42 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 12, 2024
@lucylq lucylq force-pushed the lfq.generation-kv-cache-2 branch from 00e9159 to 5c98ea6 Compare April 12, 2024 23:11
@facebook-github-bot
Contributor

@lucylq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@lucylq lucylq marked this pull request as ready for review April 12, 2024 23:12
@lucylq lucylq force-pushed the lfq.generation-kv-cache-2 branch from 5c98ea6 to b48bfd1 Compare April 12, 2024 23:24
@facebook-github-bot
Contributor

@lucylq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@lucylq lucylq changed the title kv cache 2 [executorch] generation.py with kv cache Apr 12, 2024
Contributor

@larryliu0820 larryliu0820 left a comment

Lint? Also does this work with our current model? Llama2 and stories

Summary:
Python end-to-end generation using the tiktoken tokenizer.

Tested with text_completion; chat_completion has not been tried yet.
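
For readers unfamiliar with the idea in the title, here is a minimal sketch of how a generation loop differs with and without a KV cache. It is not the code in this PR: `ToyModel`, `generate`, and the `forward(tokens, start_pos)` signature are hypothetical stand-ins, and the toy model only mimics a cache by remembering the tokens it has already seen.

```python
from typing import Callable, List

# Hypothetical signature: forward(tokens, start_pos) -> logits for the next token.
Forward = Callable[[List[int], int], List[float]]


class ToyModel:
    """Stand-in for an exported model; the 'cache' is just the tokens seen so far."""

    def __init__(self, vocab_size: int = 8) -> None:
        self.vocab_size = vocab_size
        self.cache: List[int] = []

    def __call__(self, tokens: List[int], start_pos: int) -> List[float]:
        # Overwrite cached state from start_pos onward with the new tokens.
        self.cache = self.cache[:start_pos] + list(tokens)
        target = (sum(self.cache) + len(self.cache)) % self.vocab_size
        return [1.0 if i == target else 0.0 for i in range(self.vocab_size)]


def generate(forward: Forward, prompt: List[int], max_gen_len: int,
             use_kv_cache: bool) -> List[int]:
    tokens = list(prompt)
    # Prefill: the whole prompt is processed once, starting at position 0.
    logits = forward(tokens, 0)
    for step in range(max_gen_len):
        # Greedy choice: index of the largest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if step == max_gen_len - 1:
            break
        if use_kv_cache:
            # Feed only the newest token; earlier state comes from the cache.
            logits = forward(tokens[-1:], len(tokens) - 1)
        else:
            # No cache: recompute over the full sequence every step.
            logits = forward(tokens, 0)
    return tokens[len(prompt):]


with_cache = generate(ToyModel(), [1, 2, 3], 5, use_kv_cache=True)
without_cache = generate(ToyModel(), [1, 2, 3], 5, use_kv_cache=False)
```

In this sketch both paths yield the same tokens, which is the property the test plan below checks for the real model when comparing the kv and non-kv exports.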


Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Command, with prompt "Hello I am" and max_gen_len = 10:
```
python -m examples.models.llama2.runner.generation --pte llama_4ckpts_x.pte --tokenizer tokenizer.model --prompt="Hello I am"  --temperature=0 --params ../llama-models/llama3/params_less.json --max_gen_len=10
```
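
The command passes `--temperature=0`, which presumably makes decoding deterministic so that outputs can be compared across export configurations. A minimal sketch of that behavior follows, assuming temperature 0 means a greedy argmax (the source does not confirm how generation.py handles it). `pick_next_token` is a hypothetical helper, and any top-p filtering is omitted.

```python
import math
import random
from typing import List, Optional


def pick_next_token(logits: List[float], temperature: float,
                    rng: Optional[random.Random] = None) -> int:
    """Return a token index: greedy when temperature <= 0, sampled otherwise."""
    if temperature <= 0:
        # Assumed behavior: temperature 0 selects the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Unnormalized softmax weights; subtracting the peak avoids overflow.
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


greedy = pick_next_token([0.1, 2.0, -1.0], temperature=0)
sampled = pick_next_token([0.1, 2.0, -1.0], temperature=0.8, rng=random.Random(0))
```

With greedy selection, two configurations that compute the same logits emit identical text, so a differing result points to a numerical or implementation difference rather than sampling noise.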

fp32, xnn, kv and fp32, xnn give the same result:
```
Result: [{'generation': ' a 25 year old woman. I am a'}]
```

fp32, xnn, int4
```
Result: [{'generation': ' interested in the following products: - 1 x'}]
```

fp32, xnn, kv, sdpa (needs investigation)
```
Result: [{'generation': 'ฉopteraenthalenthalenthalenthalenthalenthalenthalenthal'}]
```

Reviewed By: larryliu0820

Differential Revision: D56087430

Pulled By: lucylq
@facebook-github-bot facebook-github-bot force-pushed the lfq.generation-kv-cache-2 branch from b48bfd1 to 9a4c1f6 Compare April 15, 2024 16:59
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D56087430

@facebook-github-bot
Contributor

@lucylq merged this pull request in 645256d.

@mergennachin mergennachin mentioned this pull request Apr 26, 2024
@lucylq lucylq deleted the lfq.generation-kv-cache-2 branch January 24, 2025 19:39
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
3 participants