Introduce GenerationConfig #10228

Merged
merged 1 commit into from
Apr 18, 2025

larryliu0820
Contributor

Summary:
Started to implement #9341
Started to fix #8495

This PR introduces GenerationConfig, which holds the settings that can change from one invocation of generate() to the next.

For example, temperature is moved out of the runner constructor because it is not tied to the runner instance and should be adjustable every time we call generate().

Similarly, echo and warming are moved into the config.
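A minimal sketch of the idea, not the code from this PR: per-call settings travel in a small struct passed to generate(), while load-time state stays in the constructor. The field names temperature, echo and warming come from the description above; the default values, the ToyRunner class, and the behavior attached to each field are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical per-call settings. Field names follow the PR description;
// the default values are assumptions.
struct GenerationConfig {
  bool echo = true;          // include the prompt in the returned text
  bool warming = false;      // assumed: a warmup run that suppresses echo
  float temperature = 0.8f;  // 0 is treated as greedy decoding in this toy
};

// Toy stand-in for a runner (hypothetical class, no real model involved).
// Only load-time state is taken by the constructor.
class ToyRunner {
 public:
  explicit ToyRunner(std::string model_path)
      : model_path_(std::move(model_path)) {}

  // Every call receives its own config, so two consecutive calls on the
  // same runner can use different temperature or echo settings.
  std::string generate(
      const std::string& prompt,
      const GenerationConfig& config) const {
    std::string out;
    if (config.echo && !config.warming) {
      out += prompt;
    }
    out += (config.temperature == 0.0f) ? " [greedy]" : " [sampled]";
    return out;
  }

 private:
  std::string model_path_;
};
```

With this shape, switching from sampling to greedy decoding is a change to the config passed to one call, and the runner itself does not have to be rebuilt.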

We also allow both seq_len and max_new_tokens to be passed through the config. The effective value of max_new_tokens is then determined from these two config values, the .pte file metadata, and the number of prompt tokens.
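One way this resolution could work is sketched below. This is an illustration under stated assumptions, not the logic merged in this PR: it assumes -1 marks an unset value, that seq_len bounds prompt plus generated tokens, and that the relevant .pte metadata is a maximum context length. The free function resolve_max_new_tokens and the parameter names max_context_len and num_prompt_tokens are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical subset of the config; -1 is an assumed "unset" sentinel.
struct GenerationConfig {
  int32_t seq_len = -1;         // total budget: prompt + generated tokens
  int32_t max_new_tokens = -1;  // budget for generated tokens only
};

// Derive how many tokens may be generated for one call.
// max_context_len stands in for the limit read from the .pte metadata.
inline int32_t resolve_max_new_tokens(
    const GenerationConfig& config,
    int32_t max_context_len,
    int32_t num_prompt_tokens) {
  // The whole sequence can never exceed the model's context length.
  int32_t total = max_context_len;
  if (config.seq_len != -1) {
    total = std::min(total, config.seq_len);
  }
  // Whatever the prompt does not consume is available for new tokens.
  int32_t budget = total - num_prompt_tokens;
  if (config.max_new_tokens != -1) {
    budget = std::min(budget, config.max_new_tokens);
  }
  // A prompt longer than the limit leaves nothing to generate.
  return std::max<int32_t>(0, budget);
}
```

Under these assumptions, a context length of 128 and a 10-token prompt leave 118 new tokens when neither value is set, 54 when seq_len is 64, and 20 when max_new_tokens is 20.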

Differential Revision: D73091676

pytorch-bot bot commented Apr 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10228

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ef7d4ca with merge base f911567 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73091676


if (warmup) {
-  runner.warmup(prompt, seq_len);
+  runner.warmup(prompt, /*max_new_tokens=*/seq_len);
Contributor

Will this be added to the internal runner as well?

Contributor Author

Which internal runner?

facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2025
Reviewed By: iseeyuan


@larryliu0820 larryliu0820 added the release notes: api Changes to public facing apis (any interfaces, pybinded runtime methods, etc.) label Apr 16, 2025
larryliu0820 added a commit that referenced this pull request Apr 16, 2025
facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2025

larryliu0820 added a commit that referenced this pull request Apr 16, 2025
larryliu0820 added a commit that referenced this pull request Apr 17, 2025

facebook-github-bot pushed a commit that referenced this pull request Apr 17, 2025

facebook-github-bot pushed a commit that referenced this pull request Apr 17, 2025

larryliu0820 added a commit that referenced this pull request Apr 17, 2025

larryliu0820 added a commit that referenced this pull request Apr 17, 2025
larryliu0820 added a commit that referenced this pull request Apr 17, 2025

larryliu0820 added a commit that referenced this pull request Apr 17, 2025
facebook-github-bot pushed a commit that referenced this pull request Apr 17, 2025

facebook-github-bot pushed a commit that referenced this pull request Apr 17, 2025

@facebook-github-bot facebook-github-bot merged commit 08c07fa into main Apr 18, 2025
95 of 97 checks passed
@facebook-github-bot facebook-github-bot deleted the export-D73091676 branch April 18, 2025 01:55
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025
Differential Revision: D73091676

Pull Request resolved: pytorch#10228
Labels
CLA Signed, fb-exported, release notes: api
Development

Successfully merging this pull request may close these issues.

[etLLM] extension/llm should have unit tests
4 participants