etLLM: add options to apply embedding or output. #8653
Merged
Summary
Add options to apply embedding or output in llama_transformer.
Currently the transformer's forward applies the embedding based on its inputs (tokens is not None and h is None). However, when the embedding is never applied, the embedding matrix should not be initialized at all, since it takes extra memory. This PR adds an option to control that explicitly. It is useful for models without embeddings, for cases where the embedding needs to be done outside the transformer, or where it needs to be done differently than the one built into the transformer.
A similar option is added for the output layer.
Test plan