
Commit 0e7432e

Update on "update llama runner to decode single token"
Right now, we don't print the generated response in the eager runner until all tokens are generated. This is a poor experience, since you have to wait for the entire generation to finish before seeing any of the response. This PR updates the runner to decode each new token immediately after it is generated.

Differential Revision: [D65578306](https://our.internmc.facebook.com/intern/diff/D65578306/)

[ghstack-poisoned]
2 parents: 967eb29 + df7be71, commit 0e7432e
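
The change described above lives in the eager runner's generation loop. As a rough illustration, here is a minimal sketch of the per-token (streaming) decoding pattern the commit message describes; `generate_streaming`, `model`, `tokenizer`, and the greedy sampling are illustrative stand-ins, not the actual executorch runner API.

```python
# Minimal sketch of streaming per-token decoding, as described in the commit
# message. `model`, `tokenizer`, and the greedy argmax sampling are
# illustrative stand-ins, not the actual executorch runner API.
def generate_streaming(model, tokenizer, prompt_tokens, max_new_tokens=128):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)             # forward pass over the current tokens
        next_token = int(logits.argmax())  # stand-in for the runner's nucleus (top-p) sampling
        tokens.append(next_token)

        # Decode and print the new token immediately instead of waiting for
        # the whole sequence to finish, which is the behavior this PR adds.
        print(tokenizer.decode([next_token]), end="", flush=True)

        if next_token == tokenizer.eos_id:
            break
    return tokens
```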

File tree: 1 file changed (+1, -1)

examples/models/llama/runner/generation.py

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ def text_completion(
        echo (bool, optional): Flag indicating whether to include prompt tokens in the generated output. Defaults to False.

    Returns:
-       CompletionPrediction: Completion prediction, which contains the generated text completion.
+       Generated list of tokens.

    Note:
        This method generates text completion for the provided prompt, employing nucleus sampling to introduce controlled randomness.
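
With the updated Returns section, a caller treats the return value of text_completion as the generated list of tokens and decodes it itself. A hedged usage sketch, assuming a runner object that exposes the text_completion method shown above and a tokenizer with a decode() method (both assumptions, not part of this diff):

```python
def run_completion(runner, tokenizer, prompt: str) -> str:
    # Hedged sketch: `runner` is assumed to expose the text_completion method
    # from the diff above, and `tokenizer` a decode() method; neither name is
    # confirmed by this commit.
    generated_tokens = runner.text_completion(prompt, echo=False)
    # Per the updated docstring, the return value is the generated list of
    # tokens rather than a CompletionPrediction object.
    return tokenizer.decode(generated_tokens)
```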
