Commit 9ffb78d

kaetemi authored and arthw committed
server : update doc to clarify n_keep when there is bos token (ggml-org#8619)
1 parent 6220d93 commit 9ffb78d

1 file changed: +1 -1 lines changed

examples/server/README.md

Lines changed: 1 addition & 1 deletion
@@ -444,7 +444,7 @@ node index.js
 
 `n_predict`: Set the maximum number of tokens to predict when generating text. **Note:** May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. Default: `-1`, where `-1` is infinity.
 
-`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded.
+`n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
 By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.
 
 `stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
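
To illustrate the clarified wording, here is a minimal Python sketch of how a client might set these parameters and reason about the BOS token. It is not the server's implementation: the helper names `build_completion_payload` and `tokens_retained` are hypothetical, and the counting rule (the BOS token, when present, is kept in addition to `n_keep`) is an assumption drawn from the sentence "The number excludes the BOS token."

```python
import json


def build_completion_payload(prompt, n_predict=-1, n_keep=0, stream=False):
    """Build a request body from the parameters described in the README diff.

    Hypothetical helper: only the field names come from the README.
    """
    return {
        "prompt": prompt,
        "n_predict": n_predict,  # -1 means no limit
        "n_keep": n_keep,        # does not count the BOS token
        "stream": stream,
    }


def tokens_retained(n_keep, prompt_token_count, has_bos=True):
    """Estimate how many prompt tokens survive truncation (assumed rule).

    prompt_token_count includes the BOS token when has_bos is True.
    """
    if n_keep < 0:
        # -1 retains every prompt token.
        return prompt_token_count
    bos = 1 if has_bos else 0
    # n_keep counts only non-BOS tokens, so the BOS token is added on top.
    return min(n_keep, prompt_token_count - bos) + bos


if __name__ == "__main__":
    payload = build_completion_payload("Hello", n_predict=16, n_keep=4)
    print(json.dumps(payload))
    print(tokens_retained(4, 10))
```

Under this assumption, a client that wants to preserve a 4-token instruction prefix passes `n_keep=4` rather than `5`, even when the tokenizer prepends a BOS token.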
