Commit 26f1df5

Fix the penultimate token sometimes being lost with SSE streaming (ggml-org#1031)
The token immediately before an EOT token was lost when SSE streaming was enabled, if that token was contained entirely within a stop sequence.

As an example of when this could happen, consider the prompt: "Type the phrase 'pleas' once." In a Llama 3-derived model, 'pleas' tokenizes as 'ple' + 'as'. The token 'as' is contained within this instruct-mode stop sequence: `<|eot_id|><|start_header_id|>assistant<|end_header_id|>`, because of the word 'assistant'. Since `string_contains_sequence_substring` returns True for 'as', this token is added to `tokenReserve` instead of being streamed immediately. If the '<|eot_id|>' token was generated next, the text in `tokenReserve` would be discarded.
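The failure mode can be simulated with a minimal sketch, assuming simplified stop handling. The names `tokenReserve` and `string_contains_sequence_substring` mirror the commit message, but this is an illustrative reconstruction, not the actual koboldcpp implementation:

```python
# Assumed stop sequences for this sketch: the instruct-mode stop sequence
# from the commit message, plus the bare eot token that ends generation.
STOP_SEQUENCES = [
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
    "<|eot_id|>",
]

def string_contains_sequence_substring(token: str, stops: list[str]) -> bool:
    """True if token appears inside any stop sequence, so it might be the
    start of one and must be held back rather than streamed immediately."""
    return any(token in stop for stop in stops)

def stream_tokens(tokens, stops=STOP_SEQUENCES, fixed=True):
    """Yield the text chunks that would be sent over SSE."""
    token_reserve = ""
    for token in tokens:
        stopping = token in stops               # e.g. '<|eot_id|>' ends generation
        token_str = "" if stopping else token   # the stop marker itself is never streamed
        if not stopping and string_contains_sequence_substring(token_str, stops):
            token_reserve += token_str          # hold back: a stop sequence may be forming
        else:
            # Pre-fix flush condition:  tokenStr != ""
            # Post-fix flush condition: tokenStr != "" or tokenReserve != ""
            if token_str != "" or (fixed and token_reserve != ""):
                yield token_reserve + token_str
                token_reserve = ""
        if stopping:
            break
```

With the fix, streaming `['ple', 'as', '<|eot_id|>']` emits both 'ple' and 'as'; without it, 'as' sits in `token_reserve` when the EOT token arrives and is silently dropped.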
1 parent 948646f commit 26f1df5

File tree

1 file changed: +1 −1 lines changed


koboldcpp.py

Lines changed: 1 addition & 1 deletion
@@ -1447,7 +1447,7 @@ async def handle_sse_stream(self, genparams, api_format):
                 tokenReserve += tokenStr
                 await asyncio.sleep(async_sleep_short) #if a stop sequence could trigger soon, do not send output
             else:
-                if tokenStr!="":
+                if tokenStr!="" or tokenReserve!="":
                     tokenStr = tokenReserve + tokenStr
                     tokenReserve = ""

0 commit comments