Commit 8f35bdd

Fix stop sequence performance bug.

1 parent: 00ea3af

2 files changed: 11 additions, 5 deletions


CHANGELOG.md (5 additions, 1 deletion)

@@ -9,4 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
-- Added first version of the changelog
+- Added first version of the changelog
+
+### Fixed
+
+- Performance bug in stop sequence check slowing down streaming.

llama_cpp/llama.py (6 additions, 4 deletions)

@@ -775,20 +775,22 @@ def _create_completion(
                 break
 
             if stream:
+                remaining_tokens = completion_tokens[returned_tokens:]
+                remaining_text = self.detokenize(remaining_tokens)
+                remaining_length = len(remaining_text)
+
                 # We want to avoid yielding any characters from
                 # the generated text if they are part of a stop
                 # sequence.
                 first_stop_position = 0
                 for s in stop_sequences:
-                    for i in range(len(s), 0, -1):
-                        if all_text.endswith(s[:i]):
+                    for i in range(min(len(s), remaining_length), 0, -1):
+                        if remaining_text.endswith(s[:i]):
                             if i > first_stop_position:
                                 first_stop_position = i
                             break
 
                 token_end_position = 0
-                remaining_tokens = completion_tokens[returned_tokens:]
-                remaining_length = len(self.detokenize(remaining_tokens))
                 for token in remaining_tokens:
                     token_end_position += len(self.detokenize([token]))
                     # Check if stop sequence is in the token
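The hunk above changes the streaming stop-sequence scan to look only at the not-yet-yielded tail of the completion (`remaining_text`) rather than the full accumulated `all_text`, and bounds each prefix check by the tail's length. A minimal, standalone sketch of the new check — the `first_stop_position` helper here is hypothetical; in the commit the loop runs inline inside `_create_completion`:

```python
def first_stop_position(remaining_text: str, stop_sequences: list[str]) -> int:
    """Return the length of the longest stop-sequence prefix that
    remaining_text ends with, or 0 if there is none.

    Mirrors the per-chunk check in the commit: only the text that has
    not yet been yielded to the caller is inspected.
    """
    first_stop = 0
    remaining_length = len(remaining_text)
    for s in stop_sequences:
        # Try the longest prefix of s first, but never a prefix longer
        # than the remaining text itself.
        for i in range(min(len(s), remaining_length), 0, -1):
            if remaining_text.endswith(s[:i]):
                if i > first_stop:
                    first_stop = i
                break
    return first_stop
```

A partial match at the end of the tail (e.g. `"Hello wor"` against the stop string `"world"`) yields a nonzero position, which is what lets the streamer hold those characters back until it knows whether the full stop sequence will arrive.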
