Skip to content

Commit 8f481d9

Browse files
Remove Unnecessary handling of special characters
1 parent 92e41ec commit 8f481d9

File tree

1 file changed

+0
-9
lines changed

1 file changed

+0
-9
lines changed

convert_hf_to_gguf.py

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -525,15 +525,6 @@ def get_vocab_base(self) -> tuple[list[str], list[int], str]:
525525
else:
526526
token: str = reverse_vocab[i]
527527
if token in added_vocab:
528-
# We need to manually encode and decode the added tokens in case special characters
529-
# used for `\n` / `\t` have been manually added in the added tokens
530-
# To avoid unexpected issues - we make sure to encode single-char tokens
531-
if len(token) == 1:
532-
previous_token = token
533-
token = tokenizer.decode(tokenizer.encode(token, add_special_tokens=False))
534-
if previous_token != token:
535-
logger.info(f"{repr(previous_token)} is encoded and decoded back to {repr(token)} using AutoTokenizer")
536-
537528
if tokenizer.added_tokens_decoder[i].special or self.does_token_look_special(token):
538529
toktypes.append(gguf.TokenType.CONTROL)
539530
else:

0 commit comments

Comments
 (0)