Commit 3275e60

falcon : fix regex
1 parent 3a461db commit 3275e60

1 file changed: +1 -2 lines changed
llama.cpp

Lines changed: 1 addition & 2 deletions
@@ -12212,14 +12212,13 @@ struct llm_tokenizer_bpe {
             "\\s?\\p{L}+",
             "\\s?\\p{P}+",
             "[一-龥ࠀ-一가-퟿]+",
-            "\\p{N}+",
+            "\\p{N}",
         });
         break;
     case LLAMA_VOCAB_PRE_TYPE_FALCON:
         word_collection = unicode_regex_split(text, {
             "[\\p{P}\\$\\+<=>\\^~\\|]+",
             "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)",
-            "\\p{N}+",
             "[0-9][0-9][0-9]",
         });
         break;
