You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD(s) (Unicode replacement character) instead until the
next valid codepoint can be found.
This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!)
Returing a proper error code might be desirable, however the signature
of `llama_tokenize` doesn't allow it as all return values already have
existing meaning.
0 commit comments