1 parent 5673c20 commit e3db248
docs/runner_build.md renamed to docs/native-execution.md
parking_lot/unsupported/runner-tokenizer.md
@@ -0,0 +1,12 @@
+The SentencePiece tokenizer implementation for Python (developed by
+Google) and the C/C++ implementation (developed by Andrej Karpathy)
+use different input formats. The Python implementation reads a
+tokenizer specification in tokenizer.model format. The C/C++
+implementation reads the tokenizer instructions from a file in
+tokenizer.bin format. We include Andrej's SentencePiece converter,
+which translates a SentencePiece tokenizer from tokenizer.model format
+to tokenizer.bin format, in the XXXutilsXXX subdirectory:
+
+```
+python3 XXXutilsXXX/tokenizer.py --tokenizer-model=${MODEL_DIR}/tokenizer.model
+```
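Note on the added doc: the tokenizer.model to tokenizer.bin translation it describes can be illustrated with a minimal sketch. Everything here is an assumption rather than something the diff confirms: the binary layout (a `uint32` maximum token length, then a `float32` score, a `uint32` byte length, and the raw bytes for each token) follows my recollection of Karpathy's llama2.c exporter, the helper names `write_tokenizer_bin` and `read_tokenizer_bin` are hypothetical, and the vocabulary is a made-up in-memory list rather than one loaded from a real tokenizer.model file.

```python
import io
import struct


def write_tokenizer_bin(tokens, scores, out):
    """Write (token bytes, score) pairs in the ASSUMED tokenizer.bin layout."""
    # Header: length in bytes of the longest token.
    max_token_length = max(len(t) for t in tokens)
    out.write(struct.pack("<I", max_token_length))
    # One record per token: float32 score, uint32 length, raw bytes.
    for tok, score in zip(tokens, scores):
        out.write(struct.pack("<fI", score, len(tok)))
        out.write(tok)


def read_tokenizer_bin(inp):
    """Read the same assumed layout back, to check the round trip."""
    (max_token_length,) = struct.unpack("<I", inp.read(4))
    tokens, scores = [], []
    while True:
        header = inp.read(8)
        if len(header) < 8:  # end of stream
            break
        score, length = struct.unpack("<fI", header)
        tokens.append(inp.read(length))
        scores.append(score)
    return max_token_length, tokens, scores


# Hypothetical four-entry vocabulary standing in for a SentencePiece model.
vocab = [b"<unk>", b"<s>", b"</s>", b"\xe2\x96\x81hello"]
vocab_scores = [0.0, 0.0, 0.0, -1.5]

buf = io.BytesIO()
write_tokenizer_bin(vocab, vocab_scores, buf)
buf.seek(0)
max_len, tokens_back, scores_back = read_tokenizer_bin(buf)
```

The sketch writes to an in-memory buffer so it runs without the `sentencepiece` package. The converter referenced in the doc would instead read the vocabulary from tokenizer.model and write to a file on disk.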