Commit e3db248

update runner doc (#778)
1 parent 5673c20 commit e3db248

2 files changed, +12 -0 lines changed

File renamed without changes.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+The SentencePiece tokenizer implementation for Python (developed by
+Google) and the C/C++ implementation (developed by Andrej Karpathy)
+use different input formats. The Python implementation reads a
+tokenizer specification in tokenizer.model format. The C/C++
+implementation reads it from a file in tokenizer.bin format. We
+include Andrej's SentencePiece converter, which translates a
+SentencePiece tokenizer from tokenizer.model format to tokenizer.bin
+format, in the XXXutilsXXX subdirectory:
+
+```
+python3 XXXutilsXXX/tokenizer.py --tokenizer-model=${MODEL_DIR}/tokenizer.model
+```
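
To make the conversion described in this change more concrete, here is a minimal sketch of what a flat binary vocabulary file of this kind might look like. The layout is an assumption, not something this commit states: a little-endian `uint32` maximum token length, followed by one record per token holding a `float32` score, a `uint32` byte length, and the raw token bytes. The function names `write_tokenizer_bin` and `read_tokenizer_bin` are hypothetical, and the reader takes the vocabulary size as an argument because the assumed layout does not store it. Check the converter in your checkout for the actual format.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

Vocab = List[Tuple[bytes, float]]


def write_tokenizer_bin(out: BinaryIO, vocab: Vocab) -> None:
    """Write (token_bytes, score) pairs in the ASSUMED flat binary layout."""
    # Header: length in bytes of the longest token (0 for an empty vocabulary).
    max_token_length = max((len(tok) for tok, _ in vocab), default=0)
    out.write(struct.pack("<I", max_token_length))
    for tok, score in vocab:
        # Record: float32 score, uint32 byte length, then the raw bytes.
        out.write(struct.pack("<fI", score, len(tok)))
        out.write(tok)


def read_tokenizer_bin(inp: BinaryIO, vocab_size: int) -> Tuple[int, Vocab]:
    """Read back a file produced by write_tokenizer_bin."""
    (max_token_length,) = struct.unpack("<I", inp.read(4))
    vocab: Vocab = []
    for _ in range(vocab_size):
        score, length = struct.unpack("<fI", inp.read(8))
        vocab.append((inp.read(length), score))
    return max_token_length, vocab


if __name__ == "__main__":
    # Scores are chosen to be exactly representable as float32.
    sample = [(b"<unk>", 0.0), (b"a", -1.5), (b"hello", -2.0)]
    buf = io.BytesIO()
    write_tokenizer_bin(buf, sample)
    buf.seek(0)
    print(read_tokenizer_bin(buf, len(sample)))
```

A length-prefixed layout like this lets a C/C++ reader load the vocabulary with plain `fread` calls and no protobuf dependency, which is one plausible reason to convert from tokenizer.model in the first place.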
