Update Llama README.md for Stories110M tokenizer #5960
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5960
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit fbee9be with merge base f005dd5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@dvorjackz merged this pull request in 12cb9ca.
@pytorchbot cherry-pick --onto release/0.4 -c docs
Summary: The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.
Pull Request resolved: #5960
Reviewed By: tarun292
Differential Revision: D64014160
Pulled By: dvorjackz
fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e
(cherry picked from commit 12cb9ca)
Cherry picking #5960
The cherry pick PR is at #5975.
Update Llama README.md for Stories110M tokenizer (#5960)
Summary: The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.
Pull Request resolved: #5960
Reviewed By: tarun292
Differential Revision: D64014160
Pulled By: dvorjackz
fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e
(cherry picked from commit 12cb9ca)
Co-authored-by: Jack Zhang <[email protected]>
Summary
The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.
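
Before skipping the `tokenizer.bin` conversion, it may be worth checking which format a downloaded `tokenizer.model` is actually in. The following is a minimal sketch of such a check, assuming the tiktoken text format of one `<base64 token> <integer rank>` pair per line. The helper name `looks_like_tiktoken_file` is hypothetical and not part of ExecuTorch, and the check is a heuristic rather than a full parser.

```python
import base64
import binascii
from pathlib import Path


def looks_like_tiktoken_file(path, max_lines=100):
    """Heuristic: True if the first lines parse as '<base64 token> <integer rank>'.

    Hypothetical helper for illustration only; not an ExecuTorch API.
    """
    try:
        text = Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        # Binary files (for example protobuf-serialized models) are not
        # tiktoken-style text files.
        return False

    # Only inspect the first few non-empty lines to keep the check cheap.
    lines = [ln for ln in text.splitlines() if ln.strip()][:max_lines]
    if not lines:
        return False

    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            return False
        token_b64, rank = parts
        try:
            base64.b64decode(token_b64, validate=True)  # token must be valid base64
            int(rank)  # rank must be an integer
        except (binascii.Error, ValueError):
            return False
    return True
```

Calling `looks_like_tiktoken_file("tokenizer.model")` on the downloaded file gives a quick indication of whether it can plausibly be consumed as a tiktoken-style file; a `False` result suggests the file is in another format and the conversion step may still be needed.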