Update Llama README.md for Stories110M tokenizer #5960

Closed
wants to merge 1 commit into from

Conversation

jackzhxng
Contributor

Summary

The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and can instead use the `tokenizer.model` as is.
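
A minimal sketch of the download step described above, assuming `wget` is installed and a POSIX shell is used. The `fetch_tokenizer` helper and its default destination are hypothetical names introduced only for illustration; the URL is the one quoted in the summary. The sketch does not check the file's format, so whether the file can be consumed without conversion rests on the summary's claim.

```shell
#!/bin/sh
# URL quoted in the PR summary.
TOKENIZER_URL="https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"

# fetch_tokenizer [DEST]: hypothetical helper, not part of the repository.
# Downloads tokenizer.model to DEST unless a non-empty file already exists
# there, then prints the path so a caller can capture it.
fetch_tokenizer() {
  dest="${1:-tokenizer.model}"
  if [ ! -s "$dest" ]; then
    # -O writes the download to the chosen destination path.
    wget -O "$dest" "$TOKENIZER_URL" || return 1
  fi
  printf '%s\n' "$dest"
}
```

A caller could capture the path with `TOKENIZER_PATH="$(fetch_tokenizer tokenizer.model)"` and hand it to the runner directly, skipping any `tokenizer.bin` generation step. The runner invocation itself is not shown in this PR, so it is left out here.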

pytorch-bot bot commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5960

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fbee9be with merge base f005dd5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Oct 7, 2024
@facebook-github-bot
Contributor

@dvorjackz has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@dvorjackz merged this pull request in 12cb9ca.

@jackzhxng
Contributor Author

@pytorchbot cherry-pick --onto release/0.4 -c docs

pytorchbot pushed a commit that referenced this pull request Oct 8, 2024
Summary:
The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.

Pull Request resolved: #5960

Reviewed By: tarun292

Differential Revision: D64014160

Pulled By: dvorjackz

fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e
(cherry picked from commit 12cb9ca)
@pytorchbot
Collaborator

Cherry picking #5960

The cherry pick PR is at #5975.

The following tracker issues are updated:


jackzhxng added a commit that referenced this pull request Oct 8, 2024
Update Llama README.md for Stories110M tokenizer (#5960)

Summary:
The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.

Pull Request resolved: #5960

Reviewed By: tarun292

Differential Revision: D64014160

Pulled By: dvorjackz

fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e
(cherry picked from commit 12cb9ca)

Co-authored-by: Jack Zhang <[email protected]>
Labels
CLA Signed, Merged

4 participants