adding run time info to eval and cleaning up output (#422)
* adding run time info to eval and cleaning up output
Summary:
The output now includes information on the model run time distribution, along with a cleaned-up
result output.
Test Plan:
python eval.py --checkpoint-path checkpoints/$MODEL_REPO/model.pth \
--dtype bfloat16 --device cuda \
Time to run eval: 53.31s.
Time in model.forward: 20.29s, over 186 model evaluations
forward run time stats - Median: 0.10s Min: 0.04s Max: 2.18s
For model checkpoints/meta-llama/Llama-2-7b-hf/model.pth
wikitext:
word_perplexity,none: 9.1649
byte_perplexity,none: 1.5133
bits_per_byte,none: 0.5977
alias: wikitext
Reviewers:
Subscribers:
Tasks:
Tags:
* Adding evaluation.md content
Summary: see added content
Test Plan: n/a
Reviewers:
Subscribers:
Tasks:
Tags:
* docs update
Summary: removing install instructions
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Add documentation about `torchchat eval` explaining the process and options.
Torchchat provides evaluation functionality for your language model on a variety of tasks using the [lm-evaluation-harness](https://github.com/facebookresearch/lm_eval) library.
The evaluation mode of the `torchchat.py` script can be used to evaluate your language model on various tasks available in the `lm_eval` library, such as "wikitext". You can specify the task(s) you want to evaluate using the `--tasks` option, and limit the number of evaluated samples using the `--limit` option. If no task is specified, evaluation defaults to "wikitext".
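A minimal example invocation, shown only as a sketch: it assumes the flags from the test plan above (`--checkpoint-path`, `--dtype`, `--device`) also apply to the `torchchat.py` eval entry point, and the checkpoint path, task list, and `--limit` value are placeholders to adjust for your setup.

```bash
# Hypothetical example: evaluate a local checkpoint on wikitext,
# limiting the run to 100 samples for a quick check.
python3 torchchat.py eval \
  --checkpoint-path checkpoints/$MODEL_REPO/model.pth \
  --dtype bfloat16 --device cuda \
  --tasks wikitext \
  --limit 100
```

Omitting `--tasks` falls back to the default "wikitext" evaluation described above.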