This PR makes the following changes:
- Add a `--num_fewshot` option, which is required for running the MMLU task with 5 shots.
- Set the default value of `--limit` to `None` so that all examples are actually run.
- Update `eval_llama` to call `simple_evaluate`, a wrapper around `evaluate` that does extra work for us, such as building the task dict.
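A minimal sketch of the first two option changes (the flag names come from this PR; the parser setup itself is illustrative, not the actual `eval_llama` argument wiring):

```python
import argparse

parser = argparse.ArgumentParser()
# --num_fewshot is needed for 5-shot MMLU; lm-eval expects an int
parser.add_argument("--num_fewshot", type=int, default=None)
# --limit defaults to None so the full eval set is used unless explicitly capped
parser.add_argument("--limit", type=int, default=None)

args = parser.parse_args(["--num_fewshot", "5"])
# args.num_fewshot == 5, args.limit is None
```

With `--limit` unset, the harness evaluates every example instead of a truncated subset.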
Test Plan:
- Make sure WikiText perplexity for Llama 3.2 1B stays the same before and after the change.
Before the change, running eval_llama for Llama 3.2 1B with `--limit` set to `None`:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
After the change, running eval_llama for Llama 3.2 1B:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
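The before/after metrics above can be compared programmatically; a small sketch (values copied from the logs above, tolerance chosen arbitrarily):

```python
before = {
    "word_perplexity,none": 12.78246428138387,
    "byte_perplexity,none": 1.610432252171856,
    "bits_per_byte,none": 0.6874479705552373,
}
# In this run the numbers match exactly, so reuse the same values.
after = dict(before)

# Verify every metric agrees within a tight tolerance.
for key in before:
    assert abs(before[key] - after[key]) < 1e-9, key
```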
- Make sure that lm_eval (v0.4.2, the version used by eval_llama) and eval_llama report similar numbers for Llama 3.2 1B and 3B in BF16 on the MMLU task with 5 shots.
Example command for lm_eval:
```
lm_eval --model hf \
--model_args pretrained=meta-llama/Llama-3.2-1B-Instruct \
--tasks mmlu \
--device cuda \
-f 5 \
--batch_size auto
```
Example command for eval_llama:
```
python -m examples.models.llama2.eval_llama \
-c /home/lunwenh/models/1B_Instruct/consolidated.00.pth \
-p /home/lunwenh/models/1B_Instruct/params.json \
-t /home/lunwenh/models/1B_Instruct/tokenizer.model \
-kv \
-d bf16 \
--tasks mmlu \
-f 5 \
--max_seq_length 2048
```
Differential Revision: [D64215268](https://our.internmc.facebook.com/intern/diff/D64215268)