
ci: Add llama3 gpu workflow in periodic #399


Merged
seemethere merged 12 commits into main from seemethere/add_llama3_test on Apr 23, 2024

Conversation

@seemethere (Member) commented Apr 23, 2024

Adds a llama3 testing workflow for periodic; the model is downloaded using huggingface-cli.

This is somewhat of a working prototype; I left a couple of TODOs in places where things could be done better if given more time.

Another note: this currently only covers GPU, since this needed to get done fast and I only edited the GPU workflow.
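
For reference, a minimal sketch of the download step the workflow performs. The repo id and checkpoint directory are taken from the logs further down; the token handling is an assumption (the meta-llama repo is gated), and the workflow's actual flags may differ.

# Assumption: an HF_TOKEN secret is available for the gated meta-llama repo
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli download meta-llama/Meta-Llama-3-8B \
  --local-dir checkpoints/meta-llama/Meta-Llama-3-8B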

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 23, 2024
@seemethere seemethere added ciflow/periodic and removed CLA Signed This label is managed by the Meta Open Source bot. ciflow/periodic labels Apr 23, 2024

pytorch-bot bot commented Apr 23, 2024

No ciflow labels are configured for this repo.
For information on how to enable the CIFlow bot, see this wiki.

@seemethere seemethere force-pushed the seemethere/add_llama3_test branch from 2545c9f to 8cff173 Compare April 23, 2024 00:46
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 23, 2024
@seemethere seemethere changed the title from "ci: Add llama3 workflow in periodic" to "ci: Add llama3 gpu workflow in periodic" Apr 23, 2024
@seemethere seemethere force-pushed the seemethere/add_llama3_test branch from 43b91cf to 610ade6 Compare April 23, 2024 01:41
@seemethere seemethere marked this pull request as ready for review April 23, 2024 18:57
@orionr (Contributor) left a comment

I think so long as this works you should go for it.

@mikekgfb (Contributor) left a comment

Thank you!

@seemethere (Member, Author) commented

This PR is currently blocked on the --tiktoken argument being made the default in generate.py.

I did test this locally and ran into an issue with INT4 group-wise quantization:

Logs:
+ python3 -W ignore export.py --dtype bfloat16 --quant '{"linear:int4-gptq" : {"groupsize": 32}}' --checkpoint-path checkpoints/meta-llama/Meta-Llama-3-8B/model.pth --output-dso-path checkpoints/meta-llama/Meta-Llama-3-8B/model.so --device cuda
Using device=cuda
Loading model ...
name Meta-Llama-3-8B
Time to load model: 4.40 seconds
Quantizing the model with: {"linear:int4-gptq" : {"groupsize": 32}}
device: cuda
2024-04-23:15:52:36,269 INFO     [huggingface.py:162] Using device 'cuda'
2024-04-23:15:52:41,107 WARNING  [task.py:763] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-04-23:15:52:41,107 WARNING  [task.py:775] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-04-23:15:52:41,107 WARNING  [task.py:763] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
2024-04-23:15:52:41,107 WARNING  [task.py:775] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
2024-04-23:15:52:41,107 WARNING  [task.py:763] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
2024-04-23:15:52:41,107 WARNING  [task.py:775] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
2024-04-23:15:52:42,341 WARNING  [repocard.py:107] Repo card metadata block was not found. Setting CardData to empty.
Obtaining GPTQ calibration inputs on:  ['wikitext']
2024-04-23:15:52:42,390 INFO     [task.py:395] Building contexts for wikitext on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 860.74it/s]
2024-04-23:15:52:42,403 INFO     [evaluator.py:362] Running loglikelihood_rolling requests
  0%|                                                                                                                                                                                                                                                           | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/users/eliuriegas/torchchat/export.py", line 116, in <module>
    main(args)
  File "/data/users/eliuriegas/torchchat/export.py", line 65, in main
    model = _initialize_model(
  File "/data/users/eliuriegas/torchchat/build/builder.py", line 373, in _initialize_model
    quantize_model(model, builder_args.device, quantize, tokenizer)
  File "/data/users/eliuriegas/torchchat/quantize.py", line 43, in quantize_model
    model = quantizer_class_dict[quantizer](
  File "/data/users/eliuriegas/torchchat/quantize.py", line 1191, in quantized_model
    model_updated_state_dict = self.create_quantized_state_dict(
  File "/home/eliuriegas/local/torchchat/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/users/eliuriegas/torchchat/quantize.py", line 1077, in create_quantized_state_dict
    inputs = GPTQQuantHandler.get_inputs(
  File "/data/users/eliuriegas/torchchat/quantize.py", line 1050, in get_inputs
    evaluate(
  File "/home/eliuriegas/local/torchchat/.venv/lib/python3.9/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/eliuriegas/local/torchchat/.venv/lib/python3.9/site-packages/lm_eval/evaluator.py", line 373, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/eliuriegas/local/torchchat/.venv/lib/python3.9/site-packages/lm_eval/models/huggingface.py", line 817, in loglikelihood_rolling
    token_list=self.tok_encode(string),
  File "/data/users/eliuriegas/torchchat/eval.py", line 133, in tok_encode
    encoded = encode_tokens(self._tokenizer, string, bos=True, device=self._device)
  File "/data/users/eliuriegas/torchchat/generate.py", line 357, in encode_tokens
    tokens = tokenizer.encode(string)
AttributeError: 'NoneType' object has no attribute 'encode'

I'm also skeptical that this will work on an A10G, since it is somewhat memory-constrained compared to the H100 I tested on.
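
One possible way to unblock this before the default changes, sketched under the assumption that export.py accepts the same --tiktoken flag discussed above for generate.py (not verified in this PR), would be to pass the flag explicitly:

# Hypothetical: --tiktoken on export.py mirrors generate.py and is an assumption
python3 -W ignore export.py --tiktoken --dtype bfloat16 \
    --quant '{"linear:int4-gptq" : {"groupsize": 32}}' \
    --checkpoint-path checkpoints/meta-llama/Meta-Llama-3-8B/model.pth \
    --output-dso-path checkpoints/meta-llama/Meta-Llama-3-8B/model.so \
    --device cuda

That should give the GPTQ calibration path a real tokenizer instead of the None that triggers the AttributeError above.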

seemethere and others added 12 commits April 23, 2024 16:09
Adds a llama3 testing workflow for periodic, downloads this using
huggingface-cli.

This is somewhat of a working prototype, I left a couple of TODOs in
places where things could be done better if given more time.

Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Didn't realize the checkpoint normalization was already taken care of
later on down the line

Signed-off-by: Eli Uriegas <[email protected]>
@seemethere seemethere force-pushed the seemethere/add_llama3_test branch from 96670c0 to 1936561 Compare April 23, 2024 23:10
@seemethere (Member, Author) commented Apr 23, 2024

Yeah, it appears my fears about memory limits were well founded: https://github.com/pytorch/torchchat/actions/runs/8808406315/job/24177435186?pr=399#step:11:809

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 0 has a total capacity of 21.99 GiB of which 751.00 MiB is free. Process 7077 has 21.24 GiB memory in use. Of the allocated memory 20.83 GiB is allocated by PyTorch, and 130.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Going to merge anyway so the team can continue to iterate here.
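
As a stopgap for the fragmentation portion of that error, the allocator setting suggested in the message itself could be exported before the run. This is a sketch only; it relieves fragmentation but will not create extra capacity on a ~22 GiB A10G.

# Suggested by the allocator message above; untested on the A10G runner
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# then re-run the same export.py command shown in the logs above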

@seemethere seemethere merged commit 525acfb into main Apr 23, 2024
@seemethere seemethere deleted the seemethere/add_llama3_test branch April 23, 2024 23:27
malfet pushed commits that referenced this pull request on Jul 17, 2024
Labels: ciflow/periodic, CLA Signed