ci: Add llama3 gpu workflow in periodic #399
I think so long as this works you should go for it.
Thank you!
This PR is currently blocked on the [...]. I did test this locally and ran into an issue with INT4 group-wise quantization: logs
I'm also skeptical that this will work on an A10G, since it is somewhat memory-constrained compared to the H100 I tested this on.
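For a rough sense of the memory concern, here is a back-of-the-envelope sketch of weight storage at different precisions. The parameter count (about 8 billion, i.e. an 8B Llama 3 variant), the GPU capacities (24 GB for the A10G, 80 GB for the H100), the group size of 32, and the 16-bit scale per group are all assumptions for illustration, not values taken from this PR. It counts weights only, so activations, KV cache, and any temporary buffers used during quantization come on top, and it does not by itself explain the failure in the logs.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits_per_weight / 8


def int4_groupwise_bits(group_size: int, scale_bits: int = 16) -> float:
    """Effective bits per weight for 4-bit values plus one scale per group.

    Assumes a single scale (no zero point) shared by each group of weights.
    """
    return 4 + scale_bits / group_size


if __name__ == "__main__":
    n_params = 8_000_000_000  # assumed: roughly an 8B-parameter model
    gpus_gib = {"A10G": 24, "H100": 80}  # assumed capacities

    precisions = {
        "bf16": 16,
        "int4 group-wise (g=32)": int4_groupwise_bits(32),
    }
    for name, bits in precisions.items():
        gib = weight_bytes(n_params, bits) / GIB
        print(f"{name}: {gib:.1f} GiB of weights")
        for gpu, capacity in gpus_gib.items():
            print(f"  {gpu}: {capacity - gib:.1f} GiB left for everything else")
```

Note that quantizing on the GPU typically needs the full-precision weights resident first, so the peak during conversion can be well above the final INT4 footprint.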
Adds a llama3 testing workflow for periodic; it downloads the model using huggingface-cli. This is somewhat of a working prototype; I left a couple of TODOs in places where things could be done better given more time. Signed-off-by: Eli Uriegas <[email protected]>
Didn't realize the checkpoint normalization was already taken care of later on down the line. Signed-off-by: Eli Uriegas <[email protected]>
Yeah, it appears my fears about memory limits were well founded: https://github.com/pytorch/torchchat/actions/runs/8808406315/job/24177435186?pr=399#step:11:809
Going to merge anyway so the team can continue to iterate here.
Adds a llama3 testing workflow for periodic; it downloads the model using huggingface-cli.
This is somewhat of a working prototype; I left a couple of TODOs in places where things could be done better given more time.
Another note: this only works for GPU, since this needed to get done fast and I only edited the GPU workflow.
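As a minimal sketch of the download step described above, the helper below builds a `huggingface-cli download` invocation. The repo id, the local directory, and the idea of passing a token from the environment are assumptions for illustration; the actual workflow in this PR may structure the step differently.

```python
import shlex
from typing import List, Optional


def build_download_cmd(
    repo_id: str, local_dir: str, token: Optional[str] = None
) -> List[str]:
    """Return the argv for downloading a model repo with huggingface-cli.

    Only builds the command; running it is left to the caller (for example
    via subprocess.run(cmd, check=True) in a CI step).
    """
    cmd = ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]
    if token:
        # Gated models such as Llama 3 need an access token.
        cmd += ["--token", token]
    return cmd


if __name__ == "__main__":
    # Hypothetical values: the repo id and directory are not taken from the PR.
    cmd = build_download_cmd("meta-llama/Meta-Llama-3-8B", "checkpoints/llama3")
    print(shlex.join(cmd))
```

In a real workflow the token would come from a repository secret rather than being written into the command line in plain text.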