Update torchtune pin to 0.4.0.dev20241010 #1300
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1300
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6c97bd9 with merge base d1ab6e0.
Please look at the pip list after running the install script for ET as well, and compare to ensure we don't have conflicts.
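The comparison requested here can be mechanized by diffing two `pip list --format=freeze` outputs. A minimal sketch — the helper name and the sample version strings are illustrative, not part of torchchat or ExecuTorch:

```python
def diff_requirements(before, after):
    """Compare two lists of 'pkg==version' lines (pip freeze format).

    Returns (changed, removed): packages whose version changed or was added,
    and packages present before but missing after.
    """
    b = dict(line.split("==", 1) for line in before if "==" in line)
    a = dict(line.split("==", 1) for line in after if "==" in line)
    changed = {p: (b.get(p), v) for p, v in a.items() if b.get(p) != v}
    removed = {p: v for p, v in b.items() if p not in a}
    return changed, removed

# Hypothetical example: torchtune stays pinned, but torch is silently
# swapped from the ROCm wheel to the CPU wheel by a second install script.
before = ["torch==2.5.0.dev20241010+rocm6.2", "torchtune==0.4.0.dev20241010"]
after = ["torch==2.5.0.dev20241010+cpu", "torchtune==0.4.0.dev20241010"]
changed, removed = diff_requirements(before, after)
# changed flags only torch; torchtune is unaffected.
```

Any entry in `changed` or `removed` after running the ET install script would indicate exactly the kind of conflict this comment is asking about.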
LGTM
tl;dr: not a problem within this diff, but we'll need to look at ET's installation logic. After running through the ET installation, pip list shows the same version for torchtune. However, this reveals a deeper issue: ET is not checking for (or using) the ROCm or CUDA wheel and is force-installing the CPU versions of torch and other libs. If torchtune were included as a dep, we'd run into the same issue. I don't know if this is an easy fix - ExecuTorch isn't targeted for GPU systems and should install the CPU version of torch. Warrants further discussion, but I think that's outside the scope of this PR.
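One way to spot the CPU-vs-ROCm mixup described above is to inspect the PEP 440 local version segment of an installed wheel (`+cpu`, `+rocm6.2`, `+cu121`, ...). A hedged sketch — this helper is illustrative and not something ET or torchchat ships:

```python
def wheel_variant(version: str) -> str:
    """Extract the build variant from a PEP 440 version string.

    '2.5.0.dev20241010+rocm6.2' -> 'rocm6.2'; no '+' segment -> 'default'.
    """
    return version.split("+", 1)[1] if "+" in version else "default"

# On a ROCm system, a force-installed CPU wheel would show up like this:
assert wheel_variant("2.5.0.dev20241010+rocm6.2") == "rocm6.2"
assert wheel_variant("2.5.0.dev20241010+cpu") == "cpu"
```

An install script could use a check like this to warn when it is about to replace a GPU wheel with a CPU one.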
Co-authored-by: vmpuri <[email protected]>
* add pp_dim, distributed, num_gpus, num_nodes as cmd line args
* add tp_dim
* add elastic_launch
* working, can now launch from cli
* Remove numpy < 2.0 pin to align with pytorch (#1301) - Fix #1296; align with https://github.com/pytorch/pytorch/blame/main/requirements.txt#L5
* Update torchtune pin to 0.4.0-dev20241010 (#1300) - Co-authored-by: vmpuri <[email protected]>
* Unbreak gguf util CI job by fixing numpy version (#1307) - Setting numpy version to be the range required by gguf: https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/pyproject.toml
* Remove apparently-unused import torchvision in model.py (#1305) - Co-authored-by: vmpuri <[email protected]>
* remove global var for tokenizer type + patch tokenizer to allow list of sequences
* make pp tp visible in interface
* Add llama 3.1 to dist_run.py
* [WIP] Move dist inf into its own generator
* Add initial generator interface to dist inference
* Added generate method and placeholder scheduler
* use prompt parameter for dist generation
* Enforce tp>=2
* Build tokenizer from TokenizerArgs
* Disable torchchat format + constrain possible models for distributed
* disable calling dist_run.py directly for now
* Restore original dist_run.py for now
* disable _maybe_parallelize_model again
* Reenable arg.model_name in dist_run.py
* Use singleton logger instead of print in generate
* Address PR comments; try/expect in launch_dist_inference; added comments

Co-authored-by: lessw2020 <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Update torchtune pin to 0.4.0-dev20241010. This is the newest version visible to the ROCm wheel and fixes the installation script for AMD/ROCm systems (otherwise installation would break, since 0.3.0-dev20240928 isn't visible via the ROCm 6.2 wheel).
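The pin bump can be sanity-checked by comparing the date suffixes of the nightly versions involved. A small illustrative helper (not part of the repo; assumes the normalized PEP 440 `.dev<YYYYMMDD>` spelling of the nightly tags):

```python
import re

def nightly_date(version: str):
    """Pull the YYYYMMDD date out of an '<x.y.z>.dev<date>' nightly version.

    Returns None for versions without a dev-date suffix.
    """
    m = re.search(r"\.dev(\d{8})$", version)
    return m.group(1) if m else None

# The pin moves from a nightly the ROCm 6.2 index no longer serves
# to a newer one that it does:
assert nightly_date("0.3.0.dev20240928") == "20240928"
assert nightly_date("0.4.0.dev20241010") == "20241010"
```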
`pip list` output after running install_requirements.sh successfully.