
[Distributed]Integrate toml for configs, sink distributed launch & DCP work to distributed level #898


Merged: 11 commits merged into pytorch:main from pp_start on Jul 13, 2024

Conversation

lessw2020 (Contributor):

This PR:
1 - Adds toml support for distributed inference.
Toml files can be specified and stored in /inference_configs/llama3_8b.toml.
At distributed launch, the relevant toml file is loaded and parsed, then used for the distributed config, especially the pipeline-parallel (pp) and tensor-parallel (tp) dimensions (see the sketch after this list).

2 - Moves the distributed launch code into world_maker.py in the /distributed folder and exposes single APIs to builder.py.
The idea is to keep all code local to the distributed folder and expose only surface APIs to builder. This should make development easier, since all distributed-specific code is local instead of being interspersed between generic torchchat code (builder, etc.) and /distributed code.

3 - Ran ruff and isort to clean up the current /distributed files.
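
A minimal sketch of how points 1 and 2 could fit together, assuming a `[parallelism]` table with `pipeline_parallel_degree` / `tensor_parallel_degree` keys and a `launch_distributed` entry point; these names are illustrative assumptions, not the exact code in this PR:

```python
# Hypothetical sketch of the toml-driven launch flow described above.
# The [parallelism] table, its key names, and launch_distributed are
# assumptions for illustration only; see /distributed for the real code.
import tomllib  # stdlib on Python 3.11+; older interpreters need the tomli backport


def load_inference_config(path: str = "inference_configs/llama3_8b.toml") -> dict:
    """Load and parse a toml config for distributed inference."""
    with open(path, "rb") as f:  # tomllib requires a binary-mode file object
        return tomllib.load(f)


def launch_distributed(config_path: str) -> None:
    """Single surface API that builder.py could call; all other distributed
    setup stays local to the /distributed folder."""
    cfg = load_inference_config(config_path)
    # The parallelism dimensions drive how the device mesh is built.
    pp_degree = cfg["parallelism"]["pipeline_parallel_degree"]
    tp_degree = cfg["parallelism"]["tensor_parallel_degree"]
    print(f"launching with pp={pp_degree}, tp={tp_degree}")
    # ... set up process groups / device mesh and load the DCP checkpoint here
```

Keeping builder.py down to a single call like `launch_distributed(config_path)` is what point 2 means by exposing only surface APIs.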

pytorch-bot bot commented Jul 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/898

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit be7db92 with merge base ceb9a3a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jul 12, 2024
fduwjj (Contributor) left a comment:


Overall this looks good to me; I left some nits.

fduwjj (Contributor) commented on Jul 12, 2024:

Also, please make sure all CI passes before merging, thanks!

lessw2020 merged commit d40e432 into pytorch:main on Jul 13, 2024
51 checks passed
lessw2020 deleted the pp_start branch on July 13, 2024 at 03:11
lessw2020 changed the title from "[Distributed]{do_not_review_yet} Integrate toml for configs, sink distributed launch & DCP work to distributed level" to "[Distributed]Integrate toml for configs, sink distributed launch & DCP work to distributed level" on Jul 13, 2024
malfet pushed a commit that referenced this pull request Jul 17, 2024
…tributed launch & DCP work to distributed level (#898)

* start inference.sh, toml configs

* first toml

* add config_manager

* basic toml load, prep for starting dist

* sink init and add toml parsing

* toml load working

* add distributed logger

* logging working

* ruff and isort

* remove inference.py

* better toml breakout, add tomli if python < 3.11
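
The last commit mentions adding tomli for Python < 3.11; a common version-gated import for that (the PR's actual code may differ) looks like:

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # toml parser in the standard library since Python 3.11
else:
    import tomli as tomllib  # third-party backport exposing the same API
```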
vmpuri pushed a commit that referenced this pull request Jul 17, 2024
…tributed launch & DCP work to distributed level (#898)
