[SWDEV-466849] Enhancements for PyTorch UT helper scripts #1491

Merged

jithunnair-amd
Collaborator

@jithunnair-amd jithunnair-amd commented Jul 22, 2024

  • Report execution time
  • Sanity check that distributed config is not run with only 1 GPU visible
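For readers who want a concrete picture of the second item, here is a minimal sketch of such a sanity check. It is not the code from this PR: the function name `check_distributed_gpu_count` and its `visible_gpus` and `env` parameters are hypothetical, and it assumes the configuration is read from the `TEST_CONFIG` environment variable. The GPU count is passed in by the caller; in a real script it would probably come from something like `torch.cuda.device_count()`.

```python
import os


def check_distributed_gpu_count(visible_gpus: int, env=None) -> None:
    """Raise if TEST_CONFIG=distributed is requested with fewer than 2 visible GPUs.

    Hypothetical helper for illustration only; not the PR's actual implementation.
    """
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    if env.get("TEST_CONFIG") == "distributed" and visible_gpus < 2:
        raise RuntimeError(
            "TEST_CONFIG=distributed needs more than 1 visible GPU, "
            f"found {visible_gpus}"
        )
```

Calling `check_distributed_gpu_count(1, env={"TEST_CONFIG": "distributed"})` raises `RuntimeError`, while any other config, or two or more GPUs, passes silently.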

@jithunnair-amd
Collaborator Author

Verified offline by QA Automation team:

From Nishanth Chandrasekaran on Teams:
I am able to get the execution time as expected from the private branch, and the script properly handles the error check for distributed tests on a single GPU. I think we are good to merge this private branch: https://github.com/ROCm/pytorch/tree/enhancements_for_SWDEV_466849
Execution Time:
image (1)

Error check for distributed tests:
image (2)

@jithunnair-amd jithunnair-amd merged commit 679d168 into rocm6.3_internal_testing Jul 22, 2024
@jithunnair-amd jithunnair-amd deleted the enhancements_for_SWDEV_466849 branch July 22, 2024 15:35
pruthvistony pushed a commit that referenced this pull request Aug 12, 2024
* Check that >1 GPUs are visible when running TEST_CONFIG=distributed

* Add EXECUTION_TIME to file-level and aggregate statistics
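
The `EXECUTION_TIME` statistic named in the commit message above could be collected roughly as follows. This is a sketch under assumptions, not the script's real code: `run_with_timing`, `run_one`, the `clock` parameter, and the `STATUS` and `AGGREGATE` keys are hypothetical names. Only the `EXECUTION_TIME` key and the idea of file-level plus aggregate numbers come from this PR.

```python
import time
from typing import Callable, Dict, Iterable


def run_with_timing(
    test_files: Iterable[str],
    run_one: Callable[[str], str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Dict[str, object]]:
    """Run each test file, recording per-file and total wall-clock time.

    Hypothetical helper: `run_one` stands in for whatever launches a test file
    and returns its status string.
    """
    results: Dict[str, Dict[str, object]] = {}
    total = 0.0
    for name in test_files:
        start = clock()
        status = run_one(name)
        elapsed = clock() - start
        # File-level statistics live in a nested dict keyed by test file name.
        results[name] = {"STATUS": status, "EXECUTION_TIME": elapsed}
        total += elapsed
    # Aggregate statistics are stored next to the per-file entries.
    results["AGGREGATE"] = {"EXECUTION_TIME": total}
    return results
```

The injectable `clock` is a design choice that keeps the timing logic testable without real delays; by default it uses `time.perf_counter`.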
dnikolaev-amd pushed a commit that referenced this pull request Sep 16, 2024
jithunnair-amd pushed a commit that referenced this pull request Oct 3, 2024
* Initial implementation of PyTorch ut parsing script

* Extracted path variables

* Use nested dict to save results

* Fixes typo

* Cleanup

* Fixes several issues

* Minor name change

* Update run_pytorch_unit_tests.py

* Added file banners

* Supported running from API

* Added more help info

* Consistent naming

* Format help text

---------

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

Print consolidated log file for pytorch unit test automation scripts (#1433)

* Print consolidated log file for pytorch uts

* Update run_entire_tests subprocess call as well

* lint

* Add ERROR string

[SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491)

* Check that >1 GPUs are visible when running TEST_CONFIG=distributed

* Add EXECUTION_TIME to file-level and aggregate statistics

PyTorch unit test helper scripts enhancements (#1517)

* Fail earlier for distributed-on-1-GPU scenario
* print cmd in consolidated log with prettier formatting
* python->python3

Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264

---------

Co-authored-by: blorange-amd <[email protected]>

Several issues fix of QA helper script (#1564)

Fixes SWDEV-475071: https://ontrack-internal.amd.com/browse/SWDEV-475071
jithunnair-amd pushed a commit that referenced this pull request Oct 11, 2024
…A helper functions

=======================================================================================

Implementation of PyTorch ut parsing script - QA helper function (#1386)

* Initial implementation of PyTorch ut parsing script

* Extracted path variables

* Use nested dict to save results

* Fixes typo

* Cleanup

* Fixes several issues

* Minor name change

* Update run_pytorch_unit_tests.py

* Added file banners

* Supported running from API

* Added more help info

* Consistent naming

* Format help text

---------

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

Print consolidated log file for pytorch unit test automation scripts (#1433)

* Print consolidated log file for pytorch uts

* Update run_entire_tests subprocess call as well

* lint

* Add ERROR string

[SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491)

* Check that >1 GPUs are visible when running TEST_CONFIG=distributed

* Add EXECUTION_TIME to file-level and aggregate statistics

PyTorch unit test helper scripts enhancements (#1517)

* Fail earlier for distributed-on-1-GPU scenario
* print cmd in consolidated log with prettier formatting
* python->python3

Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264

---------

Co-authored-by: blorange-amd <[email protected]>

Several issues fix of QA helper script (#1564)

Fixes SWDEV-475071: https://ontrack-internal.amd.com/browse/SWDEV-475071

Removed args inside function (#1595)

Fixes SWDEV-475071

(cherry picked from commit 041aa1b47978154de63edc6b7ffcdea218a847a3)

QA script - Added multi gpu check with priority_tests (#1604)

Fixes SWDEV-487907. Verified that the exception for distributed tests is
thrown correctly on a single GPU with the command: python
.automation_scripts/run_pytorch_unit_tests.py --priority_test

(cherry picked from commit 57cc742271cbf4547f9213710e57f6444bbc983e)
jithunnair-amd pushed a commit that referenced this pull request Nov 19, 2024
pruthvistony pushed a commit that referenced this pull request Dec 2, 2024
pruthvistony pushed a commit that referenced this pull request Dec 21, 2024
dnikolaev-amd pushed a commit that referenced this pull request Apr 17, 2025
(cherry picked from commit 6d5c3dc)
dnikolaev-amd pushed a commit that referenced this pull request Apr 24, 2025
(cherry picked from commit 6d5c3dc)