Implementation of PyTorch ut parsing script - QA helper function #1386

Merged: 13 commits merged into rocm6.2_internal_testing on Apr 18, 2024

Conversation

@BLOrange-AMD commented Mar 29, 2024

This PR implements helper scripts/functions for running PyTorch unit tests and extracting results from the XML test reports.
The script returns the test results as a multi-level Python dictionary with the following structure (a sketch of how a consumer might walk this structure follows the example):

{'workflow_name_1': {
     test_file_and_status(file_name='workflow_1_aggregate', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_1', status='ERROR'): {},
     test_file_and_status(file_name='test_file_name_1', status='FAILED'): {},
     test_file_and_status(file_name='test_file_name_1', status='PASSED'): {},
     test_file_and_status(file_name='test_file_name_1', status='SKIPPED'): {},
     test_file_and_status(file_name='test_file_name_1', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_1', status='XFAILED'): {},
     test_file_and_status(file_name='test_file_name_2', status='ERROR'): {},
     test_file_and_status(file_name='test_file_name_2', status='FAILED'): {},
     test_file_and_status(file_name='test_file_name_2', status='PASSED'): {},
     test_file_and_status(file_name='test_file_name_2', status='SKIPPED'): {},
     test_file_and_status(file_name='test_file_name_2', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_2', status='XFAILED'): {}},
 'workflow_name_2': {
     test_file_and_status(file_name='workflow_2_aggregate', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_3', status='ERROR'): {},
     test_file_and_status(file_name='test_file_name_3', status='FAILED'): {},
     test_file_and_status(file_name='test_file_name_3', status='PASSED'): {},
     test_file_and_status(file_name='test_file_name_3', status='SKIPPED'): {},
     test_file_and_status(file_name='test_file_name_3', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_3', status='XFAILED'): {},
     test_file_and_status(file_name='test_file_name_4', status='ERROR'): {},
     test_file_and_status(file_name='test_file_name_4', status='FAILED'): {},
     test_file_and_status(file_name='test_file_name_4', status='PASSED'): {},
     test_file_and_status(file_name='test_file_name_4', status='SKIPPED'): {},
     test_file_and_status(file_name='test_file_name_4', status='STATISTICS'): {},
     test_file_and_status(file_name='test_file_name_4', status='XFAILED'): {}},
 ...
}
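For illustration, here is a minimal sketch of how a consumer might walk this structure. The namedtuple definition and the per-test-case payload are assumptions inferred from the repr above, not taken from the script itself:

from collections import namedtuple

# Assumed key type, matching the repr shown in the example above.
test_file_and_status = namedtuple("test_file_and_status", ["file_name", "status"])

def count_failures(results):
    # results: {workflow_name: {test_file_and_status: {test_case: details}}}
    summary = {}
    for workflow_name, per_file in results.items():
        for key, cases in per_file.items():
            if key.status == "FAILED":
                # cases is assumed to be a dict of failed test cases for that file.
                summary.setdefault(workflow_name, {})[key.file_name] = len(cases)
    return summary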

Confluence page for design: https://confluence.amd.com/display/~boli1003/PyTorch+test+script+for+QA+team

Tests:
entire_tests: test_entire_tests.log
python .automation_script/run_pytorch_unit_tests.py 2>&1 | tee test_entire_tests.log

priority_test: test_priority_tests.log
python .automation_script/run_pytorch_unit_tests.py --test_config default distributed --priority_test 2>&1 | tee test_priority_tests.log

The priority log also shows separate entries for the "nccl" and "gloo" backends for distributed.test_distributed_spawn.

selected_tests: test_selected_tests.log
python .automation_script/run_pytorch_unit_tests.py --default_list test_weak test_dlpack --inductor_list inductor/test_torchinductor 2>&1 | tee test_selected_tests.log

Run from API test
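For the run-from-API case, a hedged sketch of what calling the script from Python might look like; the entry-point name and its arguments below are placeholders for illustration only (check the script's help text and file banners for the actual interface):

import sys
sys.path.append(".automation_scripts")  # directory name per the review comments below

# run_priority_tests is a hypothetical entry point used only for illustration.
from run_pytorch_unit_tests import run_priority_tests

results = run_priority_tests()  # returns the multi-level dictionary described above
for workflow_name, per_file in results.items():
    print(workflow_name, "->", len(per_file), "file/status entries")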

@BLOrange-AMD (Author)

Link for the implementation design: https://confluence.amd.com/display/~boli1003/PyTorch+test+script+for+QA+team

@jithunnair-amd (Collaborator) commented Apr 3, 2024

@pruthvistony @BLOrange-AMD Some naming-related change requests:

  • Let's have the directory name start with a dot (".") to distinguish it from directories related to actual PyTorch functionality: rename automation_script to .automation_scripts (with an "s" at the end)
  • Rename ut_results_parse_automation.py to run_pytorch_unit_tests.py

@jithunnair-amd (Collaborator)

@BLOrange-AMD Please update the description of this PR (the "< TBD >" part) as well as add documentation to the script itself to describe:

  1. the two ways of using this script (command line and API): see #1386 (comment)
  2. the data structure of the results returned by the script, with examples

@jithunnair-amd force-pushed the rocm6.2_ut_logs_summary_script branch from 7fd60b3 to 38bb46b on April 18, 2024 04:29
@jithunnair-amd (Collaborator)

We'll address refactor-related changes in another PR. Let's merge this if the interface is finalized.

@pruthvistony merged commit 1b2a3a0 into rocm6.2_internal_testing on Apr 18, 2024
pruthvistony pushed a commit that referenced this pull request Apr 23, 2024
* Initial implementation of PyTorch ut parsing script

* Extracted path variables

* Use nested dict to save results

* Fixes typo

* Cleanup

* Fixes several issues

* Minor name change

* Update run_pytorch_unit_tests.py

* Added file banners

* Supported running from API

* Added more help info

* Consistent naming

* Format help text

---------

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
dnikolaev-amd pushed a commit that referenced this pull request Jun 20, 2024
pruthvistony pushed a commit that referenced this pull request Aug 12, 2024
dnikolaev-amd pushed a commit that referenced this pull request Sep 16, 2024
jithunnair-amd pushed a commit that referenced this pull request Oct 3, 2024

Print consolidated log file for pytorch unit test automation scripts (#1433)

* Print consolidated log file for pytorch uts

* Update run_entire_tests subprocess call as well

* lint

* Add ERROR string

[SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491)

* Check that >1 GPUs are visible when running TEST_CONFIG=distributed

* Add EXECUTION_TIME to file-level and aggregate statistics

PyTorch unit test helper scripts enhancements (#1517)

* Fail earlier for distributed-on-1-GPU scenario
* print cmd in consolidated log with prettier formatting
* python->python3

Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264

---------

Co-authored-by: blorange-amd <[email protected]>

Several issues fix of QA helper script (#1564)

Fixes SWDEV-475071: https://ontrack-internal.amd.com/browse/SWDEV-475071
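For context, a minimal sketch of the kind of multi-GPU guard described above for TEST_CONFIG=distributed; this is an illustration assuming torch is importable, not the script's exact code:

import torch

def check_gpus_for_distributed():
    # Distributed unit tests need more than one visible GPU; fail early otherwise.
    visible = torch.cuda.device_count()
    if visible <= 1:
        raise RuntimeError(
            f"Distributed tests require more than 1 visible GPU; found {visible}")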
jithunnair-amd pushed a commit that referenced this pull request Oct 11, 2024
jithunnair-amd pushed a commit that referenced this pull request Oct 11, 2024
Removed args inside function (#1595)

Fixes SWDEV-475071

(cherry picked from commit 041aa1b47978154de63edc6b7ffcdea218a847a3)

QA script - Added multi gpu check with priority_tests (#1604)

Fixes SWDEV-487907. Verified that the exception for distributed tests is thrown correctly on a single GPU with the command: python .automation_scripts/run_pytorch_unit_tests.py --priority_test

(cherry picked from commit 57cc742271cbf4547f9213710e57f6444bbc983e)
jithunnair-amd pushed a commit that referenced this pull request Oct 11, 2024
jithunnair-amd pushed a commit that referenced this pull request Nov 19, 2024
pruthvistony pushed a commit that referenced this pull request Dec 2, 2024
pruthvistony pushed a commit that referenced this pull request Dec 21, 2024
dnikolaev-amd pushed a commit that referenced this pull request Apr 17, 2025
(cherry picked from commit 6d5c3dc)
dnikolaev-amd pushed a commit that referenced this pull request Apr 24, 2025
(cherry picked from commit 6d5c3dc)