Implementation of PyTorch ut parsing script - QA helper function #1386
Conversation
Link for the implementation design: https://confluence.amd.com/display/~boli1003/PyTorch+test+script+for+QA+team

@pruthvistony @BLOrange-AMD Some naming-related change requests:

@BLOrange-AMD Please update the description of this PR (the "< TBD >" part) as well as add documentation to the script itself to describe:
Force-pushed from 7fd60b3 to 38bb46b
We'll address refactor-related changes in another PR. Let's merge this if the interface is finalized.
Implementation of PyTorch ut parsing script - QA helper function (#1386)
* Initial implementation of PyTorch ut parsing script
* Extracted path variables
* Use nested dict to save results
* Fixes typo
* Cleanup
* Fixes several issues
* Minor name change
* Update run_pytorch_unit_tests.py
* Added file banners
* Supported running from API
* Added more help info
* Consistent naming
* Format help text
Co-authored-by: Jithun Nair <[email protected]>

Print consolidated log file for pytorch unit test automation scripts (#1433)
* Print consolidated log file for pytorch uts
* Update run_entire_tests subprocess call as well
* lint
* Add ERROR string

[SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491)
* Check that >1 GPUs are visible when running TEST_CONFIG=distributed
* Add EXECUTION_TIME to file-level and aggregate statistics

PyTorch unit test helper scripts enhancements (#1517)
* Fail earlier for distributed-on-1-GPU scenario
* print cmd in consolidated log with prettier formatting
* python->python3
Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264
Co-authored-by: blorange-amd <[email protected]>

Several issues fix of QA helper script (#1564)
Fixes SWDEV-475071: https://ontrack-internal.amd.com/browse/SWDEV-475071

Removed args inside function (#1595)
Fixes SWDEV-475071
(cherry picked from commit 041aa1b47978154de63edc6b7ffcdea218a847a3)

QA script - Added multi gpu check with priority_tests (#1604)
Fixes SWDEV-487907. Verified that the exception for distributed tests is thrown correctly on a single GPU with command: python .automation_scripts/run_pytorch_unit_tests.py --priority_test
(cherry picked from commit 57cc742271cbf4547f9213710e57f6444bbc983e)
(cherry picked from commit 6d5c3dc)
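The changelog above mentions a check that more than one GPU is visible before running the distributed workflow. A minimal sketch of such a guard (the helper name and integration point are assumptions, not the script's actual code), written so the device count can be injected for testing rather than read from `torch.cuda.device_count()` directly:

```python
def check_gpus_for_distributed(test_config, device_count):
    """Raise early if the distributed workflow is requested on a single-GPU machine.

    test_config  -- list of workflow names selected for the run
    device_count -- number of visible GPUs (e.g. from torch.cuda.device_count())
    """
    if "distributed" in test_config and device_count < 2:
        raise ValueError(
            "'distributed' tests require more than 1 visible GPU, "
            f"but only {device_count} detected"
        )
```

In the real script this kind of check would run before any subprocesses are launched, so the failure is reported immediately instead of after a long test run.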
This PR implements helper scripts/functions for running PyTorch unit tests and extracting the results from the xml test-reports.
The script returns the test results as a multi-level Python dictionary with the following structure:
{'workflow_name_1': {
test_file_and_status(file_name='workflow_1_aggregate', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_1', status='ERROR'): {},
test_file_and_status(file_name='test_file_name_1', status='FAILED'): {},
test_file_and_status(file_name='test_file_name_1', status='PASSED'): {},
test_file_and_status(file_name='test_file_name_1', status='SKIPPED'): {},
test_file_and_status(file_name='test_file_name_1', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_1', status='XFAILED'): {},
test_file_and_status(file_name='test_file_name_2', status='ERROR'): {},
test_file_and_status(file_name='test_file_name_2', status='FAILED'): {},
test_file_and_status(file_name='test_file_name_2', status='PASSED'): {},
test_file_and_status(file_name='test_file_name_2', status='SKIPPED'): {},
test_file_and_status(file_name='test_file_name_2', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_2', status='XFAILED'): {},
},
'workflow_name_2': {
test_file_and_status(file_name='workflow_2_aggregate', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_3', status='ERROR'): {},
test_file_and_status(file_name='test_file_name_3', status='FAILED'): {},
test_file_and_status(file_name='test_file_name_3', status='PASSED'): {},
test_file_and_status(file_name='test_file_name_3', status='SKIPPED'): {},
test_file_and_status(file_name='test_file_name_3', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_3', status='XFAILED'): {},
test_file_and_status(file_name='test_file_name_4', status='ERROR'): {},
test_file_and_status(file_name='test_file_name_4', status='FAILED'): {},
test_file_and_status(file_name='test_file_name_4', status='PASSED'): {},
test_file_and_status(file_name='test_file_name_4', status='SKIPPED'): {},
test_file_and_status(file_name='test_file_name_4', status='STATISTICS'): {},
test_file_and_status(file_name='test_file_name_4', status='XFAILED'): {},
},
...
}
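The keys in the structure above look like instances of a namedtuple with `file_name` and `status` fields. A minimal sketch of how such a results dictionary could be built and queried (the field names follow the repr shown above; the toy data and the query are illustrative assumptions):

```python
from collections import namedtuple

# Key type matching the repr shown in the structure above
test_file_and_status = namedtuple("test_file_and_status", ["file_name", "status"])

# Build a toy results dict in the same shape: workflow -> (file, status) -> details
results = {
    "default": {
        test_file_and_status("default_aggregate", "STATISTICS"): {"TOTAL": 2},
        test_file_and_status("test_dlpack", "PASSED"): {"test_from_dlpack": {}},
        test_file_and_status("test_dlpack", "FAILED"): {},
    }
}

# Query: list files in the 'default' workflow that recorded any failures
failed_files = [
    key.file_name
    for key, cases in results["default"].items()
    if key.status == "FAILED" and cases
]
print(failed_files)  # no failures recorded above, so this prints []
```

Using a namedtuple as the key keeps lookups explicit: `results[workflow][test_file_and_status(file, status)]` retrieves one bucket directly, while iteration with attribute access (`key.status`) stays readable.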
Confluence page for design: https://confluence.amd.com/display/~boli1003/PyTorch+test+script+for+QA+team
Tests:
entire_tests: test_entire_tests.log
python .automation_scripts/run_pytorch_unit_tests.py 2>&1 | tee test_entire_tests.log
priority_test: test_priority_tests.log
python .automation_scripts/run_pytorch_unit_tests.py --test_config default distributed --priority_test 2>&1 | tee test_priority_tests.log
The priority log also shows separate entries for the "nccl" and "gloo" backends for distributed.test_distributed_spawn.
selected_tests: test_selected_tests.log
python .automation_scripts/run_pytorch_unit_tests.py --default_list test_weak test_dlpack --inductor_list inductor/test_torchinductor 2>&1 | tee test_selected_tests.log
Run from API test
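The description says results are extracted from the xml test-reports. A minimal sketch of bucketing test cases from a JUnit-style report into the status categories used above (the element and attribute names follow the common JUnit XML convention; the exact fields in PyTorch's generated reports may differ):

```python
import xml.etree.ElementTree as ET

# Toy JUnit-style report; real reports are produced under the test-reports dir
JUNIT_XML = """<testsuite name="test_dlpack" tests="3" failures="1" skipped="1">
  <testcase classname="TestDlpack" name="test_from_dlpack" time="0.01"/>
  <testcase classname="TestDlpack" name="test_bad_stream" time="0.02">
    <failure message="boom"/>
  </testcase>
  <testcase classname="TestDlpack" name="test_skipped" time="0.0">
    <skipped message="no GPU"/>
  </testcase>
</testsuite>"""

def bucket_testcases(xml_text):
    """Group test case names by PASSED/FAILED/SKIPPED/ERROR status."""
    buckets = {"PASSED": [], "FAILED": [], "SKIPPED": [], "ERROR": []}
    for case in ET.fromstring(xml_text).iter("testcase"):
        # A child element marks a non-passing outcome; no child means passed
        if case.find("failure") is not None:
            status = "FAILED"
        elif case.find("error") is not None:
            status = "ERROR"
        elif case.find("skipped") is not None:
            status = "SKIPPED"
        else:
            status = "PASSED"
        buckets[status].append(case.get("name"))
    return buckets

print(bucket_testcases(JUNIT_XML))
```

Per-file buckets of this kind could then be inserted under the appropriate `test_file_and_status(...)` key of the nested dictionary shown earlier, though the actual aggregation code is not shown here.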